distilgpt2-tiny-conversational

ethzanalytics

Introduction

distilgpt2-tiny-conversational is a fine-tuned version of the DistilGPT2 model, designed for conversational applications. It uses a persona framework to generate dialogue between two entities, making it particularly suitable for chatbot functionality.

Architecture

The model is a simplified, distilled version of GPT-2, optimized for dialogue generation. It generates conversations by using custom tokens to identify when responses begin and end, making it suitable for interactive applications.
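To make the custom-token mechanism concrete, here is a minimal sketch of how a prompt might be assembled and a reply extracted. The speaker strings "person alpha:" and "person beta:" are assumptions based on the ai-msgbot persona framework, not confirmed token names from the model's vocabulary:

```python
# Hypothetical speaker markers; the actual strings come from ai-msgbot's
# training format and may differ.
SPEAKER_A = "person alpha:"
SPEAKER_B = "person beta:"

def build_prompt(history, user_message):
    """Join prior turns and the new message, ending with the bot's speaker
    token so the model continues as the second persona."""
    lines = list(history)
    lines.append(f"{SPEAKER_A} {user_message}")
    lines.append(SPEAKER_B)  # the model generates text after this marker
    return "\n".join(lines)

def extract_reply(generated, prompt):
    """Take the text after the prompt, stopping at the next speaker token."""
    continuation = generated[len(prompt):]
    cut = continuation.find(SPEAKER_A)
    if cut != -1:
        continuation = continuation[:cut]
    return continuation.strip()

prompt = build_prompt([], "hi, how are you?")
# Simulated model output for illustration:
fake_output = prompt + " doing well, thanks!\nperson alpha: good"
print(extract_reply(fake_output, prompt))  # -> doing well, thanks!
```

Truncating at the next speaker marker is what keeps the model from "talking to itself" in an interactive loop.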

Training

The model was fine-tuned on the Wizard of Wikipedia dataset, parsed with ParlAI. Training used DeepSpeed and the Hugging Face Trainer with the following hyperparameters:

  • Learning Rate: 2e-05
  • Train Batch Size: 32
  • Eval Batch Size: 32
  • Seed: 42
  • Distributed Type: Multi-GPU
  • Gradient Accumulation Steps: 4
  • Total Train Batch Size: 128
  • Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
  • LR Scheduler: Cosine with warmup ratio 0.05
  • Epochs: 30

Training and validation loss both decreased gradually over the 30 epochs, with a final training loss of 2.2461.
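The hyperparameters above fit together as follows: the per-device batch size times the gradient-accumulation steps gives the total train batch size, and the cosine scheduler warms up linearly over the first 5% of steps before decaying. This sketch reproduces that arithmetic; the total step count is illustrative, not taken from the actual run:

```python
import math

base_lr = 2e-5
per_device_batch = 32
grad_accum_steps = 4

# Effective batch size per optimizer step (per GPU group):
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 128, matching the total train batch size above

def cosine_with_warmup(step, total_steps, warmup_ratio=0.05):
    """LR multiplier: linear warmup, then cosine decay to zero
    (the shape of Hugging Face's cosine schedule with warmup)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # illustrative
print(base_lr * cosine_with_warmup(25, total_steps))    # mid-warmup: 1e-05
print(base_lr * cosine_with_warmup(1000, total_steps))  # end of training: ~0
```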

Guide: Running Locally

To run the model locally, follow these steps:

  1. Setup Environment:

    • Ensure you have Python installed.
    • Install necessary libraries: transformers, torch, and tokenizers.
  2. Clone Repository:

    • Clone the ai-msgbot repository:
      git clone https://github.com/pszemraj/ai-msgbot
      
  3. Install Dependencies:

    • Navigate to the cloned directory and install dependencies:
      cd ai-msgbot
      pip install -r requirements.txt
      
  4. Run the Model:

    • Use inference scripts provided in the repository to interact with the model.
  5. Optional - Use Cloud GPUs:

    • For efficient training and inference, consider using cloud GPUs from providers like AWS, GCP, or Azure.
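For a quick start without the repository's full tooling, the steps above can be reduced to a bare-bones generation call via the transformers pipeline API. The Hub id below is assumed from the model and organization names at the top of this card, and the prompt format is a guess at the persona framework's speaker markers; the ai-msgbot scripts add prompt handling beyond this minimal sketch:

```python
from transformers import pipeline

# Assumed Hub id: org "ethzanalytics", model "distilgpt2-tiny-conversational".
generator = pipeline(
    "text-generation",
    model="ethzanalytics/distilgpt2-tiny-conversational",
)

# Hypothetical speaker markers based on ai-msgbot's persona format.
prompt = "person alpha: hi, how are you?\nperson beta:"
result = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```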

License

The model is licensed under the Apache-2.0 License, allowing for wide usage and modification with proper attribution.
