distilgpt2-tiny-conversational
Introduction
distilgpt2-tiny-conversational, released by ethzanalytics, is a fine-tuned version of the DistilGPT2 model, designed for conversational applications. It leverages a persona framework for generating dialogues between two entities and is particularly suitable for chatbot functionalities.
Architecture
The model is a simplified, distilled version of GPT-2, optimized for dialogue generation. It generates conversations by using custom tokens to identify when responses begin and end, making it suitable for interactive applications.
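One way to see which custom tokens the tokenizer registers for marking turns is to inspect its special-token map. Below is a minimal sketch, assuming the Hub identifier ethzanalytics/distilgpt2-tiny-conversational:

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the fine-tuned model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("ethzanalytics/distilgpt2-tiny-conversational")

# Any custom tokens used to mark where responses begin and end appear here.
print(tokenizer.special_tokens_map)
print(tokenizer.additional_special_tokens)
```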
Training
The model was fine-tuned using the Wizard of Wikipedia dataset, parsed from ParlAI. It was trained using DeepSpeed and Hugging Face Trainer, with specific hyperparameters including:
- Learning Rate: 2e-05
- Train Batch Size: 32
- Eval Batch Size: 32
- Seed: 42
- Distributed Type: Multi-GPU
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 128
- Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
- LR Scheduler: Cosine with warmup ratio 0.05
- Epochs: 30
The training resulted in a gradual decrease in both training and validation loss over 30 epochs, achieving a final training loss of 2.2461.
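For illustration, here is a minimal sketch of how these hyperparameters might be expressed with the Hugging Face TrainingArguments API. The output directory and omitted DeepSpeed/dataset configuration are assumptions; this is not the exact training script.

```python
from transformers import TrainingArguments

# Hypothetical TrainingArguments mirroring the hyperparameters listed above;
# the actual run also used DeepSpeed and multi-GPU training, not configured here.
training_args = TrainingArguments(
    output_dir="distilgpt2-tiny-conversational",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # combines with the batch size to give the listed total of 128
    num_train_epochs=30,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```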
Guide: Running Locally
To run the model locally, follow these steps:
- Setup Environment:
  - Ensure you have Python installed.
  - Install the necessary libraries: transformers, torch, and tokenizers.
- Clone Repository:
  - Clone the ai-msgbot repository: git clone https://github.com/pszemraj/ai-msgbot
- Install Dependencies:
  - Navigate to the cloned directory and install dependencies: cd ai-msgbot && pip install -r requirements.txt
- Run the Model:
  - Use the inference scripts provided in the repository to interact with the model; a minimal loading sketch is shown after this list.
- Optional - Use Cloud GPUs:
  - For efficient training and inference, consider using cloud GPUs from providers such as AWS, GCP, or Azure.
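As an alternative to the repository's scripts, the model can also be loaded directly with the transformers text-generation pipeline. The sketch below assumes the Hub identifier ethzanalytics/distilgpt2-tiny-conversational; the speaker-label prompt is only an illustrative placeholder, since the exact turn format is normally constructed by the ai-msgbot inference scripts.

```python
from transformers import pipeline

# Load the fine-tuned conversational model from the Hugging Face Hub.
generator = pipeline(
    "text-generation",
    model="ethzanalytics/distilgpt2-tiny-conversational",
)

# NOTE: the speaker-label format below is an assumption for illustration;
# the ai-msgbot scripts build the prompt and parse the reply for you.
prompt = "person alpha:\nHow was your weekend?\nperson beta:\n"

result = generator(
    prompt,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
    temperature=0.8,
)
print(result[0]["generated_text"])
```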
License
The model is licensed under the Apache-2.0 License, allowing for wide usage and modification with proper attribution.