distilgpt2-tiny-conversational

ethzanalytics

Introduction

distilgpt2-tiny-conversational is a fine-tuned version of the DistilGPT2 model, designed for conversational applications. It uses a persona framework to generate dialogue between two entities, making it particularly suitable for chatbot functionality.

Architecture

The model is a simplified, distilled version of GPT-2, optimized for dialogue generation. It generates conversations by using custom tokens to identify when responses begin and end, making it suitable for interactive applications.
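To make the custom-token mechanism concrete, here is a minimal sketch of how a prompt might be assembled and a reply extracted. The speaker strings "person alpha:" and "person beta:" are assumptions based on the ai-msgbot persona framework, not confirmed token names from the model's vocabulary:

```python
# Hypothetical speaker markers; the actual strings come from ai-msgbot's
# training format and may differ.
SPEAKER_A = "person alpha:"
SPEAKER_B = "person beta:"

def build_prompt(history, user_message):
    """Join prior turns and the new message, ending with the bot's speaker
    token so the model continues as the second persona."""
    lines = list(history)
    lines.append(f"{SPEAKER_A} {user_message}")
    lines.append(SPEAKER_B)  # the model generates text after this marker
    return "\n".join(lines)

def extract_reply(generated, prompt):
    """Take the text after the prompt, stopping at the next speaker token."""
    continuation = generated[len(prompt):]
    cut = continuation.find(SPEAKER_A)
    if cut != -1:
        continuation = continuation[:cut]
    return continuation.strip()

prompt = build_prompt([], "hi, how are you?")
# Simulated model output for illustration:
fake_output = prompt + " doing well, thanks!\nperson alpha: good"
print(extract_reply(fake_output, prompt))  # -> doing well, thanks!
```

Truncating at the next speaker marker is what keeps the model from "talking to itself" in an interactive loop.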

Training

The model was fine-tuned on the Wizard of Wikipedia dataset, parsed with ParlAI. Training used DeepSpeed and the Hugging Face Trainer with the following hyperparameters:

  • Learning Rate: 2e-05
  • Train Batch Size: 32
  • Eval Batch Size: 32
  • Seed: 42
  • Distributed Type: Multi-GPU
  • Gradient Accumulation Steps: 4
  • Total Train Batch Size: 128
  • Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
  • LR Scheduler: Cosine with warmup ratio 0.05
  • Epochs: 30

Training and validation loss both decreased gradually over the 30 epochs, with a final training loss of 2.2461.
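The hyperparameters above fit together as follows: the per-device batch size times the gradient-accumulation steps gives the total train batch size, and the cosine scheduler warms up linearly over the first 5% of steps before decaying. This sketch reproduces that arithmetic; the total step count is illustrative, not taken from the actual run:

```python
import math

base_lr = 2e-5
per_device_batch = 32
grad_accum_steps = 4

# Effective batch size per optimizer step (per GPU group):
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 128, matching the total train batch size above

def cosine_with_warmup(step, total_steps, warmup_ratio=0.05):
    """LR multiplier: linear warmup, then cosine decay to zero
    (the shape of Hugging Face's cosine schedule with warmup)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # illustrative
print(base_lr * cosine_with_warmup(25, total_steps))    # mid-warmup: 1e-05
print(base_lr * cosine_with_warmup(1000, total_steps))  # end of training: ~0
```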

Guide: Running Locally

To run the model locally, follow these steps:

  1. Setup Environment:

    • Ensure you have Python installed.
    • Install necessary libraries: transformers, torch, and tokenizers.
  2. Clone Repository:

    • Clone the ai-msgbot repository:
      git clone https://github.com/pszemraj/ai-msgbot
      
  3. Install Dependencies:

    • Navigate to the cloned directory and install dependencies:
      cd ai-msgbot
      pip install -r requirements.txt
      
  4. Run the Model:

    • Use inference scripts provided in the repository to interact with the model.
  5. Optional - Use Cloud GPUs:

    • For efficient training and inference, consider using cloud GPUs from providers like AWS, GCP, or Azure.
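For a quick start without the repository's full tooling, the steps above can be reduced to a bare-bones generation call via the transformers pipeline API. The Hub id below is assumed from the model and organization names at the top of this card, and the prompt format is a guess at the persona framework's speaker markers; the ai-msgbot scripts add prompt handling beyond this minimal sketch:

```python
from transformers import pipeline

# Assumed Hub id: org "ethzanalytics", model "distilgpt2-tiny-conversational".
generator = pipeline(
    "text-generation",
    model="ethzanalytics/distilgpt2-tiny-conversational",
)

# Hypothetical speaker markers based on ai-msgbot's persona format.
prompt = "person alpha: hi, how are you?\nperson beta:"
result = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```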

License

The model is licensed under the Apache-2.0 License, allowing for wide usage and modification with proper attribution.
