DistilBART-MNLI-12-9
Introduction
DistilBART-MNLI is a distilled version of the BART-large-MNLI model, designed for zero-shot classification tasks. It uses the "no teacher" distillation technique to reduce model size while largely preserving the teacher's accuracy.
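As a quick illustration of zero-shot use, the checkpoint can be loaded with the Transformers `zero-shot-classification` pipeline. A minimal sketch; the input text and candidate labels are illustrative only:

```python
from transformers import pipeline

# Load the distilled checkpoint into the zero-shot classification pipeline.
classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-9",
)

# Illustrative input and labels; any label set can be supplied at run time.
result = classifier(
    "One day I will see the world.",
    candidate_labels=["travel", "cooking", "dancing"],
)
print(result["labels"])  # labels sorted by score, highest first
print(result["scores"])
```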
Architecture
The model distillation process involves copying alternating layers from the original BART-large-MNLI model and further fine-tuning them on the MNLI dataset. This approach allows for a smaller model size with minimal performance degradation.
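To make the layer-copying idea concrete, here is a minimal sketch of deriving a 12-9 student from the teacher. The even-spacing choice of decoder layers is an assumption; the repository's `create_student.py` (used in the Training section below) is the authoritative implementation:

```python
import torch.nn as nn
from transformers import BartForSequenceClassification

# Load the teacher; BART-large has 12 encoder and 12 decoder layers.
teacher = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

# Keep all 12 encoder layers and 9 of the 12 decoder layers.
# The even spacing here is an assumption; the actual indices may differ.
keep = [0, 1, 2, 4, 5, 6, 8, 9, 11]
teacher.model.decoder.layers = nn.ModuleList(
    [teacher.model.decoder.layers[i] for i in keep]
)
teacher.config.decoder_layers = len(keep)

# Save the truncated copy as the student, to be fine-tuned on MNLI.
teacher.save_pretrained("student-bart-mnli-12-9")
```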
Performance Metrics

| Model | Matched Accuracy (%) | Mismatched Accuracy (%) |
|---|---|---|
| BART-large-MNLI | 89.9 | 90.01 |
| DistilBART-MNLI-12-1 | 87.08 | 87.5 |
| DistilBART-MNLI-12-3 | 88.1 | 88.19 |
| DistilBART-MNLI-12-6 | 89.19 | 89.01 |
| DistilBART-MNLI-12-9 | 89.56 | 89.52 |
Training
To train DistilBART-MNLI yourself, follow these steps:
- **Clone the repository:** Clone the DistilBART-MNLI repository to access the necessary scripts and configurations:

  ```bash
  git clone https://github.com/patil-suraj/distillbart-mnli
  ```

- **Install Transformers:** Install the Hugging Face Transformers library from source:

  ```bash
  git clone https://github.com/huggingface/transformers.git
  pip install -qqq -U ./transformers
  ```

- **Download the MNLI data:** Use the provided script to download the MNLI dataset:

  ```bash
  python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI
  ```

- **Create the student model:** Initialize the student by specifying the encoder and decoder layer counts (this example builds the 12-6 student; set `--student_decoder_layers 9` for the 12-9 variant):

  ```bash
  python create_student.py \
    --teacher_model_name_or_path facebook/bart-large-mnli \
    --student_encoder_layers 12 \
    --student_decoder_layers 6 \
    --save_path student-bart-mnli-12-6
  ```

- **Fine-tune the model:** Start the fine-tuning process using the configuration file (a sketch of a possible `args.json` follows this list):

  ```bash
  python run_glue.py args.json
  ```
Logs and further training details can be found on the WandB project page.
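The contents of the arguments file are not reproduced in this card. The snippet below writes a hypothetical `args.json`; every key and value is an assumption based on standard `run_glue.py` argument names, not the authors' actual configuration (see the WandB page for the real settings):

```python
import json

# Hypothetical fine-tuning configuration; all values are assumptions.
args = {
    "model_name_or_path": "student-bart-mnli-12-6",
    "task_name": "mnli",
    "data_dir": "glue_data/MNLI",
    "output_dir": "distilbart-mnli-12-6",
    "do_train": True,
    "do_eval": True,
    "max_seq_length": 128,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 3.0,
}

with open("args.json", "w") as f:
    json.dump(args, f, indent=2)
```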
Guide: Running Locally
- **Environment Setup:** Ensure you have Python and the necessary libraries installed. Set up a virtual environment if needed.
- **Install Requirements:** Install the Hugging Face Transformers library and other dependencies.
- **Download the Model:** Fetch the model files with the Transformers library or from the Hugging Face model hub.
- **Run Inference:** Use a script to load the model and perform inference on your data (see the sketch after this list).
- **Cloud GPUs:** Consider using cloud GPUs from providers such as AWS, GCP, or Azure for faster training and inference.
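For the inference step above, here is a sketch that scores labels manually via the underlying NLI formulation, as an alternative to the pipeline shown earlier. The premise, candidate labels, hypothesis template, and label-id fallbacks are illustrative assumptions:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "valhalla/distilbart-mnli-12-9"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = "The new GPU cut our training time from 12 hours to 3."  # illustrative
labels = ["technology", "sports", "politics"]                      # illustrative

# Look up the NLI class ids from the config; the integer fallbacks
# assume the usual (contradiction, neutral, entailment) ordering.
entail_id = model.config.label2id.get("entailment", 2)
contra_id = model.config.label2id.get("contradiction", 0)

scores = {}
with torch.no_grad():
    for label in labels:
        # Phrase each candidate label as an NLI hypothesis.
        hypothesis = f"This example is about {label}."
        inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
        logits = model(**inputs).logits[0]
        # Softmax over entailment vs. contradiction only; keep P(entailment).
        probs = logits[[contra_id, entail_id]].softmax(dim=-1)
        scores[label] = probs[1].item()

print(sorted(scores.items(), key=lambda kv: -kv[1]))
```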
License
The model and code are distributed under the Apache License 2.0, which permits commercial and non-commercial use, modification, and distribution, provided the license's attribution and notice requirements are met.