DistilBART-MNLI-12-9

Maintained by valhalla on the Hugging Face Hub

Introduction

DistilBART-MNLI is a distilled version of the BART-large-MNLI model, designed for zero-shot classification tasks. It is produced with the No Teacher Distillation technique, which reduces model size while largely preserving accuracy.
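As a quick illustration, the checkpoint can be loaded into the Transformers zero-shot classification pipeline; the example text and candidate labels below are arbitrary:

    from transformers import pipeline

    # Load this checkpoint into the zero-shot classification pipeline
    classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-9")

    # Score a sentence against arbitrary candidate labels
    result = classifier(
        "The new GPU doubles training throughput for large language models.",
        candidate_labels=["technology", "sports", "politics"],
    )
    print(result["labels"][0], result["scores"][0])  # top label and its score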

Architecture

The distillation process copies alternating layers from the original BART-large-MNLI model and then fine-tunes the student further on the MNLI dataset; no teacher-based distillation loss is involved, hence the name. In the 12-N naming scheme, 12 is the number of encoder layers and N the number of decoder layers retained, so the 12-9 variant keeps all 12 encoder layers and 9 of the 12 decoder layers. The result is a smaller model with minimal loss of accuracy.
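A minimal sketch of the layer-copying idea follows; the helper function and the particular layer indices are illustrative, not the repository's exact create_student.py logic:

    import torch.nn as nn

    def copy_layers(teacher_layers: nn.ModuleList, student_layers: nn.ModuleList, layers_to_copy):
        # Copy the selected teacher layers (by index) into the smaller student
        for student_idx, teacher_idx in enumerate(layers_to_copy):
            student_layers[student_idx].load_state_dict(
                teacher_layers[teacher_idx].state_dict()
            )

    # Illustrative: build a 9-layer student decoder from a 12-layer teacher
    # by keeping roughly evenly spaced layers
    decoder_layers_to_copy = [0, 1, 2, 4, 5, 7, 8, 10, 11]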

Performance Metrics:

Model                  Matched Accuracy (%)   Mismatched Accuracy (%)
BART-large-MNLI        89.90                  90.01
DistilBART-MNLI-12-1   87.08                  87.50
DistilBART-MNLI-12-3   88.10                  88.19
DistilBART-MNLI-12-6   89.19                  89.01
DistilBART-MNLI-12-9   89.56                  89.52

Training

To train DistilBART-MNLI yourself, follow these steps:

  1. Clone the Repository
    Clone the DistilBART-MNLI repository to access the necessary scripts and configurations:

    git clone https://github.com/patil-suraj/distillbart-mnli
    
  2. Install Transformers
    Install the Hugging Face Transformers library from source:

    git clone https://github.com/huggingface/transformers.git
    pip install -qqq -U ./transformers
    
  3. Download MNLI Data
    Use the provided script to download the MNLI dataset:

    python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI
    
  4. Create the Student Model
    Initialize the student model by specifying the encoder and decoder layer counts (the command below builds the 12-6 student; for this card's 12-9 variant, pass --student_decoder_layers 9 and a matching --save_path):

    python create_student.py \
      --teacher_model_name_or_path facebook/bart-large-mnli \
      --student_encoder_layers 12 \
      --student_decoder_layers 6 \
      --save_path student-bart-mnli-12-6
    
  5. Fine-Tune the Model
    Start the fine-tuning process with a JSON configuration file (an illustrative args.json is sketched below, after this list):

    python run_glue.py args.json
    

Logs and further training details can be found on the WandB project page.
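For reference, here is a hypothetical args.json for the fine-tuning step. The field names follow the arguments run_glue.py accepted at the time, but the values are illustrative and not the exact configuration behind the released checkpoints:

    {
      "model_name_or_path": "student-bart-mnli-12-6",
      "task_name": "mnli",
      "data_dir": "glue_data/MNLI",
      "do_train": true,
      "do_eval": true,
      "max_seq_length": 128,
      "per_device_train_batch_size": 32,
      "learning_rate": 3e-5,
      "num_train_epochs": 3.0,
      "output_dir": "distilbart-mnli-12-6",
      "overwrite_output_dir": true
    }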

Guide: Running Locally

  1. Environment Setup
    Ensure Python and the necessary libraries are installed, setting up a virtual environment if needed.

  2. Install Requirements
    Install the Hugging Face Transformers library and its dependencies (e.g. pip install transformers torch).

  3. Download the Model
    Fetch the model files with the Transformers library; they are downloaded automatically from the Hugging Face Hub on first use.

  4. Run Inference
    Use a script to load the model and run inference on your data (a minimal example follows this list).

  5. Cloud GPUs
    Consider using cloud GPUs from providers like AWS, GCP, or Azure for faster training and inference.
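Below is a minimal inference sketch that makes the NLI mechanics behind zero-shot classification explicit. The Hub model ID and the MNLI label mapping (contradiction/neutral/entailment) are assumptions based on standard MNLI checkpoints:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumed Hub ID for this checkpoint
    model_name = "valhalla/distilbart-mnli-12-9"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    premise = "A new open-source library speeds up transformer inference."
    label = "technology"
    # Zero-shot classification frames each candidate label as an NLI hypothesis
    hypothesis = f"This example is about {label}."

    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Assume the standard MNLI label mapping; fall back to the usual indices
    entail_id = model.config.label2id.get("entailment", 2)
    contra_id = model.config.label2id.get("contradiction", 0)
    # Drop the "neutral" class and normalize over entailment vs. contradiction
    probs = logits[0, [contra_id, entail_id]].softmax(dim=0)
    print(f"P('{label}') = {probs[1].item():.3f}")

A score close to 1.0 means the model treats the hypothesis, and hence the label, as entailed by the input text; the zero-shot pipeline shown in the Introduction automates exactly this computation over all candidate labels.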

License

The model and code are distributed under the Apache License 2.0, which permits both commercial and non-commercial use, modification, and distribution, provided the license's notice and attribution requirements are preserved.
