distilbart-mnli-12-1

Maintained by valhalla on the Hugging Face Hub.

Introduction

DistilBART-MNLI is a distilled version of BART-Large-MNLI created with the No Teacher Distillation technique: alternating layers are copied from BART-Large-MNLI into a smaller student model, which is then fine-tuned further on the same MNLI data. The result is a smaller, faster model with only a minimal drop in performance.
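
As a rough sketch of the layer-copying step, the snippet below builds a single-decoder-layer student from the teacher using standard transformers BART classes. The layer selection (layer_map) and the use of load_state_dict(strict=False) are illustrative assumptions; the repository's create_student.py is the authoritative implementation.

    from transformers import BartConfig, BartForSequenceClassification

    # Teacher: facebook/bart-large-mnli (12 encoder and 12 decoder layers).
    teacher = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

    # Student config: identical except for a single decoder layer (the "12-1" layout).
    config = BartConfig.from_pretrained("facebook/bart-large-mnli", decoder_layers=1)
    student = BartForSequenceClassification(config)

    # Copy every weight whose name and shape match (embeddings, full encoder,
    # classification head); the teacher's surplus decoder layers are ignored.
    student.load_state_dict(teacher.state_dict(), strict=False)

    # Choose which teacher decoder layers survive; [0] is an assumption here,
    # e.g. a 12-3 student might keep evenly spaced layers such as [0, 6, 11].
    layer_map = [0]
    for student_idx, teacher_idx in enumerate(layer_map):
        student.model.decoder.layers[student_idx].load_state_dict(
            teacher.model.decoder.layers[teacher_idx].state_dict()
        )

    student.save_pretrained("student-bart-mnli-12-1")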

Architecture

DistilBART-MNLI is designed for zero-shot classification tasks. The distilled variants, such as distilbart-mnli-12-1 (12 encoder layers, 1 decoder layer), trade model size against accuracy, with only slight performance reductions relative to the baseline BART-Large-MNLI. The original model card reports matched and mismatched MNLI accuracy for each variant, so the trade-off can be compared directly across versions.
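
Under the hood, zero-shot classification with an MNLI model scores each candidate label by treating the input text as an NLI premise and a templated sentence such as "This example is about {label}." as the hypothesis; the entailment probability becomes the label score. A minimal sketch of that scoring (the premise, hypothesis template, and fallback label index are illustrative):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "valhalla/distilbart-mnli-12-1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    premise = "The new laptop ships with a faster chip and longer battery life."
    hypothesis = "This example is about technology."  # illustrative template

    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    logits = model(**inputs).logits

    # Index of the "entailment" class from the model config (2 for bart-large-mnli).
    entail_idx = model.config.label2id.get("entailment", 2)
    entail_prob = logits.softmax(dim=-1)[0, entail_idx].item()
    print(f"P(entailment) = {entail_prob:.3f}")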

Training

To train DistilBART-MNLI models yourself, clone the distillbart-mnli repository and follow the steps below:

  1. Clone and install the Transformers library from source.
    git clone https://github.com/huggingface/transformers.git
    pip install -qqq -U ./transformers
    
  2. Download the MNLI dataset.
    python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI
    
  3. Create a student model by specifying the number of encoder and decoder layers (this example builds a 12-6 student).
    python create_student.py \
      --teacher_model_name_or_path facebook/bart-large-mnli \
      --student_encoder_layers 12 \
      --student_decoder_layers 6 \
      --save_path student-bart-mnli-12-6
    
  4. Start fine-tuning with the provided script; an illustrative args.json is sketched after this list.
    python run_glue.py args.json
    
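For reference, the args.json consumed by run_glue.py can be generated with a short script like the one below. The hyperparameter values are illustrative assumptions, not the settings used for the released checkpoints; the field names follow the argument dataclasses of the transformers run_glue.py script from that era.

    import json

    # Illustrative settings only; tune for your hardware and target student.
    args = {
        "model_name_or_path": "student-bart-mnli-12-6",
        "task_name": "mnli",
        "data_dir": "glue_data/MNLI",
        "do_train": True,
        "do_eval": True,
        "max_seq_length": 128,
        "per_device_train_batch_size": 32,
        "learning_rate": 3e-5,
        "num_train_epochs": 1.0,
        "output_dir": "distilbart-mnli-12-6",
    }

    with open("args.json", "w") as f:
        json.dump(args, f, indent=2)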

Logs of the trained models can be accessed via the Weights & Biases project linked in the original model card.

Guide: Running Locally

  1. Install Prerequisites: Ensure Python and necessary libraries (e.g., Transformers, PyTorch) are installed.
  2. Clone Repository: Clone the distillbart-mnli repository.
  3. Set Up Environment: Install dependencies using pip.
  4. Run Model: Load the model and run inference, for example via the zero-shot-classification pipeline sketched below, or evaluate it on the MNLI data downloaded earlier.
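
A minimal local inference sketch using the transformers zero-shot-classification pipeline (the input text and candidate labels are placeholders):

    from transformers import pipeline

    # Load the distilled checkpoint from the Hugging Face Hub.
    classifier = pipeline(
        "zero-shot-classification",
        model="valhalla/distilbart-mnli-12-1",
    )

    # Placeholder input and labels; replace with your own.
    result = classifier(
        "one day I will see the world",
        candidate_labels=["travel", "cooking", "dancing"],
    )
    print(result["labels"], result["scores"])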

For enhanced performance, consider using cloud GPU services such as AWS EC2 or Google Cloud Platform.

License

The model and associated files are made available under the terms specified in their respective repositories and Hugging Face's licensing guidelines.
