DeBERTa V2 XLarge MNLI

Introduction

DeBERTa V2 XLarge MNLI is a model from Microsoft that improves on BERT and RoBERTa with disentangled attention and an enhanced mask decoder. This checkpoint has been fine-tuned for the Multi-Genre Natural Language Inference (MNLI) task; the backbone comprises 24 layers with a hidden size of 1536, totaling roughly 900 million parameters. DeBERTa outperforms both predecessors on a majority of natural language understanding (NLU) tasks.
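As an NLI model, this checkpoint emits three logits per premise/hypothesis pair. A minimal usage sketch (the heavyweight download is left in comments; the label order below is an assumption taken from the usual DeBERTa MNLI convention and should be verified against the model's `config.json`, and the example logits are made up for illustration):

```python
# Loading the real checkpoint (~1.6 GB) would look roughly like this:
#
# from transformers import pipeline
# nli = pipeline("text-classification", model="microsoft/deberta-v2-xlarge-mnli")
# nli({"text": "A man is eating.", "text_pair": "Nobody is eating."})

import math

# Assumed MNLI label order -- check id2label in the model's config.json.
MNLI_LABELS = ["CONTRADICTION", "NEUTRAL", "ENTAILMENT"]

def interpret_logits(logits):
    """Softmax the model's 3-way logits and return (label, probability)."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return MNLI_LABELS[best], probs[best]

# Hypothetical logits for the premise/hypothesis pair above.
label, prob = interpret_logits([4.1, -0.3, -2.2])
print(label, round(prob, 3))  # CONTRADICTION 0.986
```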

Architecture

DeBERTa improves on the BERT and RoBERTa architectures with two techniques: disentangled attention, which represents each token with separate content and relative-position vectors, and an enhanced mask decoder. Pretrained on 80GB of training data, these enhancements allow DeBERTa to achieve superior performance on a variety of NLU tasks.

Training

The model has been fine-tuned on several NLU tasks, including SQuAD 1.1/2.0 and various GLUE benchmark tasks. The fine-tuning results demonstrate that DeBERTa V2 XLarge achieves competitive accuracy across multiple tasks, outperforming models like BERT-Large, RoBERTa-Large, and XLNet-Large.

Guide: Running Locally

To run DeBERTa V2 XLarge MNLI locally, you need to have the Hugging Face Transformers library installed. Follow these steps:

  1. Clone the Hugging Face Transformers repository and navigate to the examples/text-classification/ directory.

  2. Set an environment variable for the task name, e.g., export TASK_NAME=mrpc.

  3. Use the following command to launch the training script using PyTorch's distributed training utility:

    python -m torch.distributed.launch --nproc_per_node=8 run_glue.py \
      --model_name_or_path microsoft/deberta-v2-xlarge \
      --task_name $TASK_NAME \
      --do_train \
      --do_eval \
      --max_seq_length 128 \
      --per_device_train_batch_size 4 \
      --learning_rate 3e-6 \
      --num_train_epochs 3 \
      --output_dir /tmp/$TASK_NAME/ \
      --overwrite_output_dir \
      --sharded_ddp \
      --fp16
    
  4. For training efficiency, consider using cloud GPUs, such as those provided by AWS or Google Cloud.
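The run_glue.py script in step 3 handles preprocessing internally; conceptually, each NLI example packs the premise and hypothesis into one sequence, truncated to --max_seq_length, before being fed to the model. A simplified sketch using whitespace string tokens (the real tokenizer works with SentencePiece subword ids):

```python
def pack_pair(premise: str, hypothesis: str, max_len: int = 128):
    """Mimic sequence-pair packing: [CLS] premise [SEP] hypothesis [SEP],
    truncated to max_len tokens (whitespace tokens stand in for subwords)."""
    tokens = ["[CLS]", *premise.split(), "[SEP]", *hypothesis.split(), "[SEP]"]
    return tokens[:max_len]

tokens = pack_pair("A man is eating.", "Nobody is eating.")
print(tokens)
# ['[CLS]', 'A', 'man', 'is', 'eating.', '[SEP]', 'Nobody', 'is', 'eating.', '[SEP]']
```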

License

The DeBERTa V2 XLarge MNLI model is released under the MIT License, allowing for broad usage and modification.
