Microsoft DeBERTa V2 XLarge MNLI
Introduction
DeBERTa V2 XLarge MNLI is Microsoft's DeBERTa V2 XLarge model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) task. DeBERTa improves on BERT and RoBERTa through disentangled attention and an enhanced mask decoder, and it outperforms both on numerous natural language understanding (NLU) tasks. The model has 24 layers and a hidden size of 1536, for a total of about 900 million parameters.
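As a concrete illustration of what the MNLI fine-tuning provides, the sketch below scores a premise/hypothesis pair with the Transformers library. It is a minimal example, assuming the checkpoint is published under the Hugging Face model ID microsoft/deberta-v2-xlarge-mnli and that the label names are read from the model config rather than hard-coded; the example sentences are arbitrary.

```python
# Minimal NLI sketch (assumes the microsoft/deberta-v2-xlarge-mnli checkpoint).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "microsoft/deberta-v2-xlarge-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair and score the three MNLI relations.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1)[0]
# Label names (e.g., contradiction / neutral / entailment) come from the model config.
for label_id, label in model.config.id2label.items():
    print(f"{label}: {probs[label_id]:.3f}")
```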
Architecture
DeBERTa replaces the standard self-attention of BERT and RoBERTa with disentangled attention, which represents each token with separate content and relative-position vectors and computes attention scores from both, and adds an enhanced mask decoder that incorporates absolute position information when predicting masked tokens. Combined with pretraining on 160GB of raw text, these changes allow the V2 XLarge model to achieve superior performance on a variety of NLU tasks.
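To make the disentangled-attention idea concrete, here is a rough PyTorch sketch of the score computation described in the DeBERTa paper: attention between two tokens sums content-to-content, content-to-position, and position-to-content terms. The tensor shapes and the relative-distance bucketing below are simplified assumptions for illustration, not Microsoft's implementation.

```python
# Illustrative sketch of disentangled attention scores, not the official implementation.
import torch

def disentangled_scores(q_c, k_c, q_r, k_r, rel_idx):
    """q_c, k_c: [seq, d] content projections.
    q_r, k_r: [2k, d] relative-position projections (one row per distance bucket).
    rel_idx: [seq, seq] long tensor of bucketed relative distances in [0, 2k)."""
    c2c = q_c @ k_c.T                              # content -> content
    c2p = torch.gather(q_c @ k_r.T, 1, rel_idx)    # content -> position
    p2c = torch.gather(k_c @ q_r.T, 1, rel_idx).T  # position -> content
    d = q_c.shape[-1]
    return (c2c + c2p + p2c) / (3 * d) ** 0.5      # scaled as in the paper

# Tiny usage example with random tensors.
seq, d, k = 4, 8, 2
rel = torch.clamp(torch.arange(seq)[:, None] - torch.arange(seq)[None, :], -k, k - 1) + k
scores = disentangled_scores(torch.randn(seq, d), torch.randn(seq, d),
                             torch.randn(2 * k, d), torch.randn(2 * k, d), rel)
print(scores.shape)  # torch.Size([4, 4])
```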
Training
The underlying DeBERTa V2 XLarge model has been fine-tuned and evaluated on several NLU tasks, including SQuAD 1.1/2.0 and various GLUE benchmark tasks. The fine-tuning results show that it achieves competitive accuracy across these tasks, outperforming models such as BERT-Large, RoBERTa-Large, and XLNet-Large.
Guide: Running Locally
To run DeBERTa V2 XLarge MNLI locally, you need to have the Hugging Face Transformers library installed. Follow these steps:
- Clone the Hugging Face Transformers repository and navigate to the examples/text-classification/ directory.
- Set the environment variable for the task name, e.g., export TASK_NAME=mrpc.
- Launch the training script with PyTorch's distributed training utility, as shown below:

```bash
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 4 \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --overwrite_output_dir \
  --sharded_ddp \
  --fp16
```

- For training efficiency, consider using cloud GPUs, such as those provided by AWS or Google Cloud.
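Beyond fine-tuning, a quick way to sanity-check the checkpoint locally is the zero-shot-classification pipeline, which drives the MNLI head under the hood. This is a minimal sketch assuming the microsoft/deberta-v2-xlarge-mnli model ID; the input text and candidate labels are arbitrary placeholders.

```python
# Quick local smoke test via the zero-shot-classification pipeline
# (assumes the microsoft/deberta-v2-xlarge-mnli checkpoint is available).
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="microsoft/deberta-v2-xlarge-mnli")

result = classifier(
    "The new GPU drivers cut our training time in half.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])
```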
License
The DeBERTa V2 XLarge MNLI model is released under the MIT License, allowing for broad usage and modification.