DeBERTa V3 Large
Introduction
DeBERTa V3, developed by Microsoft, is an improved version of DeBERTa that uses ELECTRA-style pre-training with gradient-disentangled embedding sharing. It outperforms earlier models such as BERT, RoBERTa, and the original DeBERTa, particularly in pre-training efficiency and performance on natural language understanding (NLU) tasks.
Architecture
The DeBERTa V3 large model features:
- 24 layers
- A hidden size of 1024
- 304 million backbone parameters
- A vocabulary containing 128,000 tokens, contributing to 131 million parameters in the embedding layer
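These figures can be checked against the published checkpoint. A minimal sketch, assuming the transformers library is installed and the microsoft/deberta-v3-large weights can be downloaded:

  from transformers import AutoConfig, AutoModel

  # Inspect the published configuration
  config = AutoConfig.from_pretrained("microsoft/deberta-v3-large")
  print(config.num_hidden_layers)  # 24 layers
  print(config.hidden_size)        # hidden size of 1024
  print(config.vocab_size)         # vocabulary size (~128K tokens)

  # Count all parameters: roughly the 304M backbone plus the 131M embedding parameters quoted above
  model = AutoModel.from_pretrained("microsoft/deberta-v3-large")
  total = sum(p.numel() for p in model.parameters())
  print(f"{total / 1e6:.0f}M parameters")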
Training
DeBERTa V3 is pre-trained on the same 160GB corpus used for DeBERTa V2, with the ELECTRA-style objective described above. Fine-tuned checkpoints show improved scores on benchmarks such as SQuAD 2.0 and MNLI, outperforming earlier models such as RoBERTa-large and XLNet-large.
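For a classification task such as MNLI, the pre-trained checkpoint is typically wrapped in a sequence-classification head before fine-tuning. A minimal sketch, assuming transformers and sentencepiece are installed (the three-way label count is MNLI-specific):

  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
  model = AutoModelForSequenceClassification.from_pretrained(
      "microsoft/deberta-v3-large",
      num_labels=3,  # MNLI: entailment / neutral / contradiction
  )

  # Premise and hypothesis are encoded together as a sentence pair
  inputs = tokenizer("A soccer game with multiple males playing.",
                     "Some men are playing a sport.",
                     return_tensors="pt")
  logits = model(**inputs).logits
  print(logits.shape)  # (1, 3); the classification head is untrained until fine-tuning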
Guide: Running Locally
Basic Steps
- Clone the Transformers repository and move into the text-classification example directory:
  git clone https://github.com/huggingface/transformers.git
  cd transformers/examples/pytorch/text-classification/
- Install the required packages:
  pip install datasets
- Set the task environment variable:
  export TASK_NAME=mnli
- Run the fine-tuning script:
  python -m torch.distributed.launch --nproc_per_node=8 run_glue.py \
    --model_name_or_path microsoft/deberta-v3-large \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --max_seq_length 256 \
    --warmup_steps 50 \
    --per_device_train_batch_size 8 \
    --learning_rate 6e-6 \
    --num_train_epochs 2 \
    --output_dir ds_results \
    --overwrite_output_dir \
    --logging_steps 1000 \
    --logging_dir ds_results
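Once training finishes, the Trainer-based script should leave its final checkpoint in the directory passed as --output_dir. A minimal inference sketch, assuming the run above completed and saved its model and tokenizer to ds_results:

  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  # Load the fine-tuned checkpoint from the output directory used above
  tokenizer = AutoTokenizer.from_pretrained("ds_results")
  model = AutoModelForSequenceClassification.from_pretrained("ds_results")
  model.eval()

  inputs = tokenizer("A man is playing a guitar on stage.",
                     "A person is performing music.",
                     return_tensors="pt")
  with torch.no_grad():
      logits = model(**inputs).logits
  print(logits.argmax(dim=-1).item())  # predicted label index for the MNLI task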
Cloud GPUs
For enhanced performance, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
License
The DeBERTa V3 model is licensed under the MIT License, allowing for broad use, modification, and distribution.