DeBERTa V3 Base (microsoft/deberta-v3-base)
Introduction
DeBERTaV3 is an improvement over previous DeBERTa models, utilizing ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing. This model enhances performance on various Natural Language Understanding (NLU) tasks, outperforming RoBERTa and other models. DeBERTaV3 is built with 12 layers and a hidden size of 768, containing 86 million backbone parameters and a vocabulary of 128,000 tokens.
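As a quick, hedged illustration (not part of the original card), the checkpoint can be loaded with the Hugging Face Transformers Auto classes; the model ID microsoft/deberta-v3-base is the one this card describes, and the sentencepiece package is assumed to be installed for the tokenizer.

```python
# Minimal usage sketch (assumption: transformers and sentencepiece are installed).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

inputs = tokenizer(
    "DeBERTaV3 uses ELECTRA-style pre-training with gradient-disentangled embedding sharing.",
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768), matching the hidden size described above
```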
Architecture
DeBERTaV3 incorporates 12 layers in its architecture with a hidden size of 768. It utilizes a vocabulary of 128K tokens, which introduces 98M parameters in the embedding layer. This model was trained on 160GB of data, similar to DeBERTa V2.
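To see where those parameter counts come from, the sketch below (an illustrative assumption, not taken from the card) inspects the model configuration and splits the total parameter count into embedding and backbone parts.

```python
# Illustrative sketch: check the configuration and the approximate parameter split.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("microsoft/deberta-v3-base")
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)
# Expected: 12 layers, hidden size 768, a vocabulary of roughly 128K tokens.

model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
embedding_params = model.get_input_embeddings().weight.numel()
total_params = sum(p.numel() for p in model.parameters())
# Roughly 98M embedding parameters and 86M backbone parameters; exact values
# depend on what is counted as "backbone" (e.g., relative position embeddings).
print(f"embedding: ~{embedding_params / 1e6:.0f}M, backbone: ~{(total_params - embedding_params) / 1e6:.0f}M")
```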
Training
DeBERTaV3 improves upon DeBERTa by using ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing, significantly enhancing performance on downstream NLU tasks. The model achieves superior results on tasks such as SQuAD 2.0 and MNLI, showing improvements in accuracy and F1 scores compared to previous models like RoBERTa and ELECTRA.
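For downstream NLU tasks such as MNLI, the pre-trained encoder is typically fine-tuned with a classification head. The snippet below is a hedged sketch of that setup (the label count and example sentences are illustrative, not from the card) and mirrors what the run_glue.py recipe in the guide below automates.

```python
# Hedged sketch: attach a sequence-classification head for an NLI task (MNLI has 3 labels).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=3,  # entailment / neutral / contradiction
)

# Premise and hypothesis are encoded as a sentence pair, as in standard MNLI fine-tuning.
batch = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    truncation=True,
    return_tensors="pt",
)
logits = model(**batch).logits
# The classification head is randomly initialized here; logits are only meaningful after fine-tuning.
```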
Guide: Running Locally
To fine-tune DeBERTaV3 on NLU tasks using the Hugging Face Transformers library:
- Clone the Transformers repository and navigate to the `examples/pytorch/text-classification/` directory.
- Install the necessary `datasets` package.
- Set the desired task name, e.g., `export TASK_NAME=mnli`.
- Set the output directory and other training parameters like batch size and number of GPUs.
- Execute the training script using the following command:
```bash
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v3-base \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 256 \
  --warmup_steps 500 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 1000 \
  --logging_dir $output_dir
```
For optimal performance, consider using cloud GPUs available from providers like AWS, GCP, or Azure.
License
DeBERTaV3 is released under the MIT License, allowing for extensive use, modification, and distribution with proper attribution.