DeBERTa V3 xsmall
Introduction
DeBERTa V3 is an improved version of the DeBERTa model, which itself builds on BERT and RoBERTa by using disentangled attention and an enhanced mask decoder. The V3 iteration replaces masked language modeling with ELECTRA-style replaced-token-detection pre-training combined with gradient-disentangled embedding sharing, significantly boosting performance on natural language understanding (NLU) tasks.
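For reference, and summarizing the DeBERTa paper rather than this model card: disentangled attention represents each token with separate content and position vectors, and sums three score terms (content-to-content, content-to-position, and position-to-content). In the paper's notation, with Q^c and K^c projecting token content, Q^r and K^r projecting shared relative-position embeddings, and delta(i, j) the bucketed relative distance from position i to position j:

    \tilde{A}_{i,j} = Q_i^{c} (K_j^{c})^{\top} + Q_i^{c} (K_{\delta(i,j)}^{r})^{\top} + K_j^{c} (Q_{\delta(j,i)}^{r})^{\top}

The scores are then scaled by 1/sqrt(3d) rather than the usual 1/sqrt(d) to account for the extra terms.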
Architecture
The DeBERTa V3 xsmall model consists of 12 layers with a hidden size of 384. It has 22 million backbone parameters, plus a 128K-token vocabulary that adds another 48 million parameters in the embedding layer. Like DeBERTa V2, it was pre-trained on 160GB of data.
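For a quick sanity check of these figures, the minimal sketch below (assuming PyTorch is installed) loads the checkpoint and counts parameters; the split it prints should roughly match the 22M backbone / 48M embedding numbers above:

    from transformers import AutoConfig, AutoModel

    config = AutoConfig.from_pretrained("microsoft/deberta-v3-xsmall")
    # Expect 12 hidden layers, hidden size 384, and a ~128K-token vocabulary
    print(config.num_hidden_layers, config.hidden_size, config.vocab_size)

    model = AutoModel.from_pretrained("microsoft/deberta-v3-xsmall")
    total = sum(p.numel() for p in model.parameters())
    embed = model.get_input_embeddings().weight.numel()
    # Embedding parameters (~48M) versus backbone parameters (~22M)
    print(f"embedding: {embed / 1e6:.1f}M, backbone: {(total - embed) / 1e6:.1f}M")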
Fine-Tuning Performance
Despite its small size, DeBERTa V3 xsmall performs strongly when fine-tuned on NLU benchmarks: it achieves an F1 score of 84.8 and an exact match (EM) score of 82.0 on SQuAD 2.0, and accuracies of 88.1/88.3 on the MNLI-m/mm datasets. Fine-tuning can be performed with the Hugging Face Transformers library, as sketched below.
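As a rough illustration (a minimal sketch, not the exact recipe behind the scores above), fine-tuning on MNLI with the Trainer API could look like this. It assumes the datasets and sentencepiece packages are installed, and the hyperparameters simply mirror the guide that follows:

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "microsoft/deberta-v3-xsmall"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # MNLI is a three-way classification task (entailment/neutral/contradiction)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

    dataset = load_dataset("glue", "mnli")

    def tokenize(batch):
        return tokenizer(batch["premise"], batch["hypothesis"],
                         truncation=True, max_length=256)

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ds_results",
                               per_device_train_batch_size=8,
                               learning_rate=4.5e-5,
                               warmup_steps=1000,
                               num_train_epochs=3),
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation_matched"],
        tokenizer=tokenizer,
    )
    trainer.train()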
Guide: Running Locally
- Setup Environment: Ensure you have Python and PyTorch installed, and use a virtual environment to manage dependencies.
- Install Transformers Library:

      pip install transformers datasets

- Clone Repository and Navigate:

      git clone https://github.com/huggingface/transformers.git
      cd transformers/examples/pytorch/text-classification/

- Set Environment Variables:

      export TASK_NAME=mnli
      output_dir="ds_results"
      num_gpus=8
      batch_size=8

- Run Training Script (see the note after this list for single-GPU and newer-PyTorch variants):

      python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
        run_glue.py \
        --model_name_or_path microsoft/deberta-v3-xsmall \
        --task_name $TASK_NAME \
        --do_train \
        --do_eval \
        --evaluation_strategy steps \
        --max_seq_length 256 \
        --warmup_steps 1000 \
        --per_device_train_batch_size ${batch_size} \
        --learning_rate 4.5e-5 \
        --num_train_epochs 3 \
        --output_dir $output_dir \
        --overwrite_output_dir \
        --logging_steps 1000 \
        --logging_dir $output_dir
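Note: recent PyTorch releases deprecate torch.distributed.launch; substituting torchrun --nproc_per_node=${num_gpus} for python -m torch.distributed.launch --nproc_per_node=${num_gpus}, with all other arguments unchanged, gives the equivalent invocation. For a quick single-GPU test run, drop the launcher entirely and invoke python run_glue.py with the same flags.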
Cloud GPUs such as those from AWS, GCP, or Azure are recommended for faster training times.
License
The DeBERTa V3 model is released under the MIT License, allowing for free use, modification, and distribution of the software.