DeBERTa V3 Base (microsoft/deberta-v3-base)
Introduction
DeBERTaV3 is an improvement over previous DeBERTa models, utilizing ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing. This model enhances performance on various Natural Language Understanding (NLU) tasks, outperforming RoBERTa and other models. DeBERTaV3 is built with 12 layers and a hidden size of 768, containing 86 million backbone parameters and a vocabulary of 128,000 tokens.
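As a quick, hedged illustration (not part of the original card), the checkpoint can be loaded with the Hugging Face Transformers Auto classes; the model ID microsoft/deberta-v3-base is the one this card describes, and the sentencepiece package is assumed to be installed for the tokenizer.

```python
# Minimal usage sketch (assumption: transformers and sentencepiece are installed).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

inputs = tokenizer(
    "DeBERTaV3 uses ELECTRA-style pre-training with gradient-disentangled embedding sharing.",
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768), matching the hidden size described above
```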
Architecture
DeBERTaV3 incorporates 12 layers in its architecture with a hidden size of 768. It utilizes a vocabulary of 128K tokens, which introduces 98M parameters in the embedding layer. This model was trained on 160GB of data, similar to DeBERTa V2.
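To see where those parameter counts come from, the sketch below (an illustrative assumption, not taken from the card) inspects the model configuration and splits the total parameter count into embedding and backbone parts.

```python
# Illustrative sketch: check the configuration and the approximate parameter split.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("microsoft/deberta-v3-base")
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)
# Expected: 12 layers, hidden size 768, a vocabulary of roughly 128K tokens.

model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
embedding_params = model.get_input_embeddings().weight.numel()
total_params = sum(p.numel() for p in model.parameters())
# Roughly 98M embedding parameters and 86M backbone parameters; exact values
# depend on what is counted as "backbone" (e.g., relative position embeddings).
print(f"embedding: ~{embedding_params / 1e6:.0f}M, backbone: ~{(total_params - embedding_params) / 1e6:.0f}M")
```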
Training
DeBERTaV3 improves upon DeBERTa by using ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing, significantly enhancing performance on downstream NLU tasks. The model achieves superior results on tasks such as SQuAD 2.0 and MNLI, showing improvements in accuracy and F1 scores compared to previous models like RoBERTa and ELECTRA.
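For downstream NLU tasks such as MNLI, the pre-trained encoder is typically fine-tuned with a classification head. The snippet below is a hedged sketch of that setup (the label count and example sentences are illustrative, not from the card) and mirrors what the run_glue.py recipe in the guide below automates.

```python
# Hedged sketch: attach a sequence-classification head for an NLI task (MNLI has 3 labels).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=3,  # entailment / neutral / contradiction
)

# Premise and hypothesis are encoded as a sentence pair, as in standard MNLI fine-tuning.
batch = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    truncation=True,
    return_tensors="pt",
)
logits = model(**batch).logits
# The classification head is randomly initialized here; logits are only meaningful after fine-tuning.
```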
Guide: Running Locally
To fine-tune DeBERTaV3 on NLU tasks using the Hugging Face Transformers library:
- Clone the Transformers repository and navigate to the `examples/pytorch/text-classification/` directory.
- Install the necessary `datasets` package.
- Set the desired task name, e.g., `export TASK_NAME=mnli`.
- Set the output directory and other training parameters like batch size and number of GPUs.
- Execute the training script using the following command:
```bash
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v3-base \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 256 \
  --warmup_steps 500 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 1000 \
  --logging_dir $output_dir
```
For optimal performance, consider using cloud GPUs available from providers like AWS, GCP, or Azure.
License
DeBERTaV3 is released under the MIT License, allowing for extensive use, modification, and distribution with proper attribution.