deberta-v3-large-tasksource-nli
sileod

Introduction
The DeBERTa-v3-large model, fine-tuned with multi-task learning on the tasksource collection, is optimized for zero-shot classification and natural language inference (NLI). It demonstrates strong validation performance on tasks such as WNLI and MNLI, leveraging a shared encoder trained across numerous datasets.
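As a quick illustration, here is a minimal NLI inference sketch, assuming the checkpoint is published on the Hugging Face Hub as sileod/deberta-v3-large-tasksource-nli and stores its label names in the model config; the premise/hypothesis pair is a placeholder:

```python
# Minimal NLI sketch: score a premise/hypothesis pair with the fine-tuned model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "sileod/deberta-v3-large-tasksource-nli"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the pair as a single sequence, as is standard for NLI models.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map each logit to the label name stored in the model config.
probs = logits.softmax(dim=-1).squeeze()
for idx, p in enumerate(probs):
    print(model.config.id2label[idx], f"{p:.3f}")
```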
Architecture
The model is built on the DeBERTa-v3 architecture with a multi-task learning approach. It employs a shared encoder with a task-specific CLS embedding for each task. Classification tasks share classification-head weights when their label sets match, and all multiple-choice tasks use a single shared classification layer. Training was capped at 64k examples per task and ran for 80k steps with a batch size of 384.
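As a rough illustration of this wiring (not the authors' released implementation), the sketch below builds a shared encoder with one learned CLS embedding per task and reuses a classification head across tasks whose label sets match; the backbone name and task dictionary are placeholders:

```python
# Illustrative multi-task sketch, not the actual training code: one shared
# encoder, a learned per-task CLS embedding, and heads shared across tasks
# with identical label sets.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskClassifier(nn.Module):
    def __init__(self, tasks, backbone="microsoft/deberta-v3-large"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        # One trainable CLS embedding per task.
        self.task_cls = nn.ParameterDict(
            {name: nn.Parameter(torch.randn(hidden) * 0.02) for name in tasks}
        )
        # Key heads by the label tuple so tasks with matching labels share weights.
        self.heads = nn.ModuleDict()
        self.task_to_head = {}
        for name, labels in tasks.items():
            key = "|".join(labels)
            if key not in self.heads:
                self.heads[key] = nn.Linear(hidden, len(labels))
            self.task_to_head[name] = key

    def forward(self, task, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Condition the pooled [CLS] state on the task-specific embedding.
        pooled = hidden_states[:, 0] + self.task_cls[task]
        return self.heads[self.task_to_head[task]](pooled)

tasks = {
    "mnli": ["entailment", "neutral", "contradiction"],
    "anli": ["entailment", "neutral", "contradiction"],  # shares mnli's head
    "boolq": ["false", "true"],
}
model = MultiTaskClassifier(tasks)
```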
Training
Training ran on an Nvidia A100 40GB GPU for six days with a peak learning rate of 2e-5. The resulting encoder shows strong linear probing performance, owing to its training on diverse datasets and tasks.
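Linear probing here means freezing the encoder and fitting a simple linear classifier on its [CLS] features. A toy sketch of how one might measure this follows; the texts and labels are made-up placeholders:

```python
# Toy linear-probing sketch: frozen encoder features + logistic regression.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

name = "sileod/deberta-v3-large-tasksource-nli"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name).eval()  # classification head is dropped

texts = ["great movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]  # placeholder sentiment labels

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    feats = encoder(**batch).last_hidden_state[:, 0].numpy()  # [CLS] features

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("train accuracy:", probe.score(feats, labels))
```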
Guide: Running Locally
- Setup Environment: Ensure PyTorch and Hugging Face Transformers are installed.
- Download Model: Use the Hugging Face Model Hub to download `deberta-v3-large-tasksource-nli`.
- Load Model: Use `AutoModelForSequenceClassification` from Transformers to load the model.
- Inference: Perform zero-shot classification leveraging the model's pre-trained capabilities, as in the sketch after this list.
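A minimal end-to-end sketch of these steps using the Transformers zero-shot pipeline; the Hub id is assumed from the model name above, and the example text and candidate labels are placeholders:

```python
# Zero-shot classification via the transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-large-tasksource-nli",  # assumed Hub id
)

result = classifier(
    "The new GPU drivers fixed the rendering crashes.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```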
For optimal performance, it is recommended to use cloud GPUs such as the Nvidia A100, available on platforms like AWS, GCP, or Azure.
License
The model is distributed under the Apache-2.0 License, which permits free use, modification, and distribution, provided the license and attribution notices are retained.