deberta-v3-large-tasksource-nli
sileod

Introduction
The DeBERTa-v3-large model, fine-tuned with multi-task learning on the tasksource collection, is optimized for zero-shot classification and natural language inference (NLI). It demonstrates strong validation performance on tasks such as WNLI and MNLI, leveraging a shared encoder trained across numerous datasets.
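As a quick illustration, here is a minimal NLI inference sketch, assuming the checkpoint is published on the Hugging Face Hub as sileod/deberta-v3-large-tasksource-nli and stores its label names in the model config; the premise/hypothesis pair is a placeholder:

```python
# Minimal NLI sketch: score a premise/hypothesis pair with the fine-tuned model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "sileod/deberta-v3-large-tasksource-nli"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the pair as a single sequence, as is standard for NLI models.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map each logit to the label name stored in the model config.
probs = logits.softmax(dim=-1).squeeze()
for idx, p in enumerate(probs):
    print(model.config.id2label[idx], f"{p:.3f}")
```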
Architecture
The model is built on the DeBERTa-v3 architecture with a multi-task learning approach. It employs a shared encoder with a task-specific CLS embedding for each task. Classification tasks share classification-head weights when their label sets match, and all multiple-choice tasks use a single shared classification layer. Training was capped at 64k examples per task and ran for 80k steps with a batch size of 384.
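As a rough illustration of this wiring (not the authors' released implementation), the sketch below builds a shared encoder with one learned CLS embedding per task and reuses a classification head across tasks whose label sets match; the backbone name and task dictionary are placeholders:

```python
# Illustrative multi-task sketch, not the actual training code: one shared
# encoder, a learned per-task CLS embedding, and heads shared across tasks
# with identical label sets.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskClassifier(nn.Module):
    def __init__(self, tasks, backbone="microsoft/deberta-v3-large"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        # One trainable CLS embedding per task.
        self.task_cls = nn.ParameterDict(
            {name: nn.Parameter(torch.randn(hidden) * 0.02) for name in tasks}
        )
        # Key heads by the label tuple so tasks with matching labels share weights.
        self.heads = nn.ModuleDict()
        self.task_to_head = {}
        for name, labels in tasks.items():
            key = "|".join(labels)
            if key not in self.heads:
                self.heads[key] = nn.Linear(hidden, len(labels))
            self.task_to_head[name] = key

    def forward(self, task, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Condition the pooled [CLS] state on the task-specific embedding.
        pooled = hidden_states[:, 0] + self.task_cls[task]
        return self.heads[self.task_to_head[task]](pooled)

tasks = {
    "mnli": ["entailment", "neutral", "contradiction"],
    "anli": ["entailment", "neutral", "contradiction"],  # shares mnli's head
    "boolq": ["false", "true"],
}
model = MultiTaskClassifier(tasks)
```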
Training
Training ran on an Nvidia A100 40GB GPU for six days with a peak learning rate of 2e-5. The resulting encoder shows strong linear probing performance, owing to its training on diverse datasets and tasks.
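Linear probing here means freezing the encoder and fitting a simple linear classifier on its [CLS] features. A toy sketch of how one might measure this follows; the texts and labels are made-up placeholders:

```python
# Toy linear-probing sketch: frozen encoder features + logistic regression.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

name = "sileod/deberta-v3-large-tasksource-nli"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name).eval()  # classification head is dropped

texts = ["great movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]  # placeholder sentiment labels

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    feats = encoder(**batch).last_hidden_state[:, 0].numpy()  # [CLS] features

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("train accuracy:", probe.score(feats, labels))
```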
Guide: Running Locally
- Setup Environment: Ensure PyTorch and Hugging Face Transformers are installed.
- Download Model: Use the Hugging Face Model Hub to download `deberta-v3-large-tasksource-nli`.
- Load Model: Use `AutoModelForSequenceClassification` from Transformers to load the model.
- Inference: Perform zero-shot classification leveraging the model's pre-trained capabilities, as in the sketch after this list.
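A minimal end-to-end sketch of these steps using the Transformers zero-shot pipeline; the Hub id is assumed from the model name above, and the example text and candidate labels are placeholders:

```python
# Zero-shot classification via the transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-large-tasksource-nli",  # assumed Hub id
)

result = classifier(
    "The new GPU drivers fixed the rendering crashes.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```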
For optimal performance, it is recommended to use cloud GPUs such as the Nvidia A100, available on platforms like AWS, GCP, or Azure.
License
The model is distributed under the Apache-2.0 License, which permits free use, modification, and distribution, provided the license and attribution notices are retained.