DeBERTa-v3-large-mnli-fever-anli-ling-wanli
Maintained by MoritzLaurer
Introduction
The DeBERTa-v3-large-mnli-fever-anli-ling-wanli model is a state-of-the-art natural language inference (NLI) model designed for zero-shot classification tasks. It is fine-tuned on several NLI datasets and performs particularly well on adversarial benchmarks such as ANLI.
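As a rough illustration of the underlying NLI task, the model can also be queried directly with a premise-hypothesis pair. The following is a minimal sketch using the standard Transformers sequence-classification API; the example texts are illustrative, and the label names are read from the model's own config rather than assumed:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Angela Merkel is a politician in Germany and leader of the CDU"
hypothesis = "This example is about politics."

# Score the premise-hypothesis pair and convert logits to probabilities.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

# Label names (entailment / neutral / contradiction) come from the model config.
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs)})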
Architecture
The model is built upon Microsoft's DeBERTa-v3-large, which improves on classical masked language models such as BERT and RoBERTa by combining DeBERTa's disentangled attention with ELECTRA-style replaced-token-detection pre-training, yielding stronger performance on text understanding and classification tasks.
Training
Training Data
The model was trained on datasets such as MultiNLI, Fever-NLI, ANLI, LingNLI, and WANLI, totaling 885,242 hypothesis-premise pairs. The SNLI dataset was excluded due to quality issues.
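As an illustration only, a comparable training mix can be assembled with the Hugging Face datasets library. The sketch below covers only MultiNLI and ANLI, whose Hub identifiers ("multi_nli", "anli") are standard; the Hub names for Fever-NLI, LingNLI, and WANLI are not shown because they vary, and a recent datasets version is assumed for select_columns:

from datasets import load_dataset, concatenate_datasets

cols = ["premise", "hypothesis", "label"]
mnli = load_dataset("multi_nli", split="train").select_columns(cols)
anli = concatenate_datasets(
    [load_dataset("anli", split=f"train_r{r}").select_columns(cols) for r in (1, 2, 3)]
)

# Concatenate the subsets that share the same premise/hypothesis/label schema.
train_mix = concatenate_datasets([mnli, anli])
print(train_mix)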
Training Procedure
The model was trained using the Hugging Face trainer with the following hyperparameters (a configuration sketch follows the list):
- Epochs: 4
- Learning Rate: 5e-06
- Batch Size: 16 (train), 64 (eval)
- Gradient Accumulation Steps: 2
- Warmup Ratio: 0.06
- Weight Decay: 0.01
- Mixed Precision Training: Enabled (fp16)
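As an illustration only, the listed values map onto a Hugging Face TrainingArguments configuration roughly as follows; the output directory and any settings not listed above are assumptions:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./deberta-v3-large-nli",  # assumed; not specified in the model card
    num_train_epochs=4,
    learning_rate=5e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=2,
    warmup_ratio=0.06,
    weight_decay=0.01,
    fp16=True,  # mixed precision training
)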
Guide: Running Locally
Basic Steps
- Install the Transformers library:

  pip install transformers

- Load the model:

  from transformers import pipeline
  classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli")

- Classify a sequence:

  sequence_to_classify = "Angela Merkel is a politician in Germany and leader of the CDU"
  candidate_labels = ["politics", "economy", "entertainment", "environment"]
  output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
  print(output)
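The zero-shot pipeline returns a dictionary containing the input sequence, the candidate labels sorted from most to least likely, and their corresponding scores; with multi_label=False the scores across the candidate labels sum to one.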
Cloud GPUs
For faster inference or fine-tuning, consider cloud GPU services such as AWS EC2 instances with NVIDIA GPUs or Google Cloud's AI Platform.
License
The model is available under the MIT License, allowing flexible usage and modification.