DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary
MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary
Introduction
The DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary model is designed for zero-shot classification, specifically binary natural language inference (NLI). Given a premise and a hypothesis, it predicts one of two classes: "entailment" (the premise entails the hypothesis) or "not_entailment". The model is based on Microsoft's DeBERTa-v3-xsmall architecture.
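To make the zero-shot use concrete, the sketch below shows one common way to recast classification as NLI: each candidate label is turned into a hypothesis, and the label whose hypothesis receives the highest entailment probability wins. The template string, the helper names `build_hypotheses` and `pick_label`, and the example probabilities are illustrative assumptions, not part of the model card; in practice the probabilities would come from the model.

```python
from typing import Dict, List


def build_hypotheses(labels: List[str], template: str = "This example is about {}.") -> Dict[str, str]:
    """Map each candidate label to a hypothesis sentence (the template is an assumption)."""
    return {label: template.format(label) for label in labels}


def pick_label(entailment_probs: Dict[str, float]) -> str:
    """Return the label whose hypothesis received the highest entailment probability."""
    return max(entailment_probs, key=entailment_probs.get)


hypotheses = build_hypotheses(["politics", "sports", "economy"])

# Placeholder scores standing in for the model's entailment probabilities.
example_probs = {"politics": 0.08, "sports": 0.91, "economy": 0.12}
best = pick_label(example_probs)

print(hypotheses["sports"])
print(best)
```

Because the model has only two output classes, each label is scored independently, which also makes it straightforward to accept several labels at once by thresholding the entailment probability instead of taking the maximum.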
Architecture
The model builds on the DeBERTa-v3 architecture, which improves on earlier DeBERTa versions mainly through a different pre-training objective: ELECTRA-style replaced token detection instead of masked language modeling. This objective is credited with better downstream performance, including on NLI tasks.
Training
The model was trained on a total of 782,357 hypothesis-premise pairs sourced from four NLI datasets: MultiNLI, Fever-NLI, LingNLI, and ANLI. The training utilized the Hugging Face Trainer with specific hyperparameters, such as five epochs, a learning rate of 2e-05, and a batch size of 32. Mixed precision training was employed to enhance efficiency.
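As a rough illustration of what these numbers imply, the sketch below collects the stated hyperparameters and estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the batch size of 32 is a per-device value; none of this is confirmed by the description above. The dictionary keys follow the parameter names of `transformers.TrainingArguments`, but the dictionary itself is only a sketch, not the author's actual configuration.

```python
import math

# Values stated in the training description.
num_pairs = 782_357
batch_size = 32
num_epochs = 5
learning_rate = 2e-05

# Assumption: one device and no gradient accumulation,
# so every batch corresponds to one optimizer step.
steps_per_epoch = math.ceil(num_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs

# Keyword arguments in the shape accepted by transformers.TrainingArguments.
training_kwargs = {
    "num_train_epochs": num_epochs,
    "learning_rate": learning_rate,
    "per_device_train_batch_size": batch_size,  # assumed to be per device
    "fp16": True,  # mixed precision; needs a CUDA-capable GPU
}

print(steps_per_epoch, total_steps)
```

Under these assumptions the run comes to roughly 24,449 steps per epoch and 122,245 steps in total.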
On the evaluation test sets, accuracy ranges from 66.5% to 92.5%, depending on the dataset.
Guide: Running Locally
To use the model locally, follow these steps:
- Install dependencies:
    pip install transformers torch
- Load and run the model:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    model_name = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # The model must be on the same device as its inputs.
    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

    premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
    hypothesis = "The movie was good."

    # Encode the premise-hypothesis pair and move the tensors to the model's device.
    inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model(**inputs)

    # Convert the two logits into percentages per label.
    prediction = torch.softmax(output["logits"][0], -1).tolist()
    label_names = ["entailment", "not_entailment"]
    prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
    print(prediction)
- Suggestion for cloud GPUs: For faster inference, consider a cloud GPU such as the NVIDIA Tesla P100 or a comparable card offered by platforms like AWS or Google Cloud.
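The post-processing in the loading step turns the model's two logits into percentages with a softmax. The dependency-free sketch below reproduces that step so the arithmetic is easy to follow; the helper name `logits_to_percentages` and the logit values are made up for illustration and do not come from the model.

```python
import math
from typing import Dict, List


def logits_to_percentages(logits: List[float], label_names: List[str]) -> Dict[str, float]:
    """Apply a numerically stable softmax and express each probability in percent."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {name: round(e / total * 100, 1) for name, e in zip(label_names, exps)}


# Made-up logits for illustration only; real values come from the model output.
scores = logits_to_percentages([-2.0, 2.0], ["entailment", "not_entailment"])
print(scores)
```

The label order must match the order of the model's output classes, which the example above takes to be "entailment" first and "not_entailment" second.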
License
The model is licensed under the MIT License, allowing for broad use and modification.