mDeBERTa-v3-base-mnli-xnli
MoritzLaurer

Introduction
The mDeBERTa-v3-base-mnli-xnli
model is a multilingual natural language inference (NLI) and zero-shot classification model. Pre-trained on Microsoft's CC100 multilingual dataset and fine-tuned on the XNLI and English MNLI datasets, it can perform NLI and zero-shot classification in the 100 languages covered by its pre-training. It is designed to perform well in multilingual contexts and is based on the multilingual DeBERTa (mDeBERTa) architecture.
Architecture
The model is built on mDeBERTa-v3, Microsoft's multilingual variant of the DeBERTa-v3 transformer. Fine-tuning on NLI datasets enables zero-shot classification: each candidate label is rephrased as a hypothesis, and the model judges whether the input text entails it, so the multilingual understanding gained during pre-training carries over to classification across languages. The model achieves high performance by understanding complex sentence pairs across languages.
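The label-to-hypothesis reframing described above can be sketched in plain Python (no model required). The hypothesis template shown is an assumption for illustration; the actual template used by the pipeline is configurable.

```python
# A minimal sketch (plain Python, no model) of how zero-shot classification
# reframes candidate labels as NLI hypotheses. The template string is an
# assumption for illustration, not the model's fixed behavior.
def build_nli_pairs(sequence, candidate_labels, template="This example is {}."):
    """Pair the input sequence (premise) with one hypothesis per label."""
    return [(sequence, template.format(label)) for label in candidate_labels]

pairs = build_nli_pairs("The new phone has a great camera.",
                        ["technology", "sports"])
# Each (premise, hypothesis) pair is then scored by the NLI model; the
# entailment probability becomes that label's score.
```

Because the hypotheses are ordinary sentences, the same mechanism works in any language the model was pre-trained on.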
Training
The model was trained on professionally translated texts from the XNLI development set and the English MNLI training set. Avoiding machine-translated texts reduces overfitting to translation artifacts and lowers the risk of catastrophic forgetting of the languages seen during pre-training. Training used a learning rate of 2e-05, a batch size of 16, and a weight decay of 0.06.
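The reported hyperparameters could be expressed with the Hugging Face Trainer API roughly as follows. This is a hedged sketch: the original training script is not reproduced here, and everything beyond the three reported values (output directory, use of `TrainingArguments` itself) is an assumption.

```python
# A sketch of the reported fine-tuning hyperparameters expressed as
# Hugging Face TrainingArguments. Only learning_rate, batch size, and
# weight_decay come from the model card; the rest are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mdeberta-v3-base-mnli-xnli",  # assumed output path
    learning_rate=2e-05,                        # reported learning rate
    per_device_train_batch_size=16,             # reported batch size
    weight_decay=0.06,                          # reported weight decay
)
```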
Guide: Running Locally
Basic Steps
- Install Transformers: Ensure you have the Transformers library installed.
  pip install transformers
- Load the Model:
  from transformers import pipeline
  classifier = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")
- Perform Classification: Use the model for zero-shot classification.
  sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
  candidate_labels = ["politics", "economy", "entertainment", "environment"]
  output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
  print(output)
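With multi_label=False, the scores in the output are normalized to sum to 1 across the candidate labels. The normalization step can be illustrated in a model-free sketch; the entailment logits below are made up for illustration, whereas a real run obtains them from the mDeBERTa model.

```python
import math

# A minimal, model-free sketch of how per-label entailment logits become the
# scores in the pipeline output. The logit values here are hypothetical.
def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical entailment logits for
# ["politics", "economy", "entertainment", "environment"]
entailment_logits = [4.2, 0.3, -1.5, -0.8]
scores = softmax(entailment_logits)  # with multi_label=False, scores sum to 1
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

For the German example above, a well-calibrated model would assign most of the probability mass to "politics".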
Cloud GPUs
For optimal performance, especially when dealing with large datasets or requiring faster inference, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
License
This model is licensed under the MIT License, allowing for wide usage and distribution with minimal restrictions.