m De B E R Ta v3 base mnli xnli

MoritzLaurer

Introduction

The mDeBERTa-v3-base-mnli-xnli model is a multilingual natural language inference (NLI) and zero-shot classification model. Pre-trained on Microsoft's CC100 dataset and fine-tuned on XNLI and MNLI datasets, this model supports 16 languages. It is designed to perform well in multilingual contexts and is based on the DeBERTa architecture.

Architecture

The model is built on DeBERTa-v3, a state-of-the-art multilingual transformer model. It supports zero-shot classification by leveraging its understanding of multiple languages and fine-tuning on NLI datasets. The model achieves high performance by understanding complex sentence pairs across languages.

Training

The model was trained using professionally translated texts from the XNLI development dataset and the English MNLI train dataset. It avoids using machine-translated texts to prevent overfitting and reduces the risk of catastrophic forgetting of the languages it was pre-trained on. The training was conducted using specific hyperparameters such as a learning rate of 2e-05, a batch size of 16, and a weight decay of 0.06.

Guide: Running Locally

Basic Steps

  1. Install Transformers: Ensure you have the Transformers library installed.

    pip install transformers
    
  2. Load the Model:

    from transformers import pipeline
    classifier = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")
    
  3. Perform Classification: Use the model for zero-shot classification.

    sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
    candidate_labels = ["politics", "economy", "entertainment", "environment"]
    output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
    print(output)
    

Suggest Cloud GPUs

For optimal performance, especially when dealing with large datasets or requiring faster inference, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.

License

This model is licensed under the MIT License, allowing for wide usage and distribution with minimal restrictions.

More Related APIs in Zero Shot Classification