RoBERTa-Large-MNLI
FacebookAI/roberta-large-mnli
Introduction
RoBERTa-Large-MNLI is a transformer-based language model from Facebook AI. It is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus, and it is commonly used for zero-shot classification of English text: each candidate label is rephrased as a hypothesis and scored for entailment against the input. The underlying RoBERTa model was pretrained with a masked language modeling objective.
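Because the model is at heart an NLI classifier, it can also be queried directly for entailment with standard transformers classes. The snippet below is a minimal sketch; the premise and hypothesis strings are illustrative only.
# Sketch: querying the model directly as an NLI (entailment) classifier.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained('roberta-large-mnli')
model = AutoModelForSequenceClassification.from_pretrained('roberta-large-mnli')
premise = "A soccer game with multiple males playing."      # illustrative example
hypothesis = "Some men are playing a sport."                 # illustrative example
inputs = tokenizer(premise, hypothesis, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)
# The label order (contradiction, neutral, entailment) comes from the model config.
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs[0])})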
Architecture
The model is based on the transformer architecture and fine-tuned on the MNLI corpus. It uses byte-level Byte-Pair Encoding (BPE) with a vocabulary size of 50,000, accepts inputs of up to 512 contiguous tokens, and was pretrained with dynamic masking, so the masked positions change each time a sequence is fed to the model rather than being fixed in advance.
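As a quick sketch, the tokenizer properties described above can be inspected directly; the example sentence is arbitrary.
# Sketch: inspecting the byte-level BPE tokenizer.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-large-mnli')
print(tokenizer.vocab_size)        # roughly 50K byte-level BPE tokens
print(tokenizer.model_max_length)  # 512-token input limit
print(tokenizer.tokenize("one day I will see the world"))  # subword pieces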
Training
RoBERTa-Large-MNLI was pretrained on a combination of datasets including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. Pretraining ran on 1,024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512, using the Adam optimizer with the learning rate, weight decay, and learning rate warmup settings of the original RoBERTa recipe.
Guide: Running Locally
To run the model locally, use the transformers library by Hugging Face:
from transformers import pipeline
# Load the zero-shot classification pipeline backed by roberta-large-mnli
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')
# Classify a sentence against a set of candidate labels
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
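The zero-shot pipeline also accepts optional keyword arguments; the values below are illustrative, not defaults from the model card.
# hypothesis_template controls how each candidate label is turned into an NLI hypothesis;
# multi_label=True scores each label independently instead of normalizing across labels.
classifier(
    sequence_to_classify,
    candidate_labels,
    hypothesis_template="This text is about {}.",
    multi_label=True,
)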
To run this model efficiently, especially for large-scale tasks, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
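When a GPU is available, the pipeline can be placed on it explicitly; the sketch below assumes a single CUDA device at index 0.
# Sketch: placing the pipeline on a GPU when one is available.
import torch
from transformers import pipeline
device = 0 if torch.cuda.is_available() else -1  # -1 keeps the pipeline on CPU
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli', device=device)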
License
RoBERTa-Large-MNLI is licensed under the MIT License, allowing for wide usage and modification.