RoBERTa-Large-MNLI
FacebookAI/roberta-large-mnli
Introduction
RoBERTa-Large-MNLI is a transformer-based language model from Facebook AI. It is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus, and it is commonly used for zero-shot classification of English text: each candidate label is rephrased as a hypothesis and scored for entailment against the input. The underlying RoBERTa model was pretrained with a masked language modeling objective.
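Because the model is at heart an NLI classifier, it can also be queried directly for entailment with standard transformers classes. The snippet below is a minimal sketch; the premise and hypothesis strings are illustrative only.
# Sketch: querying the model directly as an NLI (entailment) classifier.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained('roberta-large-mnli')
model = AutoModelForSequenceClassification.from_pretrained('roberta-large-mnli')
premise = "A soccer game with multiple males playing."      # illustrative example
hypothesis = "Some men are playing a sport."                 # illustrative example
inputs = tokenizer(premise, hypothesis, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)
# The label order (contradiction, neutral, entailment) comes from the model config.
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs[0])})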
Architecture
The model is based on the transformer architecture and fine-tuned on the MNLI corpus. It uses byte-level Byte-Pair Encoding (BPE) with a vocabulary size of 50,000, accepts inputs of up to 512 contiguous tokens, and was pretrained with dynamic masking, so the masked positions change each time a sequence is fed to the model rather than being fixed in advance.
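As a quick sketch, the tokenizer properties described above can be inspected directly; the example sentence is arbitrary.
# Sketch: inspecting the byte-level BPE tokenizer.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-large-mnli')
print(tokenizer.vocab_size)        # roughly 50K byte-level BPE tokens
print(tokenizer.model_max_length)  # 512-token input limit
print(tokenizer.tokenize("one day I will see the world"))  # subword pieces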
Training
RoBERTa-Large-MNLI was pretrained on a combination of datasets including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. Pretraining ran on 1,024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512, using the Adam optimizer with the learning rate, weight decay, and learning rate warmup settings of the original RoBERTa recipe.
Guide: Running Locally
To run the model locally, use the transformers library by Hugging Face:
from transformers import pipeline
# Load the zero-shot classification pipeline backed by roberta-large-mnli
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')
# Classify a sentence against a set of candidate labels
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
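The zero-shot pipeline also accepts optional keyword arguments; the values below are illustrative, not defaults from the model card.
# hypothesis_template controls how each candidate label is turned into an NLI hypothesis;
# multi_label=True scores each label independently instead of normalizing across labels.
classifier(
    sequence_to_classify,
    candidate_labels,
    hypothesis_template="This text is about {}.",
    multi_label=True,
)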
To run this model efficiently, especially for large-scale tasks, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
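When a GPU is available, the pipeline can be placed on it explicitly; the sketch below assumes a single CUDA device at index 0.
# Sketch: placing the pipeline on a GPU when one is available.
import torch
from transformers import pipeline
device = 0 if torch.cuda.is_available() else -1  # -1 keeps the pipeline on CPU
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli', device=device)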
License
RoBERTa-Large-MNLI is licensed under the MIT License, allowing for wide usage and modification.