RoBERTa-large
FacebookAI

Introduction
The RoBERTa-large model, developed by Facebook AI, is a transformer-based model pretrained on English text with a masked language modeling (MLM) objective. It builds on BERT by training on more data with larger batches and longer sequences, using dynamic masking, and dropping the next sentence prediction objective. It is primarily intended to be fine-tuned on downstream tasks such as sequence classification and question answering.
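As a quick illustration of the MLM objective, the pretrained model can be used directly to fill in masked tokens with the Transformers fill-mask pipeline. This is a minimal sketch; the example sentence is arbitrary, and note that RoBERTa uses <mask> rather than BERT's [MASK] token.

from transformers import pipeline

# Load roberta-large with its pretrained masked-language-modeling head.
unmasker = pipeline('fill-mask', model='roberta-large')

# Returns the top candidate tokens for the masked position, with scores.
print(unmasker("The goal of life is <mask>."))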
Architecture
RoBERTa-large is a 24-layer transformer encoder with a hidden size of 1024 and 16 attention heads (roughly 355M parameters), processing input sequences of up to 512 tokens. It learns bidirectional representations by randomly masking 15% of the input tokens and predicting them, unlike autoregressive models, which predict each token left to right and therefore cannot condition on future context.
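To make the masking procedure concrete, the sketch below uses the Transformers data collator for language modeling to select a random 15% of tokens as prediction targets, mirroring the pretraining objective; the input sentence is arbitrary.

from transformers import RobertaTokenizer, DataCollatorForLanguageModeling

tokenizer = RobertaTokenizer.from_pretrained('roberta-large')

# Selects a random 15% of tokens as prediction targets (most are replaced by <mask>).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

encoding = tokenizer("RoBERTa learns bidirectional representations.", return_tensors='pt')
batch = collator([{k: v[0] for k, v in encoding.items()}])
print(batch['input_ids'])  # masked positions contain tokenizer.mask_token_id
print(batch['labels'])     # -100 everywhere except at the masked positions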
Training
The RoBERTa-large model was pretrained on roughly 160GB of English text drawn from BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. Training ran on 1,024 V100 GPUs for 500,000 steps with a batch size of 8,000 sequences of length 512, using the Adam optimizer with weight decay and a learning-rate schedule of linear warmup followed by linear decay.
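The exact optimizer settings are documented in the original RoBERTa release; purely as an illustration of the kind of setup described above (AdamW-style optimization with warmup and linear decay over 500,000 steps), a schematic configuration in PyTorch might look like the following. The numeric values here are placeholders, not the actual RoBERTa hyperparameters.

import torch
from transformers import RobertaForMaskedLM, get_linear_schedule_with_warmup

model = RobertaForMaskedLM.from_pretrained('roberta-large')

# Placeholder values for illustration only; see the RoBERTa paper for the real settings.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.98),
                              eps=1e-6, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=10_000,
                                            num_training_steps=500_000)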
Guide: Running Locally
- Install the Transformers Library:
  pip install transformers
- Load the Model and Tokenizer:
  from transformers import RobertaTokenizer, RobertaModel

  tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
  model = RobertaModel.from_pretrained('roberta-large')
- Tokenize and Predict:
  text = "Your text here."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
- Cloud GPU Suggestion:
  For optimal performance, consider a cloud GPU service such as an AWS EC2 instance with an NVIDIA V100 or A100 GPU; a sketch of running the model on a GPU follows this list.
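Once a GPU is available, the model and inputs from the steps above can be moved onto it. The snippet below is a minimal sketch assuming the tokenizer and model loaded earlier and a CUDA-capable device:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

encoded_input = tokenizer("Your text here.", return_tensors='pt').to(device)
with torch.no_grad():
    output = model(**encoded_input)

# last_hidden_state has shape (batch_size, sequence_length, 1024) for roberta-large.
print(output.last_hidden_state.shape)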
License
The RoBERTa-large model is released under the MIT License, allowing for wide use and modification with minimal restrictions.