mBART-50 (facebook/mbart-large-50)

Introduction
mBART-50 is a multilingual Sequence-to-Sequence model introduced to demonstrate multilingual translation capabilities via fine-tuning. Pre-trained using the "Multilingual Denoising Pretraining" objective, mBART-50 supports translation tasks across 50 languages, extending the original mBART model by adding 25 more languages.
Architecture
The model employs a Sequence-to-Sequence architecture with a multilingual denoising pretraining objective. It concatenates data from multiple languages, applies noise through sentence shuffling and in-filling schemes, and reconstructs original texts. A unique language id symbol (LID) is used as a token to guide sentence prediction.
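The noising scheme described above can be illustrated with a toy sketch. This is a simplification for illustration only: the function name, the single-token <mask> in-filling, and the word-level spans are assumptions, not the actual pretraining implementation.

```python
import random

def add_noise(sentences, mask_token="<mask>", seed=0):
    """Toy version of mBART's noising: shuffle sentence order,
    then replace one contiguous span of words in each sentence
    with a single mask token (span in-filling). The model is
    trained to reconstruct the original text from this input."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)  # sentence permutation
    noised = []
    for sent in shuffled:
        words = sent.split()
        # Pick a random span and in-fill it with one mask token.
        start = rng.randrange(len(words))
        length = rng.randint(1, len(words) - start)
        words[start:start + length] = [mask_token]
        noised.append(" ".join(words))
    return noised

doc = ["The cat sat.", "It was warm.", "Birds sang outside."]
print(add_noise(doc))
```

The pretraining objective then asks the decoder to emit the original, un-shuffled, un-masked document, with the language id token indicating which language to reconstruct.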
Training
mBART-50 requires sequences formatted with a language id token as a prefix. The expected format is [lang_code] X [eos], where lang_code is the source language code for source text and the target language code for target text, and X is the text itself. The model can then be trained like any other sequence-to-sequence model. The example below sets up the tokenizer and model for an English-to-Romanian translation task.
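As a concrete illustration of this format, here is a plain-Python sketch. In practice the tokenizer applies this prefixing automatically; the helper name is hypothetical, while en_XX, ro_RO, and the </s> end-of-sequence token are the codes mBART-50 actually uses.

```python
def format_example(text, lang_code, eos="</s>"):
    """Illustrative sketch of mBART-50's sequence format:
    [lang_code] X [eos], where X is the raw text."""
    return f"{lang_code} {text} {eos}"

# The source side is prefixed with the source language code...
src = format_example("UN Chief Says There Is No Military Solution in Syria", "en_XX")
# ...and the target side with the target language code.
tgt = format_example("Şeful ONU declară că nu există o soluţie militară în Siria", "ro_RO")

print(src)  # en_XX UN Chief Says There Is No Military Solution in Syria </s>
```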
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX", tgt_lang="ro_RO")

src_text = " UN Chief Says There Is No Military Solution in Syria"
tgt_text = "Şeful ONU declară că nu există o soluţie militară în Siria"

# Tokenize the source text; the tokenizer adds the en_XX language id token.
model_inputs = tokenizer(src_text, return_tensors="pt")

# Tokenize the target text as labels, prefixed with the ro_RO language id token.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_text, return_tensors="pt").input_ids

model(**model_inputs, labels=labels)  # forward pass computes the training loss
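When fine-tuning on batches rather than single sentences, label sequences are padded to a common length, and the padded positions are usually replaced with -100 so they do not contribute to the loss. The helper below is a hedged sketch of that step; -100 is the default ignore index of PyTorch's cross-entropy loss, and 1 is mBART's pad token id, but verify both against your setup.

```python
def mask_pad_labels(label_ids, pad_token_id):
    """Replace pad token ids in the labels with -100, the index
    ignored by PyTorch's cross-entropy loss, so that padding does
    not contribute to the training loss."""
    return [[tok if tok != pad_token_id else -100 for tok in seq]
            for seq in label_ids]

# Two label sequences padded to the same length (pad id 1, as in mBART).
batch = [[250020, 47, 2, 1, 1],
         [250020, 9, 8, 7, 2]]
print(mask_pad_labels(batch, pad_token_id=1))
# [[250020, 47, 2, -100, -100], [250020, 9, 8, 7, 2]]
```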
Guide: Running Locally
- Installation: Install the transformers library and ensure PyTorch or TensorFlow is available on your system.
- Setup: Initialize the model and tokenizer, specifying source and target languages with the appropriate codes.
- Data Preparation: Format your data with language id tokens.
- Training: Fine-tune the model on your specific data. Use available examples for guidance.
- Execution: Run the model on a local machine. For better performance, consider using cloud GPU services such as AWS, GCP, or Azure.
License
mBART-50 is licensed under the MIT License, allowing for flexible use in both personal and commercial projects.