m2m100_418M
facebook
Introduction
M2M100 418M is a multilingual encoder-decoder (sequence-to-sequence) model for many-to-many multilingual translation. It covers 100 languages and can translate directly between any pair of them, for a total of 9,900 translation directions. The model was introduced in a paper available on arXiv, and its original implementation is in the fairseq repository.
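The figure of 9,900 directions follows from counting ordered source/target pairs: each of the 100 languages can be translated into each of the 99 others, and 100 * 99 = 9,900. A minimal sketch of that count, assuming placeholder language names (hypothetical, not M2M100's actual language codes):

```python
from itertools import permutations

def translation_directions(languages):
    """Return every ordered (source, target) pair whose languages differ."""
    return list(permutations(languages, 2))

# Placeholder language names (hypothetical), standing in for the 100 real ones.
languages = [f"lang{i:03d}" for i in range(100)]
directions = translation_directions(languages)
print(len(directions))  # 9900, i.e. 100 * 99

# A smaller case that is easy to read: 3 languages give 3 * 2 directions.
print(translation_directions(["en", "fr", "hi"]))
```

Ordered pairs are used because Hindi to French and French to Hindi count as two separate directions.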
Architecture
The model uses a sequence-to-sequence (encoder-decoder) architecture. Its tokenizer, M2M100Tokenizer, depends on the SentencePiece library. To choose the output language, the target language ID is forced as the first token of the generated sequence by passing forced_bos_token_id to the model's generate method.
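To make the role of forced_bos_token_id concrete, here is a toy greedy decoder in plain Python. It is not the Transformers implementation: the five-token vocabulary, the `toy_scores` "model", and the `greedy_decode` helper are all hypothetical. It only illustrates the idea described above: at the first decoding step the target-language token is emitted regardless of the model's scores, and every later token is conditioned on it.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical toy vocabulary; language tokens are written in the __xx__ style.
VOCAB = ["</s>", "__en__", "__fr__", "hello", "bonjour"]

def toy_scores(seq: Sequence[int]) -> List[float]:
    """Hypothetical 'model': writes one word in the language chosen at step 1, then stops."""
    scores = [0.0] * len(VOCAB)
    if len(seq) == 1:
        scores[1] = 1.0  # left alone, the toy model would start with __en__
    elif len(seq) == 2:
        scores[4 if seq[1] == 2 else 3] = 1.0  # word depends on the language token
    else:
        scores[0] = 1.0  # end of sequence
    return scores

def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], List[float]],
    decoder_start_id: int,
    eos_id: int,
    max_len: int,
    forced_bos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding in which the first generated token can be forced."""
    seq = [decoder_start_id]
    while len(seq) < max_len:
        if forced_bos_id is not None and len(seq) == 1:
            # First step: ignore the scores and emit the target-language token.
            token = forced_bos_id
        else:
            scores = next_token_scores(seq)
            token = max(range(len(scores)), key=scores.__getitem__)
        seq.append(token)
        if token == eos_id:
            break
    return seq

unforced = greedy_decode(toy_scores, decoder_start_id=0, eos_id=0, max_len=10)
forced = greedy_decode(toy_scores, decoder_start_id=0, eos_id=0, max_len=10, forced_bos_id=2)
print([VOCAB[i] for i in unforced])  # ['</s>', '__en__', 'hello', '</s>']
print([VOCAB[i] for i in forced])    # ['</s>', '__fr__', 'bonjour', '</s>']
```

Forcing only the first token is enough to steer the language because every later prediction is conditioned on it.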
Training
Training details for M2M100 are described in the associated paper. The model is trained to translate directly between any two of its supported languages, rather than pivoting through English as an intermediate language.
Guide: Running Locally
To run the model locally, follow these steps:
- Installation: Install the Transformers and SentencePiece libraries, plus PyTorch (the examples below request PyTorch tensors):
    pip install transformers sentencepiece torch
- Model and Tokenizer Setup:
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    # Load the 418M checkpoint and its tokenizer (downloaded on first use)
    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
- Translation Example:
- Translate Hindi to French:
    # Source text in Hindi: "Life is like a box of chocolates."
    tokenizer.src_lang = "hi"
    encoded_hi = tokenizer("जीवन एक चॉकलेट बॉक्स की तरह है।", return_tensors="pt")
    # Force French as the first generated token
    generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
    print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
- Translate Chinese to English:
    # Source text in Chinese: "Life is like a box of chocolates."
    tokenizer.src_lang = "zh"
    encoded_zh = tokenizer("生活就像一盒巧克力。", return_tensors="pt")
    # Force English as the first generated token
    generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
    print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
For faster inference, consider running the model on a GPU, for example a cloud GPU instance from AWS, GCP, or Azure.
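The translation steps in this guide can be wrapped in a small reusable helper. This is a sketch, not part of the model card: the function name `translate`, its parameter names, and the `max_new_tokens=128` default are hypothetical choices. It uses only the calls shown in the guide plus `padding=True` (so several sentences in the same source language can be translated as one batch) and `.to(model.device)` (so inputs follow the model if it has been moved to a GPU). The model and tokenizer are passed in rather than loaded inside the function, which keeps the helper cheap to call and easy to test.

```python
from typing import List, Sequence, Union

def translate(
    texts: Union[str, Sequence[str]],
    src_lang: str,
    tgt_lang: str,
    model,
    tokenizer,
    max_new_tokens: int = 128,  # hypothetical default length cap
) -> List[str]:
    """Translate one string or a batch of strings from src_lang to tgt_lang."""
    if isinstance(texts, str):
        texts = [texts]
    # The tokenizer prefixes the input with the source-language token.
    # Note: this mutates shared tokenizer state.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(list(texts), return_tensors="pt", padding=True)
    # Move inputs to wherever the model lives (CPU or GPU).
    encoded = encoded.to(model.device)
    # Force the target-language token as the first generated token.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Usage, after the setup step of this guide:
#   translate(["生活就像一盒巧克力。"], "zh", "en", model, tokenizer)
```

Because `tokenizer.src_lang` is set on each call, a single tokenizer shared between threads could see its source language changed mid-request; using one tokenizer per worker avoids that.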
License
The M2M100 418M model is licensed under the MIT License, which permits use, modification, and redistribution, including commercial use, provided the copyright and license notice are retained.