MT0-XXL (bigscience/mt0-xxl)
Introduction
The MT0-XXL model is part of a family of models developed under the BigScience initiative. These models are designed for text-to-text generation tasks and are capable of crosslingual generalization, following human instructions across numerous languages. The model is based on the mT5 architecture and has been fine-tuned on the xP3 dataset for enhanced multilingual capabilities.
Architecture
MT0-XXL uses the same architecture as mT5-XXL, an encoder-decoder Transformer with roughly 13 billion parameters designed for text-to-text generation. The pretrained mT5-XXL weights were finetuned on the xP3 dataset, a mixture of 13 training tasks spanning 46 languages with English prompts.
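As a quick sanity check, the checkpoint's configuration reports the underlying mT5 architecture without downloading the full weights. The sketch below is illustrative; the attribute names are the standard T5/mT5 config fields in transformers, not anything specific to this model card.

from transformers import AutoConfig

# Fetches only the small JSON config file, not the ~13B-parameter weights.
config = AutoConfig.from_pretrained("bigscience/mt0-xxl")

print(config.model_type)  # "mt5": MT0-XXL reuses the mT5 architecture
print(config.num_layers)  # number of encoder layers
print(config.d_model)     # hidden size
print(config.num_heads)   # attention heads per layer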
Training
- Model architecture: same as mT5-XXL.
- Finetuning steps: 7,000.
- Finetuning tokens: 1.29 billion.
- Precision: bfloat16 (see the loading sketch after this list).
- Hardware: TPUv4-256.
- Software: orchestrated with T5X, implemented in JAX.
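Because the model was finetuned in bfloat16, loading the weights in that same dtype roughly halves memory use compared to float32. A minimal sketch, assuming a PyTorch backend; the torch_dtype argument is standard transformers usage rather than something the model card prescribes.

import torch
from transformers import AutoModelForSeq2SeqLM

# Load the checkpoint directly in bfloat16 to reduce memory use.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/mt0-xxl",
    torch_dtype=torch.bfloat16,
)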
Guide: Running Locally
Basic Steps
- Install required packages:
  pip install -q transformers accelerate
- Load the model and tokenizer:
  from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

  checkpoint = "bigscience/mt0-xxl"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
- Perform inference:
  inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))
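The same pipeline handles instructions written in other languages, which is the crosslingual behavior the xP3 finetuning targets. A small usage sketch; the prompt and the max_new_tokens setting are illustrative choices, not taken from the model card.

# Instruction written in Spanish; the model is trained to follow such prompts.
prompt = "Explica en una frase qué es la fotosíntesis."
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))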
Cloud GPUs
MT0-XXL has roughly 13 billion parameters, so even in bfloat16 the weights alone occupy about 26 GB. For practical inference it is recommended to use cloud GPUs such as those available on AWS, Google Cloud, or Azure, which offer instances (for example, A100 40 GB or 80 GB) with enough memory to host a model of this size.
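Since accelerate is installed in the first step, the checkpoint can also be dispatched automatically across the available GPU (and, if needed, CPU) memory. The sketch below assumes at least one CUDA GPU; device_map="auto" is standard transformers/accelerate usage, not something specific to MT0-XXL.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-xxl"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device_map="auto" lets accelerate place layers on the available devices,
# spilling to CPU RAM if GPU memory runs out.
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))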
License
The MT0-XXL model is released under the Apache 2.0 License, which allows for both commercial and non-commercial use, modification, and distribution.