mt0-small
bigscience/mt0-small Model Documentation Summary
Introduction
MT0-SMALL is a multilingual model designed for text-to-text generation tasks. It is part of the BLOOMZ and mT0 model family, whose members can follow human instructions in many languages without task-specific training (zero-shot). The model is finetuned on xP3, a crosslingual task mixture, to enable crosslingual generalization.
Architecture
MT0-SMALL follows the architecture of mT5-small, designed for multilingual tasks. It was finetuned for 25,000 steps on 4.62 billion tokens, using bfloat16 precision for computational efficiency. Training was conducted on TPUv4-64 hardware using the T5X orchestration framework and JAX for neural-network computation.
Training
The model was trained through multitask finetuning on the xP3 dataset, which allows it to generalize to unseen tasks and languages. The training setup leveraged high-performance TPU hardware and the T5X framework to manage the training process.
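Multitask finetuning works by casting every supervised task into the same text-to-text format: an instruction-style prompt as the input, and the answer as the target string. A minimal sketch of that casting (the template wording and field names below are illustrative, not the actual xP3 prompt templates):

```python
# Sketch: rendering labeled examples from different tasks into
# instruction-style (input_text, target_text) pairs, as multitask
# mixtures like xP3 do. Templates here are illustrative only.

def to_text_to_text(task, example):
    """Render one labeled example as an (input_text, target_text) pair."""
    if task == "translation":
        prompt = f"Translate to {example['tgt_lang']}: {example['src']}"
        target = example["tgt"]
    elif task == "sentiment":
        prompt = f"Is the sentiment of this review positive or negative? {example['text']}"
        target = example["label"]
    else:
        raise ValueError(f"unknown task: {task}")
    return prompt, target

pairs = [
    to_text_to_text("translation",
                    {"tgt_lang": "English", "src": "Je t'aime.", "tgt": "I love you."}),
    to_text_to_text("sentiment",
                    {"text": "A wonderful film.", "label": "positive"}),
]
# Every task now shares one input/output format, so a single
# sequence-to-sequence model can be finetuned on the whole mixture.
```

Because all tasks share one format, adding a new task to the mixture is just a matter of writing a new prompt template.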
Guide: Running Locally
Basic Steps
- Install packages:

pip install transformers accelerate
- Load the model and tokenizer:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
- Prepare input and run inference:

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
Cloud GPUs
For enhanced performance, particularly for larger models, using cloud-based GPUs such as those available on AWS, Google Cloud, or Azure is recommended. This setup can significantly speed up inference and training processes.
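On a GPU machine, the model can be loaded in half precision and placed on the device automatically via accelerate. A sketch of that setup (bfloat16 matches the precision used during finetuning; `device_map="auto"` requires the accelerate package installed above):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" (from accelerate) places the weights on available
# GPUs, falling back to CPU if none are present.
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer.encode(
    "Translate to English: Je t’aime.", return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs)
# skip_special_tokens=True drops markers like </s> from the decoded text.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

The same code runs unchanged on CPU; the GPU and half precision only change speed and memory use, not the API calls.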
License
The MT0-SMALL model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.