MT0-SMALL Model Documentation Summary

Introduction

MT0-SMALL is a multilingual model designed for text-to-text generation tasks. It belongs to the BLOOMZ and mT0 family of models, which can follow human instructions in many languages without task-specific training (zero-shot). The model is finetuned on xP3, a crosslingual task mixture, to enable crosslingual generalization.

Architecture

MT0-SMALL follows the architecture of mT5-small, designed for multilingual tasks. It was finetuned for 25,000 steps on 4.62 billion tokens, using bfloat16 precision for computational efficiency. Finetuning was carried out on TPUv4-64 hardware with the T5X orchestration framework and Jax for the neural network implementation.
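
The underlying mT5-small configuration can be inspected directly from the checkpoint. The snippet below is a minimal sketch; the attribute names shown (d_model, num_layers, num_heads) come from the Hugging Face MT5Config class rather than from this summary.

    from transformers import AutoConfig

    # Load the configuration shipped with the bigscience/mt0-small checkpoint.
    config = AutoConfig.from_pretrained("bigscience/mt0-small")

    # A few architecture hyperparameters defined by the mT5-small configuration.
    print(config.model_type)          # "mt5"
    print(config.d_model)             # hidden size
    print(config.num_layers)          # encoder layers
    print(config.num_decoder_layers)  # decoder layers
    print(config.num_heads)           # attention heads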

Training

The model was trained through multitask finetuning on the xP3 dataset, which enables it to generalize to unseen tasks and languages. Finetuning was managed with the T5X framework on the TPU hardware described above.
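
To get a feel for the finetuning data, the sketch below streams a few xP3 examples from the Hugging Face Hub. The dataset id bigscience/xP3, the "en" subset name, and the "inputs"/"targets" field names are assumptions about how the public release is organized, not details stated in this summary.

    from datasets import load_dataset

    # Stream the (assumed) English subset of xP3 without downloading the full mixture.
    xp3 = load_dataset("bigscience/xP3", "en", split="train", streaming=True)

    # Each example is assumed to pair a prompted input with its target text.
    for example in xp3.take(3):
        print(example["inputs"])
        print("->", example["targets"])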

Guide: Running Locally

Basic Steps

  1. Install packages:

    pip install transformers accelerate
    
  2. Load the model and tokenizer:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    checkpoint = "bigscience/mt0-small"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
  3. Prepare input and run inference:

    # Tokenize the prompt and return PyTorch tensors.
    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
    # Generate a response and decode it back to text.
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    
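The decode call in step 3 keeps special tokens such as the padding and end-of-sequence markers in the printed output. The variation below strips them and caps the generation length; the max_new_tokens value is an illustrative assumption, not a documented setting.

    # Strip special tokens (e.g. <pad>, </s>) and limit the length of the generation.
    outputs = model.generate(inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))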

Cloud GPUs

For enhanced performance, particularly for larger models, using cloud-based GPUs such as those available on AWS, Google Cloud, or Azure is recommended. This setup can significantly speed up inference and training processes.
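
A minimal sketch of the same example on a single CUDA GPU is shown below; torch_dtype="auto" and device_map="auto" let the accelerate integration choose the precision and device placement, and the explicit .to("cuda") assumes one GPU is available.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "bigscience/mt0-small"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Requires the accelerate package installed in step 1.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

    # Move the tokenized prompt to the GPU before generation.
    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))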

License

The MT0-SMALL model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.
