Introduction

The MT0-XXL model is part of the mT0 family of models developed under the BigScience initiative. These text-to-text generation models are capable of crosslingual generalization: they follow human instructions in dozens of languages zero-shot. The model uses the mT5 architecture and has been finetuned on the xP3 dataset to strengthen its multilingual, instruction-following capabilities.

Architecture

MT0-XXL uses the same architecture as mT5-XXL, an encoder-decoder Transformer designed for text-to-text generation. Starting from the pretrained mT5-XXL weights, the model was finetuned on the xP3 dataset, a mixture of 13 training tasks in 46 languages with English prompts.
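
The configuration of the published checkpoint can be inspected from the Hugging Face Hub without downloading the weights; a minimal sketch (the attribute names below are the standard T5/mT5 configuration fields, and the printed values are whatever the checkpoint reports):

    from transformers import AutoConfig

    # Fetch only the configuration file of the checkpoint.
    config = AutoConfig.from_pretrained("bigscience/mt0-xxl")
    print(config.model_type)  # architecture family
    print(config.d_model)     # hidden size
    print(config.num_layers)  # encoder depth
    print(config.num_heads)   # attention heads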

Training

  • Model Architecture: MT0-XXL shares its architecture with mT5-XXL.
  • Finetuning Steps: 7,000.
  • Finetuning Tokens: 1.29 billion tokens.
  • Precision: bfloat16 (see the loading sketch after this list).
  • Hardware: Trained using TPUv4-256.
  • Software: Managed with T5X and implemented using JAX.
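
Because finetuning was done in bfloat16, the published weights can also be loaded in that precision at inference time, which roughly halves the memory footprint relative to float32. A minimal sketch, assuming a PyTorch backend on hardware with bfloat16 support:

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "bigscience/mt0-xxl"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Load the weights in bfloat16, matching the finetuning precision.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)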

Guide: Running Locally

Basic Steps

  1. Install Required Packages:

    pip install -q transformers accelerate
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    checkpoint = "bigscience/mt0-xxl"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
  3. Perform Inference:

    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
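
By default, generate() stops after a small number of new tokens, which is enough for short translations but truncates longer answers. A small variation of the call above (the max_new_tokens value here is only an illustrative choice):

    # Allow up to 64 newly generated tokens and drop special tokens from the output.
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))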
    

Cloud GPUs

MT0-XXL has roughly 13 billion parameters, so inference benefits from substantial GPU memory. When suitable local hardware is not available, cloud GPU instances on AWS, Google Cloud, or Azure are a practical way to run the model.
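
On a GPU instance, the same checkpoint can be loaded with Accelerate handling device placement; a minimal sketch, assuming a CUDA device and the transformers/accelerate packages from step 1 (torch_dtype="auto" and device_map="auto" are standard options of from_pretrained):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "bigscience/mt0-xxl"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # device_map="auto" lets Accelerate place the weights on the available GPU(s);
    # torch_dtype="auto" keeps the dtype stored in the checkpoint.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

    inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt").to("cuda")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))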

License

The MT0-XXL model is released under the Apache 2.0 License, which allows for both commercial and non-commercial use, modification, and distribution.
