Introduction

The MT0-XXL-MT model is part of the BigScience initiative and belongs to the mT0 family of multitask-finetuned models, designed to perform text-to-text generation tasks across many languages. The model is built on the mT5 architecture and finetuned for crosslingual instruction following; the "-mt" suffix marks the variant finetuned on machine-translated prompts, which is recommended when prompting in non-English languages.
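
In practice, tasks are posed as free-form natural-language instructions and the model answers in plain text. A few prompts illustrating this style (hypothetical examples; actual outputs will vary):

      # Instruction-style prompts of the kind the model is finetuned to follow
      # (hypothetical examples; instructions are free-form natural language):
      prompts = [
          "Translate to English: Je t’aime.",
          "Explain in a sentence in Telugu what is backpropagation in neural networks.",
      ]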

Architecture

The MT0-XXL-MT model uses the same encoder-decoder architecture as mT5-XXL, with roughly 13 billion parameters. It was finetuned on xP3mt, the machine-translated variant of the crosslingual task mixture xP3, to improve generalization across languages and tasks. The finetuning run itself (1.29 billion tokens over 7,000 steps on TPUv4-256 hardware, orchestrated with T5X and Jax) is described in the Training section below.
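
Because the checkpoint reuses the mT5 architecture, this can be verified cheaply by downloading only the model configuration rather than the full weights (a minimal sketch; the printed values are indicative rather than quoted from this document):

      from transformers import AutoConfig

      # Fetches only config.json, not the multi-gigabyte weight files.
      config = AutoConfig.from_pretrained("bigscience/mt0-xxl-mt")

      print(config.model_type)          # expected: "mt5"
      print(config.num_layers)          # encoder depth
      print(config.num_decoder_layers)  # decoder depth
      print(config.d_model)             # hidden size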

Training

Finetuning was performed in bfloat16 precision using the T5X orchestration framework and Jax on TPUv4-256 hardware, processing 1.29 billion tokens over 7,000 steps. The xP3mt mixture spans a diverse set of languages, which is what gives the model its crosslingual capabilities.
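
As a back-of-the-envelope check on those figures (purely illustrative arithmetic; the actual batch composition per step is not stated here):

      # Average tokens consumed per finetuning step, from the numbers above.
      total_tokens = 1.29e9
      steps = 7_000
      print(f"~{total_tokens / steps:,.0f} tokens per step")  # ≈ 184,286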

Guide: Running Locally

  1. Installation:
    • Install the necessary libraries (bitsandbytes is only needed for the optional 8-bit loading in step 4):
      pip install -q transformers accelerate bitsandbytes
      
  2. Load Model:
    • Use the following Python snippet to load and use the model:
      from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
      
      checkpoint = "bigscience/mt0-xxl-mt"
      
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      # device_map="auto" shards the ~13B-parameter model across available devices;
      # torch_dtype="auto" uses the dtype stored in the checkpoint (bfloat16).
      model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
      
      # Place the inputs on the same device the model was loaded onto
      # (more portable than hard-coding "cuda").
      inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
      outputs = model.generate(inputs)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      
  3. Hardware Suggestions:
    • The model has roughly 13 billion parameters and occupies on the order of 26 GB in bfloat16, so use a GPU with sufficient memory, such as cloud GPUs from AWS, Google Cloud, or Azure, especially when processing large-scale data or running intensive tasks. The 8-bit loading in step 4 roughly halves the memory footprint.
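
  4. Optional 8-Bit Loading:
    • With bitsandbytes installed (step 1), the weights can be quantized to 8-bit at load time, roughly halving GPU memory use at a small cost in output quality. A minimal sketch mirroring the example in step 2:
      from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
      
      checkpoint = "bigscience/mt0-xxl-mt"
      
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      # load_in_8bit=True quantizes the weights via bitsandbytes as they are loaded.
      model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)
      
      inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
      outputs = model.generate(inputs)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))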

License

The MT0-XXL-MT model is distributed under the Apache 2.0 License, permitting free use, modification, and distribution, provided that the license terms are met.
