Introduction

The MT0-LARGE model, part of the BLOOMZ and mT0 family of models, is designed for multilingual text-to-text generation. It is a finetuned version of the mT5 multilingual language model and can follow natural-language instructions in dozens of languages zero-shot, i.e., without task-specific training data for those languages. This makes it suitable for a range of natural language processing tasks such as translation, sentiment analysis, and query generation.
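
Every task is expressed purely as text: the input is a natural-language instruction and the model replies with generated text. The prompts below illustrate this style; the translation prompt appears later in this guide, while the sentiment and search-term prompts are paraphrased examples, not exact prompts from the model's documentation:

    # Illustrative instruction-style prompts; the task is carried entirely by the text.
    prompts = [
        "Translate to English: Je t’aime.",
        "Review: the cast iron skillet works great. Is this review positive or negative?",
        "Suggest at least five related search terms to 'neural networks'.",
    ]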

Architecture

The architecture of MT0-LARGE is identical to that of mT5-large, the multilingual encoder-decoder Transformer from which it was finetuned. Finetuning on the crosslingual task mixture xP3 is what equips the model to generalize across languages and tasks, and the weights use bfloat16 precision.
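
These architectural details can be checked against the configuration that ships with the checkpoint. A minimal sketch using the transformers library; the commented values are what the standard mT5-large configuration is expected to report:

    from transformers import AutoConfig

    # Load the configuration that ships with the checkpoint
    config = AutoConfig.from_pretrained("bigscience/mt0-large")
    print(config.model_type)   # expected: "mt5"
    print(config.d_model)      # hidden size; 1024 for the large variant
    print(config.num_layers)   # encoder depth; 24 for the large variant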

Training

The MT0-LARGE model was finetuned for 25,000 steps, consuming 4.62 billion tokens of the xP3 dataset, a mixture of multilingual tasks designed to improve crosslingual generalization. Training ran on TPUv4-64 hardware, with T5X handling orchestration and Jax providing the neural-network implementation.
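
To get a feel for what the model saw during finetuning, the xP3 mixture can be inspected from the Hugging Face Hub. A minimal sketch, assuming the dataset exposes per-language configurations (e.g. "en") and that each example carries "inputs" and "targets" fields:

    from datasets import load_dataset

    # Stream one language configuration of xP3; the full mixture is very large.
    xp3 = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
    example = next(iter(xp3))
    print(example["inputs"])   # the instruction-style prompt
    print(example["targets"])  # the expected completion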

Guide: Running Locally

To run the MT0-LARGE model locally, follow these steps:

  1. Install the Required Libraries:

    pip install transformers accelerate
    
  2. Load the Model and Tokenizer:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    # Model checkpoint on the Hugging Face Hub
    checkpoint = "bigscience/mt0-large"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
  3. Run Inference on a CPU:

    # Tokenize the prompt and generate on the CPU (the default device)
    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    
  4. Run Inference on a GPU:

    # Reload with device_map="auto" so accelerate places the weights on the GPU,
    # and torch_dtype="auto" to load in the precision stored with the checkpoint.
    # See the note after this list for controlling output length.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    
  5. Cloud GPUs: For improved performance, consider using cloud-based GPUs such as those provided by AWS, Google Cloud, or Azure.
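
As noted in the GPU snippet above, generate() falls back to the checkpoint's default length limit, which can truncate longer answers. Generation parameters can be passed explicitly; a minimal sketch with illustrative, untuned values:

    # Allow up to 64 newly generated tokens and drop special tokens when decoding
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))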

License

The MT0-LARGE model is released under the Apache-2.0 license, a permissive license that allows use, distribution, and modification of the model and its components.
