google

MADLAD-400-10B-MT

Introduction

MADLAD-400-10B-MT is a multilingual machine translation model based on the T5 architecture. It has been trained on 250 billion tokens covering over 450 languages using publicly available data. The model is competitive with significantly larger models and is designed for machine translation and multilingual NLP tasks.

Architecture

MADLAD-400-10B-MT uses the T5 architecture. The MADLAD-400 MT models come in three sizes: 3B parameters with 32 layers, 7.2B parameters with 48 layers, and 10.7B parameters with 32 layers. This model is the 10.7B variant. Parameters are shared across all language pairs, and a single SentencePiece model with a 256k-token vocabulary is shared by the encoder and decoder.

Training

MADLAD-400-10B-MT was trained on the MADLAD-400 dataset together with parallel data sources covering 157 languages. The web-crawled training data was extensively preprocessed before training. During training, a <2xx> token is prepended to each source sentence to indicate the target language, where xx is the target language code (for example, <2pt> for Portuguese). Further training details are available in the associated research paper.
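A minimal sketch of building inputs in this format is shown here. It assumes the tag is followed by a single space, as in the `<2pt> I love pizza!` example from the local-run guide. `add_target_tag` is a hypothetical helper, not part of transformers or the model's tokenizer, and it does not check that the language code actually exists in the model's vocabulary.

```python
def add_target_tag(text: str, target_lang: str) -> str:
    """Prefix `text` with a <2xx> tag naming the target language.

    Hypothetical helper: `target_lang` is assumed to be a language code such
    as "pt"; it is not validated against the model's vocabulary.
    """
    code = target_lang.strip()
    if not code:
        raise ValueError("target_lang must be a non-empty language code")
    # Strip surrounding whitespace so exactly one space follows the tag.
    return f"<2{code}> {text.strip()}"


# Tag a small batch of English sentences for translation into Portuguese.
sentences = ["I love pizza!", "  Where is the station?"]
tagged = [add_target_tag(s, "pt") for s in sentences]
print(tagged)
```

Each tagged string can then be tokenized and passed to `generate` the same way as `text` in the guide.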

Guide: Running Locally

  1. Environment Setup: Install the required Python packages, including PyTorch:

    pip install torch transformers accelerate sentencepiece
    
  2. Load and Run Model:

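    from transformers import T5ForConditionalGeneration, T5Tokenizer
    
    model_name = 'google/madlad400-10b-mt'
    # device_map="auto" (requires accelerate) places the weights on the available GPU(s) and CPU.
    model = T5ForConditionalGeneration.from_pretrained(model_name, device_map="auto")
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    
    # The <2xx> prefix selects the target language; <2pt> requests Portuguese.
    text = "<2pt> I love pizza!"
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    # Set an explicit generation budget so longer translations are not cut off early.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=128)
    
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    # Output: Eu adoro pizza!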
    
  3. Hardware Recommendation: Running this model on cloud GPUs such as NVIDIA Tesla V100 or A100 is advisable due to its size and computational requirements.
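The hardware recommendation can be put in rough numbers with the estimate below. It computes how much memory the weights alone occupy at 32-bit and 16-bit precision and which common GPU configurations could hold them. It assumes roughly 10.7 billion parameters (the size given in the Architecture section) and ignores activations, the decoding cache and framework overhead, so the results are lower bounds, not measured requirements. The GPU capacities are the standard V100 and A100 memory configurations.

```python
# Back-of-the-envelope estimate of the GPU memory needed just to hold the
# model weights. Assumption: ~10.7 billion parameters; activations, the
# decoder's key/value cache and framework overhead are NOT counted, so real
# usage is higher than these numbers.
PARAMS = 10.7e9

# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "16-bit": 2}

# Memory capacity (GB) of common single-GPU configurations.
GPU_MEMORY_GB = {"V100 16GB": 16, "V100 32GB": 32, "A100 40GB": 40, "A100 80GB": 80}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Size of the weights in gigabytes (10**9 bytes) for the given precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def single_gpu_options(needed_gb: float) -> list[str]:
    """GPUs from GPU_MEMORY_GB whose memory is at least `needed_gb` (capacity only)."""
    return [name for name, cap in GPU_MEMORY_GB.items() if cap >= needed_gb]


for dtype in BYTES_PER_PARAM:
    need = weight_memory_gb(PARAMS, dtype)
    options = ", ".join(single_gpu_options(need)) or "none listed"
    print(f"{dtype}: ~{need:.1f} GB of weights; single-GPU options: {options}")
```

The check compares memory capacity only; it does not consider whether a GPU handles a given precision efficiently. Loading in 16-bit precision (for example via the `torch_dtype` argument of `from_pretrained`) roughly halves the weight footprint, and with `device_map="auto"` Accelerate can also spread the weights over several GPUs or offload part of them to CPU memory when one device is too small.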

License

The model is licensed under Apache 2.0, which allows for both personal and commercial use, distribution, and modification.
