mT5-XL
Introduction
mT5 is a multilingual variant of the Text-to-Text Transfer Transformer (T5), designed to handle a wide range of natural language processing (NLP) tasks across 101 languages. It was pre-trained on the mC4 corpus, a Common Crawl-based multilingual dataset.
Architecture
mT5 retains the encoder-decoder Transformer architecture of T5, which casts every task into a unified text-to-text format. The XL variant scales this design to roughly 3.7 billion parameters, and its shared SentencePiece vocabulary covers all 101 pre-training languages, making the model suitable for multilingual applications.
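As a quick way to check these structural details (a sketch assuming the transformers library is installed; only the small configuration file is downloaded, not the weights), the architecture can be inspected through the model's published configuration:

from transformers import AutoConfig

# Fetches only config.json, not the multi-gigabyte checkpoint
config = AutoConfig.from_pretrained("google/mt5-xl")

print(config.num_layers)          # encoder depth
print(config.num_decoder_layers)  # decoder depth
print(config.d_model)             # hidden size
print(config.num_heads)           # attention heads per layer
print(config.vocab_size)          # shared multilingual vocabulary size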
Training
The mT5 model is pre-trained on the mC4 dataset, which covers 101 languages. Notably, this pre-training is purely self-supervised: no labeled data was used, so the model must be fine-tuned on a specific downstream task before it can perform well.
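As a minimal sketch of what such fine-tuning involves (assuming PyTorch and transformers are installed; the text pair below is a hypothetical example), passing labels to the model yields the sequence-to-sequence cross-entropy loss that an optimizer would minimize:

from transformers import MT5ForConditionalGeneration, MT5Tokenizer

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-xl")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-xl")

# Hypothetical (input, target) pair for a summarization-style task
inputs = tokenizer("summarize: The quick brown fox jumped over the lazy dog.",
                   return_tensors="pt")
labels = tokenizer("A fox jumped.", return_tensors="pt").input_ids

# With labels supplied, the forward pass returns the training loss;
# a training loop would follow the backward pass with optimizer.step()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()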
Guide: Running Locally
- Install Hugging Face Transformers (along with SentencePiece, which the mT5 tokenizer requires, and PyTorch):
pip install transformers sentencepiece torch
- Load the Model:
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-xl")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-xl")
- Prepare Input:
input_text = "translate English to French: Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
- Generate Output:
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
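Note that because this checkpoint is pre-trained only (see Training above), the generated text for this prompt may be of low quality until the model has been fine-tuned on a translation task; the snippet mainly confirms that the pipeline runs end to end.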
For optimal performance, especially with large models like mT5-XL, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
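As one possible memory-saving configuration on a single GPU (a sketch assuming PyTorch with CUDA and the accelerate package are installed; actual savings and placement depend on your hardware), the checkpoint can be loaded in half precision with automatic device placement:

import torch
from transformers import MT5ForConditionalGeneration

# float16 halves the memory footprint of the ~3.7B-parameter checkpoint;
# device_map="auto" (requires `pip install accelerate`) spreads layers
# across the available GPUs, falling back to CPU if needed
model = MT5ForConditionalGeneration.from_pretrained(
    "google/mt5-xl",
    torch_dtype=torch.float16,
    device_map="auto",
)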
License
The mT5 model is distributed under the Apache 2.0 License, which permits free use, modification, and redistribution, provided the license's attribution and notice requirements are preserved.