Aya 101
CohereForAI
Introduction
The Aya model is a massively multilingual generative language model that follows instructions in 101 languages. It outperforms open-source models such as mT0 and BLOOMZ on the majority of multilingual evaluation tasks while covering roughly twice as many languages. The model is released under the Apache-2.0 license to promote the development of multilingual technology.
Architecture
Aya is a 13-billion-parameter Transformer model with the same encoder-decoder architecture as mT5-XXL; text is generated autoregressively by the decoder. Finetuning used a batch size of 256 on TPUv4-128 hardware.
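Because the checkpoint shares the mT5-XXL layout, its dimensions can be inspected from the published configuration alone, without downloading the 13-billion-parameter weights. A minimal sketch, assuming the checkpoint exposes a standard T5-family config (the field names come from the T5Config class in transformers, not from this model card):

from transformers import AutoConfig

# Fetch only the configuration, not the model weights
config = AutoConfig.from_pretrained("CohereForAI/aya-101")

print(config.d_model)             # hidden size of the Transformer
print(config.num_layers)          # encoder depth
print(config.num_decoder_layers)  # decoder depth
print(config.vocab_size)          # shared multilingual vocabulary size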
Training
Aya was finetuned on several instruction datasets, including xP3x, the Aya Dataset, the Aya Collection, and ShareGPT-Command, each filtered and pruned to the 101 supported languages. A total of 25 million samples were seen during finetuning, which was implemented with the T5X and Jax frameworks.
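Several of these datasets are published on the Hugging Face Hub under the CohereForAI organization, so the training mixture can be examined directly. A minimal sketch using the datasets library; the repository id and column names below follow the public aya_dataset card and should be treated as assumptions to verify against the Hub:

from datasets import load_dataset

# Human-annotated multilingual prompt/completion pairs
aya = load_dataset("CohereForAI/aya_dataset", split="train")

example = aya[0]
print(example["inputs"])    # instruction / prompt text
print(example["targets"])   # reference completion
print(example["language"])  # language of the pair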
Guide: Running Locally
To run the Aya model locally:
- Installation:
  pip install -q transformers
- Load Model:
  from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

  checkpoint = "CohereForAI/aya-101"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
- Example Usage (a further multilingual example follows this list):
  # Translation from Turkish to English
  tur_inputs = tokenizer.encode("Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt")
  tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
  print(tokenizer.decode(tur_outputs[0]))
- Cloud GPUs: For large-scale tasks, consider using cloud GPU providers like AWS, GCP, or Azure for enhanced performance.
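Since Aya is instruction-tuned across all 101 languages, the same generate call handles prompts in other supported languages. A minimal sketch reusing the tokenizer and aya_model objects loaded above; the Hindi prompt asks "Why are there so many languages in India?", and the generation settings are illustrative rather than prescribed by the model card:

# Question answering in Hindi
hin_inputs = tokenizer.encode("भारत में इतनी सारी भाषाएँ क्यों हैं?", return_tensors="pt")
hin_outputs = aya_model.generate(hin_inputs, max_new_tokens=128)
print(tokenizer.decode(hin_outputs[0], skip_special_tokens=True))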
License
Aya is released under the Apache-2.0 license. This open-access license allows for the free use and distribution of the model, fostering community engagement and research in multilingual AI technologies.