Introduction

The Aya model is a massively multilingual generative language model that follows instructions in 101 languages. In benchmark evaluations it outperforms comparable open models such as mT0 and BLOOMZ on the majority of tasks while covering roughly twice as many languages. The model is released under the Apache-2.0 license to promote the development of multilingual technology.

Architecture

Aya is a 13-billion-parameter Transformer with the same encoder-decoder architecture as mt5-xxl; its decoder generates text autoregressively. It is designed to handle a broad spectrum of languages and was finetuned with a batch size of 256 on TPUv4-128 hardware.
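
The architecture can be confirmed quickly by fetching only the model's configuration file, without downloading the 13-billion-parameter weights. A minimal sketch, assuming the transformers library is installed:

    from transformers import AutoConfig

    # Fetch just config.json for the checkpoint, not the weights.
    config = AutoConfig.from_pretrained("CohereForAI/aya-101")

    print(config.model_type)          # "mt5" (mt5-xxl-style encoder-decoder)
    print(config.num_layers)          # encoder depth
    print(config.num_decoder_layers)  # decoder depth
    print(config.d_model)             # hidden size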

Training

Aya was finetuned on a mixture of datasets, including xP3x, the Aya Dataset, the Aya Collection, and ShareGPT-Command, each filtered and pruned to the 101 supported languages. A total of 25 million samples were seen during finetuning, using the T5X and JAX software stack.
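
Parts of the finetuning mixture are public on the Hugging Face Hub. Below is a minimal sketch for browsing the human-annotated Aya Dataset with the datasets library; the hub id and field names follow the published dataset card but should be verified against the current schema:

    from datasets import load_dataset

    # The human-annotated Aya Dataset; xP3x, the Aya Collection,
    # and ShareGPT-Command are distributed separately.
    aya_data = load_dataset("CohereForAI/aya_dataset", split="train")

    # Each record pairs a prompt ("inputs") with a completion
    # ("targets") and carries language metadata.
    example = aya_data[0]
    print(example["language"], "->", example["inputs"][:80])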

Guide: Running Locally

To run the Aya model locally:

  1. Installation:

    pip install -q transformers sentencepiece
    
  2. Load Model:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    # aya-101 is a 13B-parameter checkpoint; expect a large download
    # and substantial memory use when loading the full weights.
    checkpoint = "CohereForAI/aya-101"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
  3. Example Usage:

    # Translation from Turkish to English; the prompt means
    # "Aya is a multilingual language model."
    tur_inputs = tokenizer.encode("Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt")
    tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
    print(tokenizer.decode(tur_outputs[0], skip_special_tokens=True))
    
  4. Cloud GPUs: At 13 billion parameters, Aya requires substantial GPU memory. For large-scale tasks, consider cloud GPU providers such as AWS, GCP, or Azure; a memory-saving loading sketch follows this list.
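
If GPU memory is tight, loading the weights in half precision roughly halves the footprint relative to float32. A minimal sketch, assuming PyTorch with a CUDA device and the accelerate package installed (required for device_map="auto"); this is an optimization on top of the steps above, not part of the official recipe:

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "CohereForAI/aya-101"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Load in bfloat16 and let accelerate place layers on the
    # available devices, halving memory use versus float32.
    aya_model = AutoModelForSeq2SeqLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    inputs = tokenizer.encode(
        "Translate to English: Aya cok dilli bir dil modelidir.",
        return_tensors="pt",
    ).to(aya_model.device)
    outputs = aya_model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))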

License

Aya is released under the Apache-2.0 license. This open-access license allows for the free use and distribution of the model, fostering community engagement and research in multilingual AI technologies.
