Introduction

The Aya model is a massively multilingual generative language model that follows instructions in 101 languages. In benchmark evaluations it outperforms comparable open models such as mT0 and BLOOMZ on the majority of tasks while covering roughly twice as many languages. The model is released under the Apache-2.0 license to promote the development of multilingual technology.

Architecture

Aya is a 13-billion-parameter Transformer with the same encoder-decoder architecture as mt5-xxl; its decoder generates text autoregressively. It is designed to handle a broad spectrum of languages and was finetuned with a batch size of 256 on TPUv4-128 hardware.
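
The architecture can be confirmed quickly by fetching only the model's configuration file, without downloading the 13-billion-parameter weights. A minimal sketch, assuming the transformers library is installed:

    from transformers import AutoConfig

    # Fetch just config.json for the checkpoint, not the weights.
    config = AutoConfig.from_pretrained("CohereForAI/aya-101")

    print(config.model_type)          # "mt5" (mt5-xxl-style encoder-decoder)
    print(config.num_layers)          # encoder depth
    print(config.num_decoder_layers)  # decoder depth
    print(config.d_model)             # hidden size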

Training

Aya was finetuned on a mixture of datasets, including xP3x, the Aya Dataset, the Aya Collection, and ShareGPT-Command, each filtered and pruned to the 101 supported languages. A total of 25 million samples were seen during finetuning, using the T5X and JAX software stack.
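
Parts of the finetuning mixture are public on the Hugging Face Hub. Below is a minimal sketch for browsing the human-annotated Aya Dataset with the datasets library; the hub id and field names follow the published dataset card but should be verified against the current schema:

    from datasets import load_dataset

    # The human-annotated Aya Dataset; xP3x, the Aya Collection,
    # and ShareGPT-Command are distributed separately.
    aya_data = load_dataset("CohereForAI/aya_dataset", split="train")

    # Each record pairs a prompt ("inputs") with a completion
    # ("targets") and carries language metadata.
    example = aya_data[0]
    print(example["language"], "->", example["inputs"][:80])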

Guide: Running Locally

To run the Aya model locally:

  1. Installation:

    pip install -q transformers sentencepiece
    
  2. Load Model:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    # aya-101 is a 13B-parameter checkpoint; expect a large download
    # and substantial memory use when loading the full weights.
    checkpoint = "CohereForAI/aya-101"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
  3. Example Usage:

    # Translation from Turkish to English; the prompt means
    # "Aya is a multilingual language model."
    tur_inputs = tokenizer.encode("Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt")
    tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
    print(tokenizer.decode(tur_outputs[0], skip_special_tokens=True))
    
  4. Cloud GPUs: At 13 billion parameters, Aya requires substantial GPU memory. For large-scale tasks, consider cloud GPU providers such as AWS, GCP, or Azure; a memory-saving loading sketch follows this list.
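
If GPU memory is tight, loading the weights in half precision roughly halves the footprint relative to float32. A minimal sketch, assuming PyTorch with a CUDA device and the accelerate package installed (required for device_map="auto"); this is an optimization on top of the steps above, not part of the official recipe:

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "CohereForAI/aya-101"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Load in bfloat16 and let accelerate place layers on the
    # available devices, halving memory use versus float32.
    aya_model = AutoModelForSeq2SeqLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    inputs = tokenizer.encode(
        "Translate to English: Aya cok dilli bir dil modelidir.",
        return_tensors="pt",
    ).to(aya_model.device)
    outputs = aya_model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))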

License

Aya is released under the Apache-2.0 license. This open-access license allows for the free use and distribution of the model, fostering community engagement and research in multilingual AI technologies.
