Aya Expanse 8B
CohereForAI
Introduction
Aya Expanse 8B is a multilingual large language model developed by Cohere For AI. It is an open-weight research release focused on advanced multilingual capabilities, building on Cohere’s research in data arbitrage, multilingual preference training, safety tuning, and model merging. The Aya Expanse family supports 23 languages and is available in 8-billion and 32-billion parameter versions.
Architecture
Aya Expanse 8B is an auto-regressive language model built on an optimized transformer architecture. Post-training includes supervised fine-tuning, preference training, and model merging. The model is optimized for multilingual text generation across its 23 supported languages and has a context length of 8,000 tokens.
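The 8,000-token context length is a budget shared by the prompt and the tokens the model generates. As a minimal sketch (not part of the model's own tooling), the helper below drops the oldest chat messages until a conversation fits that budget. The names `trim_to_context`, `count_tokens`, and `whitespace_count` are hypothetical, and the whitespace-based counter is only a stand-in: a real count would come from the model's tokenizer and would include chat-template overhead.

```python
from typing import Callable, Dict, List

CONTEXT_LENGTH = 8000  # context length stated for Aya Expanse 8B


def trim_to_context(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    max_new_tokens: int = 100,
    context_length: int = CONTEXT_LENGTH,
) -> List[Dict[str, str]]:
    """Drop the oldest messages until prompt + generation fits the context.

    Hypothetical helper: `count_tokens` is supplied by the caller and should
    approximate how the model's tokenizer counts tokens.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")

    kept: List[Dict[str, str]] = []
    used = 0
    # Walk from newest to oldest so the most recent turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def whitespace_count(text: str) -> int:
    """Crude stand-in counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


history = [
    {"role": "user", "content": "word " * 6000},
    {"role": "assistant", "content": "word " * 3000},
    {"role": "user", "content": "Summarize the above in French."},
]
trimmed = trim_to_context(history, whitespace_count)
print([m["role"] for m in trimmed])  # the oldest (6000-word) message is dropped
```

Dropping whole messages from the oldest end is one simple policy among several; summarizing older turns or truncating within a message are alternatives that this sketch does not attempt.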
Training
The model was trained with a focus on multilingual preference training, data arbitrage, and safety tuning. It was evaluated against other models like Gemma 2 9B and Llama 3.1 8B using datasets such as the Aya Evaluation Suite and m-ArenaHard. These evaluations highlighted its strong multilingual capabilities.
Guide: Running Locally
- Install the Transformers Library: Ensure you have the transformers library installed.
    pip install transformers
- Load the Model:
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Download (or load from cache) the tokenizer and model weights.
    model_id = "CohereForAI/aya-expanse-8b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Format the conversation with the model's chat template.
    messages = [{"role": "user", "content": "Write a letter to my mom explaining how much I love her"}]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )

    # Sample up to 100 new tokens at a low temperature.
    gen_tokens = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.3)

    # Decode the full sequence (prompt plus completion) back to text.
    gen_text = tokenizer.decode(gen_tokens[0])
    print(gen_text)
- Use Cloud GPUs: For efficient inference, consider cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
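When choosing a GPU, a rough first check is whether the model weights fit in memory. The sketch below multiplies a nominal 8 billion parameters by the bytes per parameter at common precisions. It covers weights only: activations, the KV cache, and framework overhead come on top. The function name and dtype table are illustrative, and the checkpoint's exact parameter count may differ slightly from the nominal figure assumed here.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

By this estimate, half-precision weights need roughly 16 GB, so a GPU with somewhat more memory than that leaves headroom for the KV cache during generation.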
License
Aya Expanse 8B is released under the Creative Commons BY-NC 4.0 license, which permits non-commercial use only. Users must also comply with C4AI's Acceptable Use Policy.