EuroLLM-9B-Instruct
Introduction
EuroLLM-9B-Instruct is a 9 billion parameter multilingual transformer language model. It is developed by a collaboration of various European universities and organizations, funded by the European Union. The model is designed to process text in 35 languages, with applications in text generation, machine translation, and general instruction-following tasks.
Architecture
EuroLLM employs a dense Transformer architecture with several enhancements to improve performance:
- Grouped Query Attention (GQA): Utilizes 8 key-value heads for efficient inference.
- Pre-layer Normalization with RMSNorm: Enhances training stability and speed.
- SwiGLU Activation Function: Offers improved results on downstream tasks.
- Rotary Positional Embeddings (RoPE): Ensures high performance and adaptable context length.
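The grouped-query attention layout from the list above can be sketched with toy tensors: 32 query heads share 8 key/value heads, so each key/value head serves a group of 4 query heads. This is only an illustrative sketch with random data, not the model's actual implementation; the sequence length of 16 is arbitrary.

```python
import numpy as np

# Toy dimensions mirroring EuroLLM-9B's attention layout:
# 32 query heads share 8 key/value heads (groups of 4).
n_q_heads, n_kv_heads, head_dim, seq = 32, 8, 128, 16
group = n_q_heads // n_kv_heads  # 4 query heads per key/value head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))

# Each key/value head is repeated so every query head has a matching key head,
# while only the 8 distinct heads need to be stored in the KV cache.
k_expanded = np.repeat(k, group, axis=0)
scores = q @ k_expanded.transpose(0, 2, 1) / np.sqrt(head_dim)
print(scores.shape)  # one attention map per query head
```

Because only the 8 distinct key/value heads are cached during inference, the KV cache shrinks by 4x relative to full multi-head attention, which is the efficiency benefit the bullet refers to.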
The model is structured with 42 layers, an embedding size of 4,096, and a feed-forward network (FFN) hidden size of 12,288. It has 32 attention heads and uses a sequence length of 4,096.
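These dimensions can be sanity-checked against the 9-billion-parameter figure with a back-of-the-envelope count. The vocabulary size (128,000) and untied input/output embeddings are assumptions not stated in this card, so treat the result as a rough estimate only.

```python
# Rough parameter count from the dimensions stated above.
d_model, d_ffn, n_layers = 4096, 12288, 42
n_heads, n_kv_heads = 32, 8
head_dim = d_model // n_heads   # 128
vocab = 128_000                 # assumed, not stated in the card

attn = (d_model * d_model                       # query projection
        + 2 * d_model * n_kv_heads * head_dim   # GQA key/value projections
        + d_model * d_model)                    # output projection
ffn = 3 * d_model * d_ffn       # SwiGLU uses three weight matrices
per_layer = attn + ffn

# Plus input embeddings and an (assumed untied) output head; norms are negligible.
total = n_layers * per_layer + 2 * vocab * d_model
print(f"~{total / 1e9:.1f}B parameters")
```

Under these assumptions the count lands close to 9B, consistent with the model's name.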
Training
EuroLLM-9B was trained on 4 trillion tokens using diverse data sources, including web data and high-quality datasets. The training utilized 400 Nvidia H100 GPUs, with a batch size of 2,800 sequences, equivalent to about 12 million tokens per iteration. The model uses the Adam optimizer with BF16 precision.
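The throughput figures above can be checked with simple arithmetic: 2,800 sequences of 4,096 tokens each gives the "about 12 million tokens per iteration" quoted, and dividing the 4 trillion training tokens by that batch size yields the approximate number of optimizer steps (the step count is my inference, not a figure from the card).

```python
# Sanity-check the training-throughput figures quoted above.
batch_sequences = 2800
seq_len = 4096  # sequence length from the Architecture section
tokens_per_step = batch_sequences * seq_len
print(f"{tokens_per_step:,} tokens per iteration")  # ~11.5M, i.e. "about 12 million"

total_tokens = 4_000_000_000_000  # 4 trillion training tokens
steps = total_tokens / tokens_per_step
print(f"~{steps:,.0f} optimizer steps (inferred, not stated in the card)")
```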
Guide: Running Locally
To run EuroLLM-9B-Instruct locally, follow these steps:
- Install the `transformers` library from Hugging Face:

```shell
pip install transformers
```
- Load the model and tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
- Prepare and process input messages:

```python
messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
```
- Generate and decode the model's output:

```python
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
For optimal performance, especially for large models like EuroLLM-9B, consider using cloud GPU services such as AWS, GCP, or Azure.
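A quick memory estimate shows why a data-center GPU is recommended: the BF16 weights alone occupy roughly 17 GiB, before accounting for the KV cache and activation memory. The calculation below uses the dimensions stated in this card plus a flat 9B parameter count; it is a rough sketch, not a precise measurement.

```python
# Rough GPU-memory estimate for running the 9B model in BF16.
params = 9e9
bytes_per_param = 2  # BF16 weights
weights_gb = params * bytes_per_param / 1024**3
print(f"weights: ~{weights_gb:.0f} GiB")

# KV cache for one 4,096-token sequence: keys + values, 42 layers,
# 8 key/value heads (GQA), head dimension 128, BF16.
kv_gb = 2 * 42 * 8 * 128 * 4096 * 2 / 1024**3
print(f"kv cache: ~{kv_gb:.2f} GiB per sequence")
```

The GQA design keeps the per-sequence KV cache under 1 GiB at full context, but the weights alone still exceed the memory of most consumer GPUs, hence the cloud-GPU suggestion.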
License
EuroLLM-9B-Instruct is licensed under the Apache License 2.0, allowing for wide usage and modification within the terms specified.