EuroLLM-1.7B
utter-project
Introduction
EuroLLM-1.7B is a multilingual language model developed as part of the EuroLLM project. This initiative aims to create models capable of understanding and generating text in all European Union languages and additional relevant languages. The model has 1.7 billion parameters and has been trained on a diverse dataset of 4 trillion tokens.
Architecture
EuroLLM-1.7B uses a standard dense Transformer architecture with the following design choices:
- Grouped Query Attention (GQA) with 8 key-value heads for improved inference speed.
- Pre-layer normalization and RMSNorm for training stability and efficiency.
- SwiGLU activation function for improved downstream task results.
- Rotary Positional Embeddings (RoPE) for better performance and extended context length.
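The first three components above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, the `eps` value is an assumption, and the head grouping assumes the common convention that consecutive query heads share one key-value head (16 query heads, as listed in the Training hyperparameters, over 8 key-value heads).

```python
import math

def kv_head_for_query_head(q_head, num_q_heads=16, num_kv_heads=8):
    """GQA: map a query head index to the key-value head it reads from.

    Assumes consecutive query heads are grouped together (a common convention).
    """
    group_size = num_q_heads // num_kv_heads  # 2 query heads per KV head here
    return q_head // group_size

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale a vector by its root mean square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def swiglu(gate, up):
    """SwiGLU gating: SiLU of the gate projection times the up projection, elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]

# Which KV head does each of the 16 query heads use?
print([kv_head_for_query_head(h) for h in range(16)])
```

Because only 8 key-value heads are cached instead of 16, the key-value cache is half the size it would be with standard multi-head attention, which is where the inference-speed benefit comes from.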
Training
Training ran on 256 NVIDIA H100 GPUs on the MareNostrum 5 supercomputer. The model was trained with a batch size of 3,072 sequences (about 12.6 million tokens at a sequence length of 4,096), using the Adam optimizer and BF16 precision. Key hyperparameters include:
- Sequence Length: 4,096
- Number of Layers: 24
- Embedding Size: 2,048
- FFN Hidden Size: 5,632
- Number of Heads: 16
- Activation Function: SwiGLU
- Position Encodings: RoPE
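The figures above can be cross-checked with some simple arithmetic. The sketch below derives the tokens per batch, the per-head dimension, an approximate number of optimizer steps for 4 trillion tokens, and a rough parameter count for the Transformer blocks. It assumes no bias terms, ignores normalization weights, and leaves out the embedding and output layers (the vocabulary size is not given here), so the block total is expected to fall below 1.7 billion.

```python
# Hyperparameters quoted in this document.
batch_sequences = 3072
seq_len = 4096
num_layers = 24
d_model = 2048
ffn_hidden = 5632
num_heads = 16
num_kv_heads = 8                    # from the Architecture section (GQA)
total_training_tokens = 4 * 10**12  # 4 trillion tokens

tokens_per_batch = batch_sequences * seq_len
head_dim = d_model // num_heads
kv_dim = num_kv_heads * head_dim    # width of the key and value projections
approx_steps = total_training_tokens // tokens_per_batch

# Assumption: projections have no bias terms; norm weights are ignored.
attn_params = (
    d_model * d_model       # query projection
    + 2 * d_model * kv_dim  # key and value projections (shrunk by GQA)
    + d_model * d_model     # output projection
)
ffn_params = 3 * d_model * ffn_hidden  # SwiGLU: gate, up, and down projections
block_params = num_layers * (attn_params + ffn_params)

print(f"tokens per batch:   {tokens_per_batch:,}")
print(f"head dimension:     {head_dim}")
print(f"approx. steps:      {approx_steps:,}")
print(f"params in blocks:   {block_params:,}")
```

The remaining parameters sit in the token embeddings and the output layer, whose size depends on the vocabulary.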
Guide: Running Locally
To run EuroLLM-1.7B locally, follow these steps:
- Install the Transformers library:
    pip install transformers
- Load the model and tokenizer:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_id = "utter-project/EuroLLM-1.7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
- Generate text:
    text = "English: My name is EuroLLM. Portuguese:"
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
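The example prompt follows a "source language: text, target language:" pattern that the model is left to complete. A small helper can build such prompts and strip the echoed prompt from the decoded output, since `generate` returns the prompt tokens followed by the continuation. Both function names below are hypothetical and not part of Transformers; this is a minimal sketch.

```python
def build_translation_prompt(source_lang, target_lang, text):
    """Build a completion-style prompt in the same shape as the example above."""
    return f"{source_lang}: {text} {target_lang}:"

def extract_completion(decoded, prompt):
    """Drop the echoed prompt from decoded output, if it is present verbatim."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Detokenization may not reproduce the prompt exactly; fall back to the full text.
    return decoded.strip()

prompt = build_translation_prompt("English", "Portuguese", "My name is EuroLLM.")
print(prompt)
```

The string in `prompt` can be passed to the tokenizer in place of `text`, and the decoded result can be passed to `extract_completion` to keep only the generated part.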
Cloud GPU Suggestion: The model runs on CPU, but generation is considerably faster on a GPU. If no local GPU is available, consider a cloud service offering NVIDIA GPUs, such as AWS EC2, Google Cloud Platform, or Azure.
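When a GPU is available, the model has to be moved onto it explicitly. The sketch below shows one way to do that, assuming PyTorch is installed alongside Transformers. The helper names are hypothetical, and loading in BF16 on GPU is a choice made here to match the training precision, not a documented requirement.

```python
def pick_device_and_dtype():
    """Return (device, dtype) for inference; dtype is None if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu", None
    if torch.cuda.is_available():
        # Assumption: BF16 matches the training precision and halves memory use.
        return "cuda", torch.bfloat16
    return "cpu", torch.float32

def load_eurollm(model_id="utter-project/EuroLLM-1.7B"):
    """Load the tokenizer and model, placing the model on the chosen device."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device, dtype = pick_device_and_dtype()
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    return tokenizer, model.to(device), device

device, dtype = pick_device_and_dtype()
print(device, dtype)
```

After calling `load_eurollm()`, the tokenized inputs need to be on the same device as the model, for example `inputs = tokenizer(text, return_tensors="pt").to(device)`, before they are passed to `model.generate`.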
License
EuroLLM-1.7B is released under the Apache License 2.0, which permits commercial use, modification, and redistribution, provided the license and attribution notices are retained.