Arabic Stable LM 2 1.6B Model Documentation
Introduction
Arabic Stable LM 2 1.6B is a fine-tuned language model designed for Arabic text generation. Developed by Stability AI, it builds upon the foundational Stable LM 2 1.6B and is trained on over 100 billion tokens of Arabic text. It is intended for research and non-commercial purposes.
Architecture
The Arabic Stable LM 2 1.6B model employs a transformer decoder architecture, similar to the LLaMA model. Key architectural features include:
- Parameters and Structure: 1,644,417,024 parameters, a hidden size of 2048, 24 layers, 32 attention heads, and a sequence length of 4096 (see the sketch after this list).
- Position Embeddings: Rotary Position Embeddings (RoPE) are applied to the first 25% of head embedding dimensions to improve throughput.
- Normalization: Uses LayerNorm with learned bias terms.
- Biases: Bias terms are removed from the feed-forward and attention layers, except for the query, key, and value projections.
- Tokenizer: Uses Arcade100k, a BPE tokenizer.
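These hyperparameters can be verified against the published configuration. Below is a minimal sketch using the standard transformers config fields (`hidden_size`, `num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`); the expected values are the ones listed above:

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("stabilityai/ar-stablelm-2-base")
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 24
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 4096

# The Arcade100k BPE tokenizer loads through the same auto classes.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
print(len(tokenizer))                  # vocabulary size
```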
Training
Training Dataset
The model was trained on a mix of English (619 billion tokens) and Arabic (115 billion tokens) data. Fine-tuning is recommended for specific downstream tasks; a rough sketch of such a fine-tune follows.
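The sketch below uses the transformers Trainer on a plain-text corpus. The file name my_arabic_corpus.txt and all hyperparameters are placeholders for illustration, not values from the original training run:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
model = AutoModelForCausalLM.from_pretrained("stabilityai/ar-stablelm-2-base")

# Placeholder corpus: one Arabic document per line.
dataset = load_dataset("text", data_files={"train": "my_arabic_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ar-stablelm-2-finetuned",  # placeholder output path
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```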
Training Procedure
The model was fine-tuned from the Stable LM 2 1.6B base model with a staged learning rate schedule. The fine-tuning involved:
- 300k steps with a combined cosine and inverse square-root learning rate schedule.
- 200k cooldown steps with a linearly decaying learning rate (a rough sketch of this two-phase schedule follows).
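As an illustration only: the peak learning rate below is a placeholder and the cosine component of the first phase is omitted for simplicity; only the step counts come from the description above.

```python
import math

PEAK_LR = 3e-4            # placeholder; not the value used in training
MAIN_STEPS = 300_000      # first phase (cosine / inverse square-root)
COOLDOWN_STEPS = 200_000  # second phase (linear cooldown)

def learning_rate(step: int) -> float:
    """Toy two-phase schedule: rsqrt decay, then a linear cooldown."""
    if step < MAIN_STEPS:
        # Inverse square-root decay from PEAK_LR.
        return PEAK_LR / math.sqrt(1.0 + step / 1_000.0)
    # Linear cooldown from wherever phase one ended down to zero.
    start = PEAK_LR / math.sqrt(1.0 + MAIN_STEPS / 1_000.0)
    frac = (step - MAIN_STEPS) / COOLDOWN_STEPS
    return max(0.0, start * (1.0 - frac))
```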
Training Infrastructure
- Hardware: Two nodes with 8 NVIDIA H100 GPUs each (16 GPUs total), with a global batch size of 96 sequences.
- Software: Built on the gpt-neox framework, using 2D parallelism and flash-attention.
Guide: Running Locally
Basic Steps
- Installation:

```
pip install transformers torch
```

- Code Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; torch_dtype="auto" uses the dtype
# stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/ar-stablelm-2-base",
    torch_dtype="auto",
)
model.cuda()  # requires a CUDA-capable GPU

# Prompt: "يعتبر عيد الأضحى" ("Eid al-Adha is considered ...").
inputs = tokenizer("يعتبر عيد الأضحى", return_tensors="pt").to(model.device)

# Sample up to 64 new tokens.
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.70,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
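The temperature and top_p arguments trade determinism for diversity. For reproducible output, a greedy variant (reusing `model` and `inputs` from the example above) disables sampling:

```python
# Greedy decoding: always pick the highest-probability next token.
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```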
Cloud GPUs Suggestion
For better performance, consider cloud GPUs such as AWS EC2 instances with NVIDIA V100 or A100 GPUs.
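On a cloud GPU instance, one convenient loading pattern is to let transformers place the weights automatically. This is a minimal sketch, assuming the `accelerate` package is installed (required for `device_map`):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" places the weights on the available GPU(s);
# requires the `accelerate` package (pip install accelerate).
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/ar-stablelm-2-base",
    torch_dtype="auto",
    device_map="auto",
)
```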
License
The model is distributed under the Stability AI Non-Commercial Research Community License, which permits use for research and non-commercial purposes only. For commercial use, refer to Stability AI's licensing page.