Arabic Stable LM 2 1.6B Model Documentation
Introduction
Arabic Stable LM 2 1.6B is a fine-tuned language model designed for Arabic text generation. Developed by Stability AI, it builds upon the foundational Stable LM 2 1.6B and is trained on over 100 billion tokens of Arabic text. It is intended for research and non-commercial purposes.
Architecture
The Arabic Stable LM 2 1.6B model employs a transformer decoder architecture, similar to the LLaMA model. Key architectural features include:
- Parameters and Structure: 1,644,417,024 parameters, a hidden size of 2048, 24 layers, 32 attention heads, and a sequence length of 4096 (see the sketch after this list).
- Position Embeddings: Rotary Position Embeddings (RoPE) are applied to the first 25% of head embedding dimensions to improve throughput.
- Normalization: Uses LayerNorm with learned bias terms.
- Biases: Bias terms are removed from the feed-forward and attention layers, except for the query, key, and value projections.
- Tokenizer: Uses Arcade100k, a BPE tokenizer.
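These hyperparameters can be verified against the published configuration. Below is a minimal sketch using the standard transformers config fields (`hidden_size`, `num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`); the expected values are the ones listed above:

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("stabilityai/ar-stablelm-2-base")
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 24
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 4096

# The Arcade100k BPE tokenizer loads through the same auto classes.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
print(len(tokenizer))                  # vocabulary size
```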
Training
Training Dataset
The model was trained on a mix of English (619 billion tokens) and Arabic (115 billion tokens) data. Fine-tuning is recommended for specific downstream tasks; a rough sketch of such a fine-tune follows.
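The sketch below uses the transformers Trainer on a plain-text corpus. The file name my_arabic_corpus.txt and all hyperparameters are placeholders for illustration, not values from the original training run:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
model = AutoModelForCausalLM.from_pretrained("stabilityai/ar-stablelm-2-base")

# Placeholder corpus: one Arabic document per line.
dataset = load_dataset("text", data_files={"train": "my_arabic_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ar-stablelm-2-finetuned",  # placeholder output path
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```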
Training Procedure
The model was fine-tuned from the Stable LM 2 1.6B base model with a staged learning rate schedule. The fine-tuning involved:
- 300k steps with a combined cosine and inverse square-root learning rate schedule.
- 200k cooldown steps with a linearly decaying learning rate (a rough sketch of this two-phase schedule follows).
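As an illustration only: the peak learning rate below is a placeholder and the cosine component of the first phase is omitted for simplicity; only the step counts come from the description above.

```python
import math

PEAK_LR = 3e-4            # placeholder; not the value used in training
MAIN_STEPS = 300_000      # first phase (cosine / inverse square-root)
COOLDOWN_STEPS = 200_000  # second phase (linear cooldown)

def learning_rate(step: int) -> float:
    """Toy two-phase schedule: rsqrt decay, then a linear cooldown."""
    if step < MAIN_STEPS:
        # Inverse square-root decay from PEAK_LR.
        return PEAK_LR / math.sqrt(1.0 + step / 1_000.0)
    # Linear cooldown from wherever phase one ended down to zero.
    start = PEAK_LR / math.sqrt(1.0 + MAIN_STEPS / 1_000.0)
    frac = (step - MAIN_STEPS) / COOLDOWN_STEPS
    return max(0.0, start * (1.0 - frac))
```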
Training Infrastructure
- Hardware: Two nodes with 8 NVIDIA H100 GPUs each (16 GPUs total), with a global batch size of 96 sequences.
- Software: Built on the gpt-neox framework, using 2D parallelism and flash-attention.
Guide: Running Locally
Basic Steps
- Installation:

```
pip install transformers torch
```

- Code Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; torch_dtype="auto" uses the dtype
# stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/ar-stablelm-2-base",
    torch_dtype="auto",
)
model.cuda()  # requires a CUDA-capable GPU

# Prompt: "يعتبر عيد الأضحى" ("Eid al-Adha is considered ...").
inputs = tokenizer("يعتبر عيد الأضحى", return_tensors="pt").to(model.device)

# Sample up to 64 new tokens.
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.70,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
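The temperature and top_p arguments trade determinism for diversity. For reproducible output, a greedy variant (reusing `model` and `inputs` from the example above) disables sampling:

```python
# Greedy decoding: always pick the highest-probability next token.
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```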
Cloud GPUs Suggestion
For better performance, consider cloud GPUs such as AWS EC2 instances with NVIDIA V100 or A100 GPUs.
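On a cloud GPU instance, one convenient loading pattern is to let transformers place the weights automatically. This is a minimal sketch, assuming the `accelerate` package is installed (required for `device_map`):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" places the weights on the available GPU(s);
# requires the `accelerate` package (pip install accelerate).
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/ar-stablelm-2-base",
    torch_dtype="auto",
    device_map="auto",
)
```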
License
The model is distributed under the Stability AI Non-Commercial Research Community License, which permits use for research and non-commercial purposes only. For commercial use, refer to Stability AI's licensing page.