Arabic Stable LM 2 1.6B Model Documentation

Introduction

Arabic Stable LM 2 1.6B is a language model fine-tuned for Arabic text generation. Developed by Stability AI, it builds on the Stable LM 2 1.6B base model and is trained on over 100 billion tokens of Arabic text. It is intended for research and other non-commercial purposes.

Architecture

The Arabic Stable LM 2 1.6B model employs a decoder-only transformer architecture similar to LLaMA. Key architectural features, which can be verified programmatically as shown in the sketch after this list, include:

  • Parameters and Structure: 1,644,417,024 parameters, 2048 hidden size, 24 layers, 32 heads, and a sequence length of 4096.
  • Position Embeddings: Rotary Position Embeddings (RoPE) are applied to the first 25% of head embedding dimensions for improved throughput.
  • Normalization: Uses LayerNorm with learned bias terms.
  • Biases: Bias terms are removed from most layers except the query, key, and value projections.
  • Tokenizer: Uses Arcade100k, a BPE tokenizer.
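
As a quick sanity check, the figures above can be read from the published model configuration. The sketch below is a minimal example, assuming the checkpoint loads with the StableLM architecture in a recent transformers release; the attribute names are the standard Hugging Face configuration fields.

    from transformers import AutoConfig, AutoTokenizer
    
    # Assumes a recent transformers release with StableLM support.
    config = AutoConfig.from_pretrained("stabilityai/ar-stablelm-2-base")
    print(config.hidden_size)              # expected: 2048
    print(config.num_hidden_layers)        # expected: 24
    print(config.num_attention_heads)      # expected: 32
    print(config.max_position_embeddings)  # expected: 4096
    
    # Arcade100k BPE tokenizer; the vocabulary size should be on the order of 100k.
    tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
    print(tokenizer.vocab_size)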

Training

Training Dataset

The model is trained on both English (619 billion tokens) and Arabic (115 billion tokens) datasets. Fine-tuning is recommended for specific downstream tasks.

Training Procedure

The model was fine-tuned from the Stable LM 2 1.6B base model using a multi-stage learning-rate schedule (a rough sketch of the schedule's shape follows the list below). The fine-tuning involved:

  • 300k steps with a combined cosine and inverse square-root schedule.
  • A 200k-step cooldown phase with a linearly decaying learning rate.
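
The card gives only the step counts, so the sketch below is purely illustrative of such a two-phase shape: the peak and minimum learning rates are hypothetical placeholders, and the inverse square-root component is omitted because its exact combination with the cosine phase is not specified here.

    import math
    
    # Illustrative only: the 300k/200k step counts come from this card; the
    # learning-rate values are hypothetical placeholders.
    PEAK_LR, MIN_LR = 3e-4, 3e-5
    MAIN_STEPS, COOLDOWN_STEPS = 300_000, 200_000
    
    def lr_at(step: int) -> float:
        if step < MAIN_STEPS:
            # Main phase: cosine decay from PEAK_LR down to MIN_LR.
            frac = step / MAIN_STEPS
            return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * frac))
        # Cooldown phase: linear decay from MIN_LR to zero over 200k steps.
        frac = min(1.0, (step - MAIN_STEPS) / COOLDOWN_STEPS)
        return MIN_LR * (1.0 - frac)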

Training Infrastructure

  • Hardware: Two nodes with 8 NVIDIA H100 GPUs each (16 GPUs in total), with a global batch size of 96 sequences (see the tokens-per-step estimate after this list).
  • Software: Based on gpt-neox, employing 2D parallelism (data and tensor parallelism) together with flash-attention.
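
With the model's 4,096-token context and assuming full-length sequences (an assumption, since the card states the batch size only in sequences), the effective tokens per optimizer step work out as follows:

    # Back-of-the-envelope tokens per optimizer step, assuming every sequence
    # fills the 4096-token context.
    global_batch_sequences = 96
    context_length = 4096
    print(global_batch_sequences * context_length)  # 393216 tokens per step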

Guide: Running Locally

Basic Steps

  1. Installation:

    pip install transformers torch
    
  2. Code Example:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("stabilityai/ar-stablelm-2-base")
    model = AutoModelForCausalLM.from_pretrained("stabilityai/ar-stablelm-2-base", torch_dtype="auto")
    model.cuda()  # move the model to the GPU; omit this line to run on CPU
    
    # Arabic prompt meaning roughly "Eid al-Adha is considered ..."
    inputs = tokenizer("يعتبر عيد الأضحى", return_tensors="pt").to(model.device)
    
    # Sample up to 64 new tokens with temperature and nucleus (top-p) sampling.
    tokens = model.generate(**inputs, max_new_tokens=64, temperature=0.70, top_p=0.95, do_sample=True)
    print(tokenizer.decode(tokens[0], skip_special_tokens=True))
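
If you prefer a higher-level entry point, the same checkpoint can also be driven through the transformers text-generation pipeline. This is a minimal sketch; device_map="auto" places the model on whatever accelerator is available and requires the accelerate package to be installed.

    from transformers import pipeline
    
    # Minimal alternative using the text-generation pipeline.
    generator = pipeline("text-generation", model="stabilityai/ar-stablelm-2-base", device_map="auto")
    # Arabic prompt meaning roughly "Eid al-Adha is considered ..."
    result = generator("يعتبر عيد الأضحى", max_new_tokens=64, do_sample=True, top_p=0.95)
    print(result[0]["generated_text"])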
    

Cloud GPUs Suggestion

For optimal performance, consider a cloud GPU instance, such as an NVIDIA V100 or A100 on AWS EC2.

License

The model is distributed under the Stability AI Non-Commercial Research Community License, which permits use for research and non-commercial purposes only. For commercial use, refer to Stability AI's licensing page.
