SILMA 9B Instruct v1.0

silma-ai

Introduction

SILMA-9B-Instruct-v1.0 is a leading Arabic language model developed by SILMA AI, designed to empower Arabic speakers with advanced AI capabilities. It is a 9-billion-parameter model that excels at a variety of text-generation tasks, often outperforming larger models.

Architecture

The SILMA model is built on Google's Gemma foundation models, inheriting their strong base capabilities to deliver high performance. It supports both Arabic and English and is optimized for conversational applications.
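
Because SILMA builds on Gemma, it uses a Gemma-style chat template. A minimal sketch for inspecting the prompt format the tokenizer produces, assuming the tokenizer ships a chat template as Gemma-family models do (no generation is performed):

    from transformers import AutoTokenizer

    model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Render a chat into the raw prompt string without tokenizing,
    # to see the turn markers the model expects.
    messages = [{"role": "user", "content": "Hello"}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)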

Training

The model has been evaluated on several Arabic and English benchmarks, including MMLU, AlGhafa, and ARC Challenge, achieving notable scores on text-generation and multiple-choice tasks. Performance is reported as normalized accuracy (acc_norm).
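
Scores of this kind can be reproduced with a standard benchmark harness. A hedged sketch using the EleutherAI lm-evaluation-harness (an assumption: the harness, the arc_challenge task name, and the simple_evaluate API are not part of SILMA's own tooling):

    # pip install lm-eval
    import lm_eval

    # Run ARC Challenge; acc_norm appears in the returned results dict.
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=silma-ai/SILMA-9B-Instruct-v1.0,dtype=bfloat16",
        tasks=["arc_challenge"],
    )
    print(results["results"]["arc_challenge"])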

Guide: Running Locally

To run the SILMA model locally, follow these basic steps:

  1. Install Dependencies:

    pip install -U transformers sentencepiece
    pip install accelerate bitsandbytes
    
  2. Load and Run the Model Using PyTorch:

    • Import necessary libraries and load the model:
      from transformers import AutoTokenizer, AutoModelForCausalLM
      import torch
      
      model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      # device_map="auto" (handled by accelerate) already places the weights
      # on the available GPU(s), so no explicit model.to("cuda") is needed.
      model = AutoModelForCausalLM.from_pretrained(
          model_id, device_map="auto", torch_dtype=torch.bfloat16
      )
      
  3. Inference:

    • Use the model to generate text based on user input:
      messages = [{"role": "user", "content": "Write a message..."}]
      # apply_chat_template returns a tensor of token IDs; move it to the
      # model's device rather than hard-coding "cuda".
      input_ids = tokenizer.apply_chat_template(
          messages, return_tensors="pt", add_generation_prompt=True
      ).to(model.device)
      outputs = model.generate(input_ids, max_new_tokens=256)
      # Decode only the newly generated tokens, skipping the prompt.
      print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
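
    • Alternatively, the high-level pipeline API wraps the same steps. A brief sketch, assuming a recent transformers version in which text-generation pipelines accept chat-style message lists:
      from transformers import pipeline
      import torch
      
      pipe = pipeline(
          "text-generation",
          model="silma-ai/SILMA-9B-Instruct-v1.0",
          model_kwargs={"torch_dtype": torch.bfloat16},
          device_map="auto",
      )
      out = pipe([{"role": "user", "content": "Write a short greeting."}], max_new_tokens=256)
      # The pipeline returns the full chat, with the generated reply last.
      print(out[0]["generated_text"][-1]["content"])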
      

Suggested Cloud GPUs

  • Recommended: NVIDIA A40, L40, RTX A6000 (48 GB VRAM)
  • Minimum: NVIDIA RTX 4090, RTX 4000, L4 (16-24 GB VRAM)

These GPUs are suitable for running the model, especially in quantized modes (8-bit or 4-bit).
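
A minimal sketch of 4-bit loading with bitsandbytes (installed in step 1 above); the quantization settings here are common illustrative defaults, not values from SILMA's documentation:

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    import torch
    
    model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
    
    # 4-bit NF4 quantization shrinks the ~18 GB bfloat16 footprint
    # (9B params x 2 bytes) to roughly a quarter, fitting the minimum GPUs above.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )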

License

The model is open-weight and free to use under the Gemma license, which encourages sharing and innovation while requiring adherence to responsible-use guidelines.
