JAIS-13B Model Summary

Introduction

JAIS-13B is a bilingual large language model (LLM) for Arabic and English text generation. The 13-billion-parameter model was developed by Inception, MBZUAI, and Cerebras Systems, and is released under the Apache 2.0 license to serve researchers, developers, and commercial users with an open model for Arabic-centric applications.

Architecture

JAIS-13B uses a decoder-only transformer architecture, similar to GPT-3. Its training corpus contains 72 billion Arabic tokens and 279 billion English/code tokens; the Arabic data is iterated over for roughly 1.6 epochs, bringing the total to approximately 395 billion tokens. The model uses SwiGLU non-linearity and ALiBi position embeddings, which let it handle sequences longer than those seen during training.
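
To make these two design choices concrete, here is a minimal PyTorch sketch, not JAIS's actual implementation: the module names, the absence of bias terms, and the tensor layout are illustrative assumptions. It shows a SwiGLU feed-forward block and the per-head linear biases that ALiBi adds to attention scores in place of learned position embeddings.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLU(nn.Module):
        """SwiGLU feed-forward block: a Swish-gated linear unit."""
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w_gate = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w_up = nn.Linear(dim, hidden_dim, bias=False)    # value projection
            self.w_down = nn.Linear(hidden_dim, dim, bias=False)  # output projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # F.silu(x) = x * sigmoid(x) is the "Swish" gate
            return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

    def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
        """ALiBi: per-head linear penalties added to attention scores, so
        distant positions contribute less; no position embeddings needed."""
        # Head slopes form a geometric sequence (Press et al., 2021).
        slopes = torch.tensor([2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)])
        pos = torch.arange(seq_len)
        distance = pos[None, :] - pos[:, None]  # entry [i, j] = j - i (<= 0 for the past)
        # The causal mask for future positions (j > i) is applied separately.
        return slopes[:, None, None] * distance[None]  # (n_heads, seq_len, seq_len)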

Training

The model was trained on the Condor Galaxy 1 supercomputer using a mix of publicly available Arabic and English sources, augmented with machine-translated content to enlarge the Arabic portion. Training used the AdamW optimizer with a learning-rate schedule and a batch size of 1,920 over 100,551 steps. Benchmarked against other open models of similar size, JAIS-13B shows particularly strong results on Arabic language tasks.
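
As a rough sketch of that recipe, the snippet below wires AdamW to a warmup-then-linear-decay schedule in PyTorch. The step count and batch size come from the summary above; the peak learning rate, betas, weight decay, and warmup length are placeholders, since the exact published values are not restated here.

    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import LambdaLR

    model = torch.nn.Linear(512, 512)  # stand-in for the 13B-parameter model

    TOTAL_STEPS = 100_551  # from the training summary above
    WARMUP_STEPS = 1_000   # assumption: the actual warmup length is not stated here

    # Placeholder hyperparameters, not JAIS's published settings.
    optimizer = AdamW(model.parameters(), lr=1.2e-4, betas=(0.9, 0.95), weight_decay=0.1)

    def lr_lambda(step: int) -> float:
        """Linear warmup to the peak rate, then linear decay to 10% of peak."""
        if step < WARMUP_STEPS:
            return step / max(1, WARMUP_STEPS)
        progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
        return max(0.1, 1.0 - 0.9 * progress)

    scheduler = LambdaLR(optimizer, lr_lambda)

    # In the training loop (one step = one global batch of 1,920 sequences):
    #     loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()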

Guide: Running Locally

To run JAIS-13B locally, follow these steps:

  1. Environment Setup: Ensure Python and PyTorch are installed.
  2. Install Transformers: The model requires version 4.28.0 of the Transformers library.
  3. Load the Model: Load the checkpoint with trust_remote_code=True, since JAIS ships custom modeling code.
  4. Run Inference: Wrap generation in a helper such as get_response to produce text in Arabic or English (a sketch of steps 3 and 4 follows this list).
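
The following is a minimal sketch of steps 3 and 4. The repository id follows the original model card (inception-mbzuai/jais-13b) and may have moved since; the float16 load and sampling parameters are illustrative choices, and this get_response is a simplified helper rather than the model card's exact code.

    # pip install transformers==4.28.0 (plus a recent PyTorch)
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_path = "inception-mbzuai/jais-13b"  # verify the current org on Hugging Face
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # trust_remote_code=True is required because JAIS ships custom modeling code.
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16, trust_remote_code=True
    ).to(device)

    def get_response(text: str, max_new_tokens: int = 200) -> str:
        """Generate a continuation for an Arabic or English prompt."""
        input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        output_ids = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,        # illustrative sampling parameters
            temperature=0.3,
        )
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(get_response("The capital of UAE is"))  # works for Arabic prompts as well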

For optimal performance, run the model on a cloud GPU such as those offered by AWS, Google Cloud, or Azure; at half precision, the 13 billion parameters alone occupy roughly 26 GB of GPU memory, before activations and the KV cache.

License

JAIS-13B is released under the Apache License 2.0, allowing free use, modification, and distribution. Users must comply with the license terms, available at https://www.apache.org/licenses/LICENSE-2.0.
