Falcon3-10B-Instruct-1.58bit
Introduction
The Falcon3-10B-Instruct-1.58bit model is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute (TII), aimed primarily at English text-generation tasks. Its weights are quantized to 1.58-bit precision for improved efficiency, and it is tuned for instruct/chat applications.
Architecture
Falcon3-10B-Instruct-1.58bit is a pure-transformer model with a causal decoder-only architecture, optimized for 1.58-bit precision. Quantizing the weights this aggressively sharply reduces memory footprint and inference cost while aiming to preserve the quality of the full-precision model.
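The "1.58-bit" figure comes from ternary weights: each weight takes one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information. As a rough illustration, here is a hand-rolled sketch in the style of the absmean quantizer described in the BitNet b1.58 paper; the function name and shapes are illustrative, not the model's actual kernels:

import torch

def quantize_ternary(w: torch.Tensor):
    # Illustrative absmean quantizer (BitNet b1.58 style): scale the
    # weight tensor by its mean absolute value, then round each entry
    # to the nearest value in {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = quantize_ternary(w)
# At matmul time the dequantized approximation is w_q * scale.
print(w_q)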
Training
The model was trained using techniques from the 1-bit LLM (BitNet) strategy, as outlined in a Hugging Face blog post and the corresponding research paper. For comprehensive training details, refer to the Falcon3 technical report, specifically the section on compression.
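For intuition, quantization-aware training in this line of work typically relies on the straight-through estimator (STE): the forward pass uses quantized weights, while gradients flow to the full-precision master weights. The following is a minimal sketch of that general recipe, not TII's published training code:

import torch

def ste_quantized(w: torch.Tensor) -> torch.Tensor:
    # Absmean ternary quantization, as in the sketch above.
    scale = w.abs().mean().clamp(min=1e-5)
    w_hat = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: the forward pass sees the quantized
    # weights, while the backward pass treats the quantizer as identity,
    # so gradients update the full-precision master weights.
    return w + (w_hat - w).detach()

w = torch.randn(8, 8, requires_grad=True)
loss = ste_quantized(w).sum()
loss.backward()  # w.grad is all ones: gradients passed straight through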
Guide: Running Locally
To run Falcon3-10B-Instruct-1.58bit locally, you can use either the Hugging Face Transformers library or Microsoft's BitNet inference framework.
Using Transformers
- Install the Transformers library (pip install transformers).
- Load the model using the following Python code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Instruct-1.58bit"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
- Generate text as needed; a minimal chat example follows this list.
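With the model and tokenizer loaded as above, a chat-style generation sketch might look like the following (the prompt and max_new_tokens value are illustrative):

# Build a chat prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))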
Using BitNet
- Clone the BitNet repository and install dependencies:
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
- Set up the environment and run inference. setup_env.py downloads the checkpoint from the Hugging Face repo and prepares it in the i2_s (ternary) GGUF format; run_inference.py then starts a chat session (-cnv) with the given prompt (-p) used as the system prompt:
python setup_env.py --hf-repo tiiuae/Falcon3-10B-Instruct-1.58bit -q i2_s
python run_inference.py -m models/Falcon3-10B-1.58bit/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
Cloud GPUs
For optimal performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The Falcon3-10B-Instruct-1.58bit model is licensed under the TII Falcon License 2.0; the full terms and conditions are available on the TII website.