Falcon3-10B-Instruct-1.58bit

Introduction

The Falcon3-10B-Instruct-1.58bit model is a transformer-based, causal decoder-only model intended mainly for English text generation tasks, developed by the Technology Innovation Institute (TII). It uses 1.58-bit (ternary) quantization for improved efficiency and is designed for instruct/chat applications.

Architecture

Falcon3-10B-Instruct-1.58bit uses a causal decoder-only transformer architecture with weights quantized to 1.58-bit (ternary) precision. This design reduces memory footprint and compute cost while maintaining performance.

Training

The model was trained using techniques from the 1-bit LLM strategy, as outlined in a Hugging Face blog post and the corresponding research paper. For comprehensive training details, refer to the Falcon-3 technical report, specifically the section on compression.

Guide: Running Locally

To run Falcon3-10B-Instruct-1.58bit locally, you can use either the Hugging Face Transformers library or Microsoft's BitNet inference framework (bitnet.cpp).

Using Transformers

  1. Install the Transformers library.
  2. Load the model using the following Python code:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_id = "tiiuae/Falcon3-10B-Instruct-1.58bit"
    
    # Load the tokenizer and the 1.58-bit instruct model onto the GPU
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,
    ).to("cuda")
    
  3. Perform text generation tasks as needed; a minimal generation sketch is shown below.
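
For example, here is a minimal chat-style generation sketch; the prompt text and generation settings are illustrative assumptions, not values prescribed by the model card:

    # Build a chat prompt with the model's chat template and generate a reply
    # (the user message below is only an example)
    messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]
    inputs = tokenizer.apply_chat_template(
      messages,
      add_generation_prompt=True,
      return_tensors="pt",
    ).to(model.device)
    
    # Generate and decode only the newly produced tokens
    outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))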

Using BitNet

  1. Clone the BitNet repository and install dependencies:
    git clone https://github.com/microsoft/BitNet && cd BitNet
    pip install -r requirements.txt
    
  2. Set up the environment and run inference:
    python setup_env.py --hf-repo tiiuae/Falcon3-10B-Instruct-1.58bit -q i2_s
    python run_inference.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
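
A note on the flags above, following the microsoft/BitNet README (check the README of your checkout for the full option list): --hf-repo tells setup_env.py which Hugging Face repository to download and convert, -q i2_s selects the i2_s quantization kernel for the generated GGUF file, -m passes the converted model path to run_inference.py, -p sets the prompt, and -cnv runs the model in conversational (chat) mode.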
    

Cloud GPUs

For optimal performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.

License

The Falcon3-10B-Instruct-1.58bit model is licensed under the TII Falcon License 2.0. The full terms and conditions are available on the TII website.
