Falcon3-10B-Instruct-1.58bit
Introduction
The Falcon3-10B-Instruct-1.58bit model is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute (TII), aimed primarily at English text-generation tasks. Its weights are quantized to 1.58-bit precision for improved efficiency, and it is tuned for instruct/chat applications.
Architecture
Falcon3-10B-Instruct-1.58bit is a pure-transformer model with a causal decoder-only architecture, optimized for 1.58-bit precision. Quantizing the weights this aggressively sharply reduces memory footprint and inference cost while aiming to preserve the quality of the full-precision model.
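The "1.58-bit" figure comes from ternary weights: each weight takes one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information. As a rough illustration, here is a hand-rolled sketch in the style of the absmean quantizer described in the BitNet b1.58 paper; the function name and shapes are illustrative, not the model's actual kernels:

import torch

def quantize_ternary(w: torch.Tensor):
    # Illustrative absmean quantizer (BitNet b1.58 style): scale the
    # weight tensor by its mean absolute value, then round each entry
    # to the nearest value in {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = quantize_ternary(w)
# At matmul time the dequantized approximation is w_q * scale.
print(w_q)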
Training
The model was trained using techniques from the 1-bit LLM (BitNet) strategy, as outlined in a Hugging Face blog post and the corresponding research paper. For comprehensive training details, refer to the Falcon3 technical report, specifically the section on compression.
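For intuition, quantization-aware training in this line of work typically relies on the straight-through estimator (STE): the forward pass uses quantized weights, while gradients flow to the full-precision master weights. The following is a minimal sketch of that general recipe, not TII's published training code:

import torch

def ste_quantized(w: torch.Tensor) -> torch.Tensor:
    # Absmean ternary quantization, as in the sketch above.
    scale = w.abs().mean().clamp(min=1e-5)
    w_hat = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: the forward pass sees the quantized
    # weights, while the backward pass treats the quantizer as identity,
    # so gradients update the full-precision master weights.
    return w + (w_hat - w).detach()

w = torch.randn(8, 8, requires_grad=True)
loss = ste_quantized(w).sum()
loss.backward()  # w.grad is all ones: gradients passed straight through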
Guide: Running Locally
To run Falcon3-10B-Instruct-1.58bit locally, you can use either the Hugging Face Transformers library or Microsoft's BitNet inference framework.
Using Transformers
- Install the Transformers library (pip install transformers).
- Load the model using the following Python code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Instruct-1.58bit"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
- Generate text as needed; a minimal chat example follows this list.
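With the model and tokenizer loaded as above, a chat-style generation sketch might look like the following (the prompt and max_new_tokens value are illustrative):

# Build a chat prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))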
Using BitNet
- Clone the BitNet repository and install dependencies:
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
- Set up the environment and run inference. setup_env.py downloads the checkpoint from the Hugging Face repo and prepares it in the i2_s (ternary) GGUF format; run_inference.py then starts a chat session (-cnv) with the given prompt (-p) used as the system prompt:
python setup_env.py --hf-repo tiiuae/Falcon3-10B-Instruct-1.58bit -q i2_s
python run_inference.py -m models/Falcon3-10B-1.58bit/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
Cloud GPUs
For optimal performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The Falcon3-10B-Instruct-1.58bit model is licensed under the TII Falcon License 2.0; the full terms and conditions are available on the TII website.