Falcon3-10B-Instruct-GGUF
Introduction
Falcon3-10B-Instruct-GGUF provides GGUF-format builds of Falcon3-10B-Instruct for use with llama.cpp. It belongs to the Falcon3 family of Open Foundation Models: large language models (LLMs) ranging from 1 billion to 10 billion parameters. The model achieves state-of-the-art performance on reasoning, language understanding, instruction following, code, and mathematics tasks, and supports English, French, Spanish, and Portuguese with a context length of up to 32,000 tokens.
Architecture
- Type: Transformer-based, causal decoder-only architecture.
- Structure: 40 decoder blocks with Grouped Query Attention (GQA) for faster inference, featuring 12 query heads and 4 key-value heads (see the sketch after this list).
- Head Dimension: 256, wider than is typical at this scale.
- RoPE Base: 1,000,042, a high value that supports long-context understanding.
- Components: SwiGLU activations and RMSNorm.
- Context Length: 32,000.
- Vocabulary Size: 131,000.
- Scaling: Depth up-scaled from Falcon3-7B-Base and trained on 2 teratokens of diverse datasets.
- Post-training: Performed on 1.2 million samples covering STEM, conversational, code, safety, and function-calling data.
- Languages Supported: English (EN), French (FR), Spanish (ES), Portuguese (PT).
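To make the attention numbers above concrete, the following is a minimal, shape-level Python sketch of Grouped Query Attention with 12 query heads, 4 key-value heads, and head dimension 256. It uses random tensors and omits the causal mask and RoPE; it illustrates the head grouping only and is not the model's actual implementation.

import numpy as np

n_q_heads, n_kv_heads, head_dim, seq_len = 12, 4, 256, 8
group_size = n_q_heads // n_kv_heads  # 3 query heads share each KV head

q = np.random.randn(n_q_heads, seq_len, head_dim)
k = np.random.randn(n_kv_heads, seq_len, head_dim)
v = np.random.randn(n_kv_heads, seq_len, head_dim)

# Repeat each KV head group_size times so every query head has a partner;
# the KV cache only ever stores the 4 original heads, cutting its memory
# by 3x versus full multi-head attention.
k_full = np.repeat(k, group_size, axis=0)  # (12, seq_len, head_dim)
v_full = np.repeat(v, group_size, axis=0)

scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
out = weights @ v_full  # (12, seq_len, head_dim)
print(out.shape)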
Training
The model was trained on 1,024 NVIDIA H100 GPUs with datasets spanning web, code, STEM, high-quality, and multilingual data. Post-training on a further 1.2 million samples enhanced its instruction-following capabilities.
Guide: Running Locally
1. Download GGUF Models
Use the Hugging Face Hub to download the model:
pip install huggingface_hub
huggingface-cli download {model_name}
Replace {model_name} with the relevant username and model name from Hugging Face, e.g. tiiuae/Falcon3-10B-Instruct-GGUF.
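The download can also be scripted with the huggingface_hub Python API. A minimal sketch; the quantization filename below is hypothetical, so pick an actual *.gguf file from the repository's file list:

from huggingface_hub import hf_hub_download

# Download one GGUF file into the local Hugging Face cache and return its path.
path = hf_hub_download(
    repo_id="tiiuae/Falcon3-10B-Instruct-GGUF",
    filename="falcon3-10b-instruct-q4_k_m.gguf",  # hypothetical quant name
)
print(path)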
2. Install llama.cpp
Options for installation include:
- Build from Source:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
Refer to the llama.cpp build documentation for details.
- Pre-built Binaries: Check the llama.cpp repository for available binaries.
- Docker: Use the official llama.cpp Docker image, detailed in the docker documentation.
3. Start Using the Model
- Text Completion:
llama-cli -m {path-to-gguf-model} -p "I believe the meaning of life is" -n 128
- Conversation Mode:
llama-cli -m {path-to-gguf-model} -p "You are a helpful assistant" -cnv -co
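The same GGUF file can also be driven from Python through the llama-cpp-python bindings (pip install llama-cpp-python). A minimal sketch, assuming a hypothetical local model path; the parameters are illustrative, not tuned recommendations:

from llama_cpp import Llama

# Load the quantized model; n_ctx sets the context window for this session.
llm = Llama(model_path="falcon3-10b-instruct-q4_k_m.gguf", n_ctx=4096)  # hypothetical path

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "I believe the meaning of life is"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])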
Suggested Cloud GPUs
To optimize performance, consider using cloud GPU services like AWS EC2 with NVIDIA Tesla V100 or A100 instances, Google Cloud Platform's NVIDIA GPUs, or Azure's GPU-accelerated virtual machines.
License
The model is released under the TII Falcon-LLM License 2.0.