Falcon3-7B-Instruct-GGUF
Introduction
Falcon3-7B-Instruct-GGUF is part of the Falcon3 family of Open Foundation Models, offering state-of-the-art performance in reasoning, language understanding, and other tasks. It supports four languages (English, French, Spanish, and Portuguese) and a context length of up to 32K tokens. The model is designed for text generation; this repository provides the instruction-tuned model in the GGUF format, the file format used by llama.cpp and compatible runtimes.
Architecture
- Type: Transformer-based causal decoder-only architecture
- Structure: 28 decoder blocks
- Attention: Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads for faster inference (see the cache sketch after this list)
- Head Dimension: 256
- RoPE Base: 1000042, enabling long-context understanding
- Techniques: Uses SwiGLU and RMSNorm
- Context Length: 32K
- Vocabulary Size: 131K
- Pretraining: 14 trillion tokens of diverse data, trained on 1,024 H100 GPUs
- Post-training: fine-tuned on 1.2 million samples across various domains
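Because GQA stores keys and values only for the 4 key-value heads, the KV cache stays small relative to a full multi-head layout. A back-of-envelope sketch using the figures above (the fp16, 2-bytes-per-value assumption is ours):

```bash
# KV cache per token = 2 (K and V) * 28 layers * 4 KV heads * 256 head dim * 2 bytes (fp16)
echo "$((2 * 28 * 4 * 256 * 2)) bytes per token"            # 114688
# At the full 32K (32768-token) context:
echo "$((2 * 28 * 4 * 256 * 2 * 32768 / 1024 / 1024)) MiB"  # 3584
```

With 12 key-value heads (matching the query heads, i.e. standard multi-head attention) the same cache would be three times larger, which is the memory saving GQA buys.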
Training
Falcon3-7B-Instruct was pretrained on extensive datasets comprising web, code, STEM, and multilingual data, then post-trained on focused datasets including STEM and conversational data. This comprehensive training approach enables the model to excel at a range of complex tasks.
Guide: Running Locally
- Download the model: use the huggingface_hub library to fetch the model files, as shown below.

```bash
pip install huggingface_hub
huggingface-cli download {model_name}
```

  Replace {model_name} with the actual model name from Hugging Face.
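To pull only a single quantization variant instead of the whole repository, huggingface-cli can filter files by pattern. A minimal sketch; the repository id is taken from this card's title and the file pattern is an assumption, so list the repository's files first to confirm the exact names:

```bash
# Download only Q4_K_M-quantized GGUF files into ./models
# (the "*Q4_K_M*.gguf" pattern is an assumed file name; adjust to the repo's actual files)
huggingface-cli download tiiuae/Falcon3-7B-Instruct-GGUF \
  --include "*Q4_K_M*.gguf" --local-dir ./models
```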
- Install llama.cpp, using one of the following options:
  - Build from source:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

  - Download pre-built binaries: available from the llama.cpp repository's releases.
  - Use Docker: follow the llama.cpp Docker documentation for setup.
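If an NVIDIA GPU is available, llama.cpp can be built with CUDA offload. A minimal sketch, assuming a recent checkout; the CMake flag has changed across versions (older trees used LLAMA_CUBLAS or LLAMA_CUDA), so check the build documentation for your revision:

```bash
# Configure with CUDA support (GGML_CUDA assumes a recent llama.cpp tree)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```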
- Run the model:
  - For text completion:

```bash
llama-cli -m {path-to-gguf-model} -p "I believe the meaning of life is" -n 128
```

  - For conversation mode:

```bash
llama-cli -m {path-to-gguf-model} -p "You are a helpful assistant" -cnv -co
```
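llama-cli takes further runtime flags that are often useful with a long-context model like this one; the values below are illustrative rather than tuned recommendations:

```bash
# -c sets the context window, -ngl offloads layers to the GPU (needs a GPU build),
# and --temp sets the sampling temperature; all values here are examples.
llama-cli -m {path-to-gguf-model} \
  -c 8192 -ngl 99 --temp 0.7 \
  -cnv -p "You are a helpful assistant"
```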
Cloud GPUs: Consider using cloud GPU services for more efficient model inference, especially for large models like Falcon3-7B-Instruct.
License
The Falcon3-7B-Instruct-GGUF model is released under the TII Falcon-LLM License 2.0, developed by the Technology Innovation Institute.