Falcon3-10B-Instruct-GGUF
Introduction
Falcon3-10B-Instruct-GGUF is QuantFactory's quantized, GGUF-format release of the Falcon3-10B-Instruct model, built for tasks such as reasoning, language understanding, instruction following, and more. It supports English, French, Spanish, and Portuguese, with a context length of up to 32K tokens, and reports competitive results across a range of benchmarks.
Architecture
- Type: Transformer-based causal decoder-only architecture
- Components: 40 decoder blocks
- Attention Mechanism: Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads (shapes illustrated in the sketch after this list)
- Head Dimension: 256
- Context Length: Up to 32K
- Vocabulary Size: 131K
- Special Features: high RoPE base value to support the long context, SwiGLU activation, and RMSNorm
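The GQA configuration above can be made concrete with a small shape sketch. This is only an illustration of grouped-query attention built from the card's numbers (12 query heads, 4 key-value heads, head dimension 256), not the model's actual implementation:

```python
# Illustrative GQA shapes from the card's numbers; not the model's real code.
import torch

n_q_heads, n_kv_heads, head_dim = 12, 4, 256
hidden = n_q_heads * head_dim          # 12 * 256 = 3072
group = n_q_heads // n_kv_heads        # 3 query heads share each KV head

seq_len = 8
q = torch.randn(1, n_q_heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, seq_len, head_dim)
v = torch.randn(1, n_kv_heads, seq_len, head_dim)

# Repeat K/V across each group so every query head has a matching KV head.
k = k.repeat_interleave(group, dim=1)  # -> (1, 12, seq_len, 256)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)                      # torch.Size([1, 12, 8, 256])
```

Because 4 key-value heads serve 12 query heads, the KV cache is roughly a third the size it would be under standard multi-head attention at the same hidden width.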
Training
The base Falcon3-10B-Instruct model was pretrained on 2 teratokens of web, code, STEM, and multilingual data using 1,024 NVIDIA H100 GPUs, then post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data. The model is developed by the Technology Innovation Institute (TII).
Guide: Running Locally
- Install Transformers Library: Ensure the transformers library is installed in your Python environment (pip install transformers).
- Load Model and Tokenizer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-10B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
- Prepare Input: Define your prompt as a list of chat messages and render it with the tokenizer's chat template (a minimal sketch follows).
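A minimal sketch of this step, assuming the standard transformers chat-template API (tokenizer.apply_chat_template); the message strings are placeholders, not taken from the model card:

```python
# Build a chat prompt and render it with the tokenizer's chat template.
# The system/user strings below are placeholders.
prompt = "How many hours are in two days?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,               # return the rendered string, not token ids
    add_generation_prompt=True,   # append the assistant-turn marker
)
```

The resulting text string is what gets tokenized in the generation step below.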
- Generate Output:
```python
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
- Hardware Recommendation: For optimal performance with the full-precision model, consider cloud GPUs such as NVIDIA's A100 or H100. The GGUF quantizations can also run on CPU or modest GPUs via llama.cpp (see the sketch below).
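Since this repository ships GGUF files, you can also skip transformers entirely and run a quantization through llama.cpp. Below is a sketch using the llama-cpp-python bindings; the repo id and the quantization filename pattern are assumptions, so check the repository's file list for the actual names:

```python
# Sketch: running a GGUF quantization with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/Falcon3-10B-Instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",  # hypothetical pattern; pick a quant that fits your RAM
    n_ctx=4096,               # raise toward 32K if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize grouped-query attention."}]
)
print(out["choices"][0]["message"]["content"])
```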
License
The model is released under the TII Falcon-LLM License 2.0. For full terms and conditions, refer to the license text published by TII.