Falcon3-10B-Instruct-GGUF
Introduction
Falcon3-10B-Instruct-GGUF is QuantFactory's quantized, GGUF-format release of the Falcon3-10B-Instruct model, built for tasks such as reasoning, language understanding, instruction following, and more. It supports English, French, Spanish, and Portuguese, with a context length of up to 32K tokens, and reports competitive results across a range of benchmarks.
Architecture
- Type: Transformer-based causal decoder-only architecture
- Components: 40 decoder blocks
- Attention Mechanism: Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads (shapes illustrated in the sketch after this list)
- Head Dimension: 256
- Context Length: Up to 32K
- Vocabulary Size: 131K
- Special Features: high RoPE base value to support the long context, SwiGLU activation, and RMSNorm
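The GQA configuration above can be made concrete with a small shape sketch. This is only an illustration of grouped-query attention built from the card's numbers (12 query heads, 4 key-value heads, head dimension 256), not the model's actual implementation:

```python
# Illustrative GQA shapes from the card's numbers; not the model's real code.
import torch

n_q_heads, n_kv_heads, head_dim = 12, 4, 256
hidden = n_q_heads * head_dim          # 12 * 256 = 3072
group = n_q_heads // n_kv_heads        # 3 query heads share each KV head

seq_len = 8
q = torch.randn(1, n_q_heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, seq_len, head_dim)
v = torch.randn(1, n_kv_heads, seq_len, head_dim)

# Repeat K/V across each group so every query head has a matching KV head.
k = k.repeat_interleave(group, dim=1)  # -> (1, 12, seq_len, 256)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)                      # torch.Size([1, 12, 8, 256])
```

Because 4 key-value heads serve 12 query heads, the KV cache is roughly a third the size it would be under standard multi-head attention at the same hidden width.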
Training
The base Falcon3-10B-Instruct model was pretrained on 2 teratokens of web, code, STEM, and multilingual data using 1,024 NVIDIA H100 GPUs, then post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data. The model is developed by the Technology Innovation Institute (TII).
Guide: Running Locally
- Install Transformers Library: Ensure the transformers library is installed in your Python environment (pip install transformers).
- Load Model and Tokenizer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-10B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
- Prepare Input: Define your prompt as a list of chat messages and render it with the tokenizer's chat template (a minimal sketch follows).
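A minimal sketch of this step, assuming the standard transformers chat-template API (tokenizer.apply_chat_template); the message strings are placeholders, not taken from the model card:

```python
# Build a chat prompt and render it with the tokenizer's chat template.
# The system/user strings below are placeholders.
prompt = "How many hours are in two days?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,               # return the rendered string, not token ids
    add_generation_prompt=True,   # append the assistant-turn marker
)
```

The resulting text string is what gets tokenized in the generation step below.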
- Generate Output:
```python
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
- Hardware Recommendation: For optimal performance with the full-precision model, consider cloud GPUs such as NVIDIA's A100 or H100. The GGUF quantizations can also run on CPU or modest GPUs via llama.cpp (see the sketch below).
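Since this repository ships GGUF files, you can also skip transformers entirely and run a quantization through llama.cpp. Below is a sketch using the llama-cpp-python bindings; the repo id and the quantization filename pattern are assumptions, so check the repository's file list for the actual names:

```python
# Sketch: running a GGUF quantization with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/Falcon3-10B-Instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",  # hypothetical pattern; pick a quant that fits your RAM
    n_ctx=4096,               # raise toward 32K if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize grouped-query attention."}]
)
print(out["choices"][0]["message"]["content"])
```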
License
The model is released under the TII Falcon-LLM License 2.0. For full terms and conditions, refer to the license text published by TII.