Falcon3 7 B Instruct
tiiuaeIntroduction
Falcon3-7B-Instruct is part of the Falcon3 family of Open Foundation Models, designed for tasks in reasoning, language understanding, instruction following, coding, and mathematics. This model supports English, French, Spanish, and Portuguese, with a context length capacity of up to 32,000 tokens. Developed by the Technology Innovation Institute, it aims to deliver state-of-the-art performance across various domains.
Architecture
- Model Type: Transformer-based causal decoder-only architecture
- Decoder Blocks: 28
- Attention Mechanism: Grouped query attention (GQA) with 12 query heads and 4 key-value heads
- Head Dimension: 256
- Context Support: High RoPE value of 1,000,042 for long context understanding
- Additional Features: SwiGLU, RMSNorm
- Context Length: 32,000 tokens
- Vocabulary Size: 131,000 tokens
Training
- Pretraining: Conducted on 14 teratokens from diverse datasets including web, code, STEM, high-quality, and multilingual data using 1,024 H100 GPU chips.
- Post-training: Involved 1.2 million samples covering STEM, conversations, code, safety, and function calls.
Guide: Running Locally
To run the Falcon3-7B-Instruct model locally, follow these steps:
-
Install Dependencies:
- Ensure you have Python and PyTorch installed.
- Install the Transformers library:
pip install transformers
.
-
Load the Model:
from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "tiiuae/Falcon3-7B-Instruct" model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto") tokenizer = AutoTokenizer.from_pretrained(model_name)
-
Use the Model:
- Define your prompt and messages.
- Tokenize and generate a response with the model.
-
Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for access to GPUs, as the model is computationally intensive.
License
The Falcon3-7B-Instruct is licensed under the TII Falcon-LLM License 2.0. For more details, refer to the license terms.