Falcon3-7B-Base
tiiuae/Falcon3-7B-Base
Introduction
Falcon3-7B-Base is part of the Falcon3 family of Open Foundation Models, consisting of pretrained and instruct large language models (LLMs) ranging from 1 billion to 10 billion parameters. The model is designed to deliver state-of-the-art results in reasoning, language understanding, instruction following, and coding tasks. It supports English, French, Spanish, and Portuguese with a context length of up to 32,000 tokens. Note that this is a raw, pretrained model; most use cases will require fine-tuning.
Architecture
- Type: Transformer-based causal decoder-only architecture
- Blocks: 28 decoder blocks
- Attention: Grouped-query attention (GQA) with 12 query heads and 4 key-value (KV) heads (see the shape sketch after this list)
- Head Dimension: 256 (wider than is typical for models of this size)
- RoPE Base: a high value of 1000042 to support long-context understanding
- Context Length: 32,000 tokens
- Vocabulary Size: 131,000 tokens
- Training: Pretrained on 14 Teratokens of diverse datasets using 1024 H100 GPU chips
- Languages Supported: English, French, Spanish, Portuguese
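
To make the GQA numbers above concrete, here is a minimal shape sketch in PyTorch: 4 KV heads are each shared by 12 / 4 = 3 query heads. The hidden size is inferred here as 12 x 256 = 3072 (an assumption, since the card does not state it explicitly); this is an illustrative sketch, not the model's actual implementation.

import torch

# Dimensions from the architecture list; hidden size inferred as 12 * 256 = 3072
# (an assumption, not stated in the card).
num_q_heads, num_kv_heads, head_dim = 12, 4, 256
hidden_size = num_q_heads * head_dim  # 3072
batch, seq_len = 1, 8

x = torch.randn(batch, seq_len, hidden_size)

# Separate projections: queries get 12 heads, keys/values only 4.
q_proj = torch.nn.Linear(hidden_size, num_q_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)

q = q_proj(x).view(batch, seq_len, num_q_heads, head_dim).transpose(1, 2)
k = k_proj(x).view(batch, seq_len, num_kv_heads, head_dim).transpose(1, 2)
v = v_proj(x).view(batch, seq_len, num_kv_heads, head_dim).transpose(1, 2)

# Each KV head serves 3 query heads: repeat KV along the head axis to match.
group = num_q_heads // num_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 12, 8, 256])

The KV projections are 3x smaller than the query projection, which is the point of GQA: a smaller KV cache at inference time with minimal quality loss.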
Training
The model was pretrained on a massive dataset of 14 Teratokens, comprising web, code, STEM, high-quality curated, and multilingual data. Training was carried out on 1024 H100 GPU chips.
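
For a sense of scale, a back-of-envelope estimate using the common ~6 * N * D training-FLOPs approximation: the parameter count and token count come from the card, but the per-GPU throughput below is a hypothetical illustration, not a reported figure.

# Rough training-compute estimate via the ~6 * N * D approximation.
n_params = 7e9    # ~7B parameters (from the card)
n_tokens = 14e12  # 14 Teratokens (from the card)

total_flops = 6 * n_params * n_tokens
print(f"~{total_flops:.2e} FLOPs")  # ~5.88e+23 FLOPs

# Hypothetical sustained bf16 throughput per H100 (peak is ~989 TFLOP/s dense;
# real utilization is far lower, so treat this as illustrative only).
sustained_per_gpu = 400e12
gpus = 1024
days = total_flops / (sustained_per_gpu * gpus) / 86400
print(f"~{days:.0f} days on {gpus} GPUs at the assumed throughput")  # ~17 days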
Guide: Running Locally
To run Falcon3-7B-Base locally:
- Setup: Install PyTorch and the transformers library.
- Code: Use the following snippet to load the model and generate text:

import torch
from transformers import pipeline

# Load the model in bfloat16 and place it automatically across available devices.
pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-7B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

response = pipe("Question: How many hours in one day? Answer: ")
print(response[0]["generated_text"])
- Hardware Recommendation: For optimal performance, consider using cloud GPUs such as NVIDIA A100 or H100.
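
The GPU recommendation follows from a rough memory estimate: the weights alone occupy about 14 GB in bfloat16, before the KV cache and activations. A minimal sketch, using the dimensions from the architecture list and standard KV-cache arithmetic (illustrative, not a measured footprint):

# Rough VRAM estimate for bf16 inference; illustrative numbers only.
n_params = 7e9
bytes_per_param = 2  # bfloat16
print(f"weights alone: ~{n_params * bytes_per_param / 1e9:.0f} GB")  # ~14 GB

# KV cache per token, from the architecture list: 28 layers,
# 4 KV heads x 256 head dim, keys + values, 2 bytes each.
kv_per_token = 28 * 4 * 256 * 2 * 2  # bytes
ctx = 32_000
print(f"KV cache at full 32K context: ~{kv_per_token * ctx / 1e9:.1f} GB")  # ~3.7 GB

At around 18 GB total for a single full-context sequence, a 40 GB or 80 GB A100/H100 leaves comfortable headroom, while smaller consumer GPUs would be tight.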
License
Falcon3-7B-Base is licensed under the TII Falcon-LLM License 2.0. For more details, visit Falcon LLM Terms and Conditions.