Falcon3-10B-Base

Introduction

The Falcon3-10B-Base model is part of the Falcon3 family of Open Foundation Models, developed by the Technology Innovation Institute. These models are designed for tasks such as reasoning, language understanding, instruction following, code generation, and mathematics. The model supports English, French, Spanish, and Portuguese, offering a context length of up to 32K tokens. This base model is pretrained and requires further fine-tuning for specific use cases.

Architecture

  • Transformer-based Architecture: Utilizes a causal decoder-only structure with 40 decoder blocks.
  • Grouped Query Attention (GQA): Comprises 12 query heads and 4 key-value heads for efficient inference (see the config sketch after this list).
  • Wider Head Dimension: 256.
  • High RoPE Value: Uses a RoPE base value of 1000042 to support long-context understanding.
  • SwiGLU and RMSNorm: Used for activation and normalization, respectively.
  • Context Length: 32K.
  • Vocabulary Size: 131K.
  • Pretraining: Enhanced from Falcon3-7B-Base and trained on 2 teratokens of diverse data using 1024 H100 GPU chips.
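
As a quick sanity check, most of these architectural values can be read directly from the published model configuration via the Hugging Face Transformers library. The sketch below assumes the config exposes the usual Llama-style field names (num_hidden_layers, num_key_value_heads, rope_theta, and so on); treat the exact field names as an assumption rather than a guarantee.

    from transformers import AutoConfig

    # Download only the configuration file, not the model weights
    config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")

    print(config.num_hidden_layers)        # decoder blocks (expected: 40)
    print(config.num_attention_heads)      # query heads (expected: 12)
    print(config.num_key_value_heads)      # key-value heads for GQA (expected: 4)
    print(config.hidden_size // config.num_attention_heads)  # head dimension (expected: 256)
    print(config.rope_theta)               # RoPE base value (expected: 1000042)
    print(config.max_position_embeddings)  # context length (expected: 32K)
    print(config.vocab_size)               # vocabulary size (expected: ~131K)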

Training

The model was pretrained on a wide array of datasets spanning web, code, STEM, and multilingual data. Training was carried out on 1024 NVIDIA H100 GPUs.

Guide: Running Locally

To run the Falcon3-10B-Base model locally, follow these steps:

  1. Install Required Libraries: Ensure you have transformers, torch, and accelerate installed (accelerate is required for device_map="auto").

    pip install transformers torch accelerate
    
  2. Load the Model: Use the Hugging Face Transformers library to load the model with the following code (a lower-level AutoModelForCausalLM alternative is sketched after this guide):

    import torch
    from transformers import pipeline

    # Build a text-generation pipeline; bfloat16 weights and device_map="auto"
    # place the model on the available GPU(s) automatically (requires accelerate)
    pipe = pipeline(
        "text-generation",
        model="tiiuae/Falcon3-10B-Base",
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )

    # Generate a completion for a simple prompt and print it
    response = pipe("Question: How many hours in one day? Answer: ")
    print(response[0]['generated_text'])
    
  3. Suggested Hardware: For optimal performance, consider using cloud-based GPUs, such as NVIDIA's A100 or H100, available on platforms like AWS, Google Cloud, or Azure.
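
For finer control over tokenization and generation parameters, the model can also be loaded without the pipeline helper. The following is a minimal sketch, assuming the same dependencies as above (torch, transformers, and accelerate for device_map="auto"); at bfloat16 precision the 10B parameters alone occupy roughly 20 GB, so a single A100 or H100 is sufficient for inference.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/Falcon3-10B-Base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Tokenize a prompt and move it to the model's device
    inputs = tokenizer("Question: How many hours in one day? Answer: ", return_tensors="pt").to(model.device)

    # Generate a short completion; adjust max_new_tokens as needed
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))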

License

The Falcon3-10B-Base model is released under the TII Falcon-LLM License 2.0. For more details, visit the license terms and conditions.
