Falcon3-10B-Instruct-GGUF

By tiiuae (Technology Innovation Institute)

Introduction

Falcon3-10B-Instruct-GGUF provides GGUF-format builds of Falcon3-10B-Instruct, part of the Falcon3 family of open foundation models: large language models (LLMs) ranging from 1 billion to 10 billion parameters. The model achieves state-of-the-art performance on reasoning, language understanding, instruction-following, code, and mathematics tasks. It supports English, French, Spanish, and Portuguese, with a context length of up to 32K tokens.

Architecture

  • Type: Transformer-based, causal decoder-only architecture.
  • Structure: 40 decoder blocks with Grouped Query Attention (GQA) for faster inference, featuring 12 query heads and 4 key-value heads.
  • Head Dimension: a comparatively wide 256.
  • RoPE Base: 1,000,042, a high value that supports long-context understanding.
  • Components: uses SwiGLU activations and RMSNorm.
  • Context Length: up to 32K tokens.
  • Vocabulary Size: 131K tokens.
  • Scaling: depth up-scaled from Falcon3-7B-Base and trained on 2 teratokens of diverse datasets.
  • Post-training: performed on 1.2 million samples covering STEM, conversational, code, safety, and function-call data.
  • Languages Supported: English (EN), French (FR), Spanish (ES), Portuguese (PT).
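
As a sanity check on these figures: 12 query heads at a head dimension of 256 give a model width of 12 × 256 = 3,072, and with only 4 key-value heads each KV head is shared by 12 / 4 = 3 query heads, so the KV cache is roughly a third the size of an equivalent full multi-head attention layer.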

Training

The model was trained on 1,024 NVIDIA H100 GPUs with datasets spanning web, code, STEM, high-quality, and multilingual data. Post-training added 1.2 million samples to strengthen its instruction-following capabilities.

Guide: Running Locally

1. Download GGUF Models

Use the Hugging Face Hub to download the model:

    pip install huggingface_hub
    huggingface-cli download {model_name}

Replace {model_name} with the repository ID on Hugging Face, i.e. the username and model name (for this model, tiiuae/Falcon3-10B-Instruct-GGUF).
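
For example, a sketch that fetches a single quantization from this repository into a local directory (the filename pattern below is an assumption; check the repository's file list for the exact GGUF names):

    huggingface-cli download tiiuae/Falcon3-10B-Instruct-GGUF \
      --include "*Q4_K_M*" --local-dir ./falcon3-10b-gguf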

2. Install llama.cpp

Options for installation include:

  1. Build from Source:

    # fetch the sources
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    # configure and compile a Release build
    cmake -B build
    cmake --build build --config Release
    

    Refer to the llama.cpp build documentation for details; the resulting binaries (including llama-cli) are placed under build/bin.

  2. Pre-built Binaries: Check the llama.cpp repository for available binaries.

  3. Docker: Use the official llama.cpp Docker image, detailed in the docker documentation (see the example after this list).
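
As a minimal sketch of the Docker route (the image tag follows the llama.cpp Docker documentation; the model filename is hypothetical, so substitute the GGUF you downloaded):

    # the :light image bundles only the CLI runtime
    docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light \
      -m /models/falcon3-10b-instruct-q4_k_m.gguf \
      -p "I believe the meaning of life is" -n 128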

3. Start Using the Model

  • Text Completion (-n caps the number of generated tokens):
    llama-cli -m {path-to-gguf-model} -p "I believe the meaning of life is" -n 128

  • Conversation Mode (-cnv enables chat-style interaction and -co colorizes the output):
    llama-cli -m {path-to-gguf-model} -p "You are a helpful assistant" -cnv -co
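
llama.cpp also ships llama-server, an HTTP server exposing an OpenAI-compatible API. As a minimal sketch (the port and request body are illustrative):

    llama-server -m {path-to-gguf-model} --port 8080
    # in another shell, query the OpenAI-compatible chat endpoint:
    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello!"}]}'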
    

Suggested Cloud GPUs

To run the 10B model with good throughput, consider cloud GPU services such as AWS EC2 instances with NVIDIA Tesla V100 or A100 GPUs, Google Cloud Platform's NVIDIA GPU offerings, or Azure's GPU-accelerated virtual machines.

License

The model is released under the TII Falcon-LLM License 2.0.
