Falcon3-3B-Instruct-GGUF

tiiuae

Introduction

Falcon3-3B-Instruct is part of the Falcon3 family of Open Foundation Models, which includes pretrained and instruction-tuned large language models (LLMs) ranging from 1B to 10B parameters. This model excels in reasoning, language understanding, instruction following, and code and mathematics tasks. It supports English, French, Spanish, and Portuguese, with a context length of up to 32,000 tokens.

Architecture

  • Type: Transformer-based causal decoder-only architecture
  • Structure: 22 decoder blocks
  • Attention: Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads
  • Head Dimension: 256
  • Context Length: 32K
  • Vocabulary Size: 131K
  • Components: SwiGLU and RMSNorm
  • RoPE: High base value of 1000042 to support long-context understanding
  • Origin: Pruned and adapted from Falcon3-7B-Base using 100 Gigatokens of diverse datasets
  • Posttraining: 1.2 million samples of STEM, conversational, code, safety, and function call data
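
The attention figures above determine how much memory the key-value cache needs at inference time. The following is a rough sketch of that arithmetic, not a measurement. It assumes 16-bit cache entries and reads "32K" as 32 × 1024 tokens; both are assumptions that the source does not state.

```shell
#!/bin/sh
# KV-cache size estimate from the architecture figures listed above.
layers=22          # decoder blocks
kv_heads=4         # key-value heads (GQA)
q_heads=12         # query heads
head_dim=256
bytes_per_elem=2   # assumption: 16-bit cache entries
ctx_tokens=32768   # assumption: "32K" read as 32 * 1024

# Keys and values are both cached, hence the leading factor of 2.
per_token=$(( 2 * layers * kv_heads * head_dim * bytes_per_elem ))
per_token_kib=$(( per_token / 1024 ))
total_mib=$(( per_token_kib * ctx_tokens / 1024 ))
# Hypothetical comparison: every query head with its own key-value head.
mha_mib=$(( total_mib * q_heads / kv_heads ))

echo "per token: ${per_token} bytes"
echo "full context with GQA: ${total_mib} MiB"
echo "full context without GQA: ${mha_mib} MiB"
```

Under these assumptions the cache costs 90112 bytes per token, or 2816 MiB (2.75 GiB) at full context. With one key-value head per query head it would be three times larger, 8448 MiB, which is the saving GQA provides here.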

Training

The model was developed by the Technology Innovation Institute and trained on 1,024 H100 GPUs. It covers the four supported languages and was post-trained on STEM, conversational, code, safety, and function-calling data.

Guide: Running Locally

Step 1: Download the Model

First, download the model from Hugging Face using the huggingface_hub library:

pip install huggingface_hub
huggingface-cli download {model_name}

Replace {model_name} with the model's repository ID on Hugging Face, in owner/name form.
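
GGUF repositories usually hold several quantized files, so it can help to fetch only one of them. The sketch below builds such a command and prints it for review instead of running it. The repository ID is assumed from this page's title and author, and the file-name pattern is hypothetical; check both against the file list on the model page.

```shell
#!/bin/sh
# Assumed values: confirm them on the model's Hugging Face page.
repo_id="tiiuae/Falcon3-3B-Instruct-GGUF"  # assumption: author + page title
pattern="*q4_k_m*"                          # hypothetical file-name pattern
target_dir="./models"                       # hypothetical output directory

# --include limits the download to files matching the pattern;
# --local-dir chooses where the files are written.
download_cmd="huggingface-cli download $repo_id --include \"$pattern\" --local-dir $target_dir"

# Print the command for review; run it with: eval "$download_cmd"
echo "$download_cmd"
```

Quoting the pattern keeps the shell from expanding the wildcard locally before huggingface-cli sees it.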

Step 2: Install llama.cpp

Choose one of the following installation methods:

  1. Build from Source:

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release
    
  2. Download Pre-built Binaries: Check the llama.cpp repository for availability.

  3. Use Docker: Refer to the llama.cpp Docker documentation.
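
Whichever method you choose, the run commands in the next step expect llama-cli to be reachable. A minimal sketch of a lookup follows; it assumes the CMake build above places binaries under build/bin inside the checkout, and find_llama_cli is a hypothetical helper name, not part of llama.cpp.

```shell
#!/bin/sh
# Print the path of a usable llama-cli, or return 1 if none is found.
# $1: llama.cpp checkout directory (default: ./llama.cpp, as cloned above).
find_llama_cli() {
  repo_dir="${1:-./llama.cpp}"
  if command -v llama-cli >/dev/null 2>&1; then
    command -v llama-cli                      # already on PATH
  elif [ -x "$repo_dir/build/bin/llama-cli" ]; then
    echo "$repo_dir/build/bin/llama-cli"      # assumed CMake output location
  else
    return 1
  fi
}

if cli=$(find_llama_cli); then
  echo "llama-cli found at: $cli"
else
  echo "llama-cli not found; add llama.cpp/build/bin to PATH" >&2
fi
```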

Step 3: Run the Model

  • Text Completion:

    llama-cli -m {path-to-gguf-model} -p "I believe the meaning of life is" -n 128
    
  • Conversation Mode:

    llama-cli -m {path-to-gguf-model} -p "You are a helpful assistant" -cnv -co
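
If llama-cli rejects the file, a quick check is whether the download is a GGUF file at all, since GGUF files begin with the four ASCII bytes "GGUF". The sketch below tests for that magic value. The is_gguf helper, the MODEL_PATH variable, and the default path are hypothetical names used for illustration.

```shell
#!/bin/sh
# Return 0 if the file exists and starts with the 4-byte magic "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

model="${MODEL_PATH:-model.gguf}"   # hypothetical default path
if is_gguf "$model"; then
  echo "$model looks like a GGUF file"
else
  echo "$model is missing or not a GGUF file" >&2
fi
```

This catches cases such as a saved HTML error page or a Git LFS pointer file. It does not verify that the file is complete or uncorrupted.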
    

Suggested Cloud GPUs

For faster inference, consider cloud GPUs such as NVIDIA A100 or H100 instances, which are available on major cloud platforms.

License

The Falcon3-3B-Instruct model is released under the TII Falcon-LLM License 2.0.
