Falcon3-7B-Instruct-GGUF

by tiiuae (Technology Innovation Institute)

Introduction

Falcon3-7B-Instruct-GGUF is part of the Falcon3 family of open foundation models, offering state-of-the-art performance in reasoning, language understanding, and other tasks. It supports four languages (English, French, Spanish, and Portuguese) and a context length of up to 32K tokens. This repository provides the instruction-tuned model in the GGUF format, a quantized file format for efficient local text generation with llama.cpp and compatible runtimes.

Architecture

  • Type: Transformer-based causal decoder-only architecture
  • Structure: 28 decoder blocks
  • Attention: Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads for faster inference (see the sketch after this list)
  • Head Dimension: a wide head dimension of 256
  • RoPE Base: 1000042, a high value that enables long-context understanding
  • Techniques: Uses SwiGLU and RMSNorm
  • Context Length: 32K
  • Vocabulary Size: 131K
  • Pretraining: 14 trillion tokens of diverse data, trained on 1,024 H100 GPUs
  • Posttraining: Trained on 1.2 million samples across various domains
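
To make the attention configuration concrete, here is a minimal NumPy sketch of how Grouped Query Attention maps the 12 query heads onto the 4 key-value heads. This is an illustration of the technique under the dimensions listed above, not the model's actual implementation, and the causal mask is omitted for brevity.

  import numpy as np

  # Configuration from the list above: 12 query heads, 4 KV heads, head_dim 256.
  n_q_heads, n_kv_heads, head_dim, seq_len = 12, 4, 256, 8
  group_size = n_q_heads // n_kv_heads  # 3 query heads share each KV head

  q = np.random.randn(n_q_heads, seq_len, head_dim)
  k = np.random.randn(n_kv_heads, seq_len, head_dim)
  v = np.random.randn(n_kv_heads, seq_len, head_dim)

  # Each KV head is broadcast to its group of query heads, so the KV cache
  # holds 4 heads instead of 12, which is the source of the inference speedup.
  k = np.repeat(k, group_size, axis=0)  # (12, seq_len, head_dim)
  v = np.repeat(v, group_size, axis=0)

  scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
  weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
  weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
  out = weights @ v
  print(out.shape)  # (12, 8, 256)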

Training

Falcon3-7B-Instruct was pretrained on extensive datasets comprising web, code, STEM, and multilingual data and posttrained on focused datasets including STEM and conversational data. This comprehensive training approach enables the model to excel in various complex tasks.

Guide: Running Locally

  1. Download the Model:

    • Use the huggingface_hub library to download the model:
      pip install huggingface_hub
      huggingface-cli download {model_name}
      
    • Replace {model_name} with the Hugging Face repo id; for this model it is tiiuae/Falcon3-7B-Instruct-GGUF. A Python alternative using huggingface_hub is sketched after this list.
  2. Install llama.cpp:

    • Build from Source:
      git clone https://github.com/ggerganov/llama.cpp
      cd llama.cpp
      cmake -B build
      cmake --build build --config Release
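      # after the build completes, binaries such as llama-cli are in build/bin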
      
    • Download Pre-built Binaries: published on the releases page of the llama.cpp repository.
    • Use Docker: Follow the llama.cpp Docker documentation for setup.
  3. Run the Model:

    • For text completion:
      llama-cli -m {path-to-gguf-model} -p "I believe the meaning of life is" -n 128
      
    • For conversation mode (a Python-bindings alternative is sketched after this list):
      llama-cli -m {path-to-gguf-model} -p "You are a helpful assistant" -cnv -co
      
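As an alternative to the huggingface-cli command in step 1, the same download can be scripted with the huggingface_hub Python API. A minimal sketch; the repo id follows this model card, and the pattern below assumes you want every GGUF file in the repo (narrow it to a single quantization as needed).

  from huggingface_hub import snapshot_download

  # Fetch only the GGUF weight files from the model repository.
  local_dir = snapshot_download(
      repo_id="tiiuae/Falcon3-7B-Instruct-GGUF",
      allow_patterns=["*.gguf"],
  )
  print("Model files downloaded to:", local_dir)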

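For programmatic use, the llama-cpp-python bindings (pip install llama-cpp-python) wrap the same engine as the llama-cli commands in step 3. A hedged sketch, assuming a GGUF file downloaded in step 1; the model path below is a placeholder.

  from llama_cpp import Llama

  # Point model_path at the GGUF file downloaded in step 1 (placeholder path).
  llm = Llama(model_path="path/to/falcon3-7b-instruct.gguf", n_ctx=4096)

  response = llm.create_chat_completion(
      messages=[
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "I believe the meaning of life is"},
      ],
      max_tokens=128,
  )
  print(response["choices"][0]["message"]["content"])
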
Cloud GPUs: Consider using cloud GPU services for more efficient model inference, especially for large models like Falcon3-7B-Instruct.

License

The Falcon3-7B-Instruct-GGUF model is released under the TII Falcon-LLM License 2.0, developed by the Technology Innovation Institute.
