Falcon3-10B-Instruct-GGUF

QuantFactory

Introduction

Falcon3-10B-Instruct-GGUF is a quantized version of the Falcon3-10B-Instruct model, designed for strong performance on reasoning, language understanding, instruction following, and related tasks. It supports English, French, Spanish, and Portuguese with a context length of up to 32K tokens, and achieves notable results across a range of benchmarks.
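
Because the weights ship as GGUF files, they can be run directly with llama.cpp-compatible tooling rather than Transformers. Below is a minimal sketch using the llama-cpp-python package; the quant filename and parameter values are assumptions, so substitute the file matching the quantization level you download:

    from llama_cpp import Llama
    
    # Load a quantized GGUF file (hypothetical filename; pick your quant level)
    llm = Llama(
        model_path="Falcon3-10B-Instruct.Q4_K_M.gguf",
        n_ctx=32768,  # the model supports up to 32K context
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "How many hours in one day?"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])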

Architecture

  • Type: Transformer-based causal decoder-only architecture
  • Components: 40 decoder blocks
  • Attention Mechanism: Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads (see the sketch after this list)
  • Head Dimension: 256
  • Context Length: Up to 32K
  • Vocabulary Size: 131K
  • Special Features: High RoPE value, SwiGLU activation, and RMSNorm
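
To make the GQA numbers concrete, here is a self-contained PyTorch sketch of how 12 query heads share 4 key-value heads; it illustrates the mechanism only and is not the model's actual implementation:

    import torch
    
    # Toy GQA shapes mirroring the card: 12 query heads, 4 KV heads, head_dim 256
    n_q_heads, n_kv_heads, head_dim, seq_len = 12, 4, 256, 8
    q = torch.randn(seq_len, n_q_heads, head_dim)
    k = torch.randn(seq_len, n_kv_heads, head_dim)
    v = torch.randn(seq_len, n_kv_heads, head_dim)
    
    # Each group of 12 / 4 = 3 query heads attends over the same shared KV head
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)  # -> (seq_len, 12, head_dim)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    
    scores = torch.einsum("qhd,khd->hqk", q, k) / head_dim ** 0.5
    weights = torch.softmax(scores, dim=-1)
    out = torch.einsum("hqk,khd->qhd", weights, v)
    print(out.shape)  # torch.Size([8, 12, 256])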

Training

The Falcon3-10B-Instruct model was trained on a large dataset comprising 2 teratokens of web, code, STEM, and multilingual data, using 1,024 H100 GPUs. It was then post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data. The model was developed by the Technology Innovation Institute (TII).

Guide: Running Locally

  1. Install Transformers Library: Ensure the transformers library is installed in your Python environment (for example, pip install transformers).
  2. Load Model and Tokenizer:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "tiiuae/Falcon3-10B-Instruct"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Prepare Input: Define your prompt and messages, then apply the chat template to build the text input used in the next step, as sketched below.
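    # Minimal sketch: the example prompt and system message are illustrative,
    # not prescribed by the model card.
    prompt = "How many hours in one day?"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    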
  4. Generate Output:
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, stripping the echoed prompt
    generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
    
  5. Hardware Recommendation: For optimal performance, consider using cloud GPUs like NVIDIA's A100 or H100.

License

The model is released under the TII Falcon-LLM License 2.0. For full terms and conditions, refer to the license linked from the model page.
