Falcon3-10B-Instruct

tiiuae

Introduction

Falcon3-10B-Instruct is part of the Falcon3 family of Open Foundation Models developed by the Technology Innovation Institute (TII). It targets reasoning, language understanding, instruction following, code, and mathematics tasks. The model supports English, French, Spanish, and Portuguese, with a context length of up to 32K tokens.

Architecture

  • Transformer-based causal decoder-only model.
  • Comprises 40 decoder blocks with Grouped Query Attention (GQA) for faster inference.
  • Uses 12 query heads and 4 key-value heads, a wider head dimension of 256, and a high RoPE base value for long-context understanding.
  • Uses SwiGLU and RMSNorm, with a 131K vocabulary size.
  • Derived from Falcon3-7B-Base with continued training on 2 teratokens from diverse datasets, using 1024 H100 GPUs.
  • Post-trained on 1.2 million samples covering STEM, conversational, code, safety, and function-calling data.
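
One commonly cited benefit of GQA is a smaller key-value cache at inference time. The sketch below works through that arithmetic using the layer count, head counts, and head dimension listed above. The 2-byte (bfloat16) cache precision and the reading of "32K" as 32,768 tokens are assumptions made for illustration, not figures from the model card, and the helper name `kv_cache_bytes` is hypothetical.

```python
# Figures taken from the architecture list above.
NUM_LAYERS = 40        # decoder blocks
NUM_QUERY_HEADS = 12   # query heads
NUM_KV_HEADS = 4       # key-value heads shared across query heads (GQA)
HEAD_DIM = 256         # dimension of each attention head

# ASSUMPTIONS (not stated in the model card):
BYTES_PER_VALUE = 2          # cache stored in bfloat16
CONTEXT_TOKENS = 32 * 1024   # "32K" read as 32,768 tokens


def kv_cache_bytes(num_tokens: int, num_kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for storing both a key and a value per head.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


gqa_bytes = kv_cache_bytes(CONTEXT_TOKENS, NUM_KV_HEADS)
# Hypothetical baseline: one key-value head per query head (no grouping).
mha_bytes = kv_cache_bytes(CONTEXT_TOKENS, NUM_QUERY_HEADS)

GIB = 1024 ** 3
print(f"Query projection width: {NUM_QUERY_HEADS * HEAD_DIM}")
print(f"KV cache with 4 KV heads:  {gqa_bytes / GIB:.1f} GiB")
print(f"KV cache with 12 KV heads: {mha_bytes / GIB:.1f} GiB")
```

Under these assumptions, sharing 4 key-value heads among 12 query heads cuts the full-context cache to one third of the ungrouped baseline.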

Training

The model was trained on a wide range of datasets, with a focus on high-quality multilingual data. Post-training drew heavily on STEM and instruction data to strengthen its predictive and reasoning capabilities.

Guide: Running Locally

  1. Install Required Libraries:

    pip install transformers torch accelerate
    
  2. Load the Model and Tokenizer:

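    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "tiiuae/Falcon3-10B-Instruct"
    # torch_dtype="auto" uses the dtype stored in the checkpoint;
    # device_map="auto" places weights on available devices (requires the accelerate package).
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)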
    
  3. Generate Text:

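    prompt = "How many hours in one day?"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the model's chat template, then tokenize it.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    
    generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
    # generate() returns prompt + completion; keep only the newly generated tokens.
    generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)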
    
  4. Use a Cloud GPU if Needed: For best performance, run the model on a cloud GPU such as an NVIDIA A100 or H100.
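
When choosing a GPU, a rough first check is whether the weights alone fit in memory. The sketch below estimates that footprint at a few precisions. It assumes roughly 10 billion parameters (inferred from the "10B" in the model name, not an exact count) and ignores activations, the key-value cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

# ASSUMPTION: nominal parameter count inferred from "10B" in the model name.
NUM_PARAMS = 10e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory (GiB) needed to hold the weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):5.1f} GiB")
```

By this estimate the bfloat16 weights need a little under 19 GiB, which is why a single high-memory accelerator is a comfortable fit while smaller consumer cards may require quantization or offloading.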

License

The model is licensed under the TII Falcon-LLM License 2.0. For more details, visit the license terms.
