Falcon3-3B-Instruct


Introduction

Falcon3-3B-Instruct is part of the Falcon3 family of large language models (LLMs) developed by the Technology Innovation Institute. It is designed for tasks involving reasoning, language understanding, instruction following, code, and mathematics. The model supports four languages—English, French, Spanish, and Portuguese—and can handle a context length of up to 32K tokens.

Architecture

  • Type: Transformer-based causal decoder-only architecture
  • Decoder Blocks: 22
  • Attention Mechanism: Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads
  • Head Dimension: 256
  • RoPE Base: 1,000,042, a high theta value chosen to support the 32K context
  • Other Features: SwiGLU activation and RMSNorm normalization
  • Context Length: 32K
  • Vocabulary Size: 131K tokens
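
These hyperparameters can be checked directly against the checkpoint's configuration. A minimal sketch, assuming the Falcon3 checkpoints on the Hugging Face Hub expose standard Llama-style config keys (exact attribute names can vary across transformers versions):

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("tiiuae/Falcon3-3B-Instruct")
    print(config.num_hidden_layers)           # decoder blocks: 22
    print(config.num_attention_heads)         # query heads: 12
    print(config.num_key_value_heads)         # key-value heads: 4
    print(getattr(config, "head_dim", None))  # head dimension: 256
    print(config.rope_theta)                  # RoPE base: 1000042
    print(config.max_position_embeddings)     # context length: 32K
    print(config.vocab_size)                  # vocabulary size: ~131K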

Training

Falcon3-3B-Instruct was pruned from Falcon3-7B-Base and further trained on 100 gigatokens of diverse data, including web, code, STEM, and multilingual sources, using 1,024 NVIDIA H100 GPUs. The model was then post-trained on 1.2 million samples covering STEM, conversational, code, safety, and function-calling data.

Guide: Running Locally

  1. Installation: Install the transformers library; accelerate is required for the device_map="auto" option used below.
    pip install transformers accelerate
    
  2. Load the Model:
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    model_name = "tiiuae/Falcon3-3B-Instruct"
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Generate Text (a pipeline-based alternative is sketched after this list):
    prompt = "How many hours in one day?"
    messages = [
        {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=1024
    )
    # Strip the prompt tokens so only the model's reply is decoded
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
    
  4. Cloud GPUs: For optimal performance, run on a GPU. In bfloat16, the 3B parameters occupy roughly 6 GB, so a single GPU with 8 GB or more of memory is sufficient; cloud options include AWS EC2 GPU instances and Google Cloud's GPU offerings.
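
As an alternative to steps 2 and 3, recent versions of transformers let the text-generation pipeline apply the chat template automatically when it is given a list of messages. A minimal sketch, assuming a transformers release with chat support in pipelines:

    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="tiiuae/Falcon3-3B-Instruct",
        torch_dtype="auto",
        device_map="auto",
    )
    messages = [
        {"role": "user", "content": "How many hours in one day?"}
    ]
    out = pipe(messages, max_new_tokens=1024)
    # With chat-format input, generated_text is the full message list;
    # the last entry is the model's reply
    print(out[0]["generated_text"][-1]["content"])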

License

Falcon3-3B-Instruct is released under the TII Falcon-LLM License 2.0. For detailed terms and conditions, refer to the license documentation.
