Falcon3-Mamba-7B-Base

Introduction

The Falcon3-Mamba-7B-Base is part of the Falcon3 family of Open Foundation Models, which achieved state-of-the-art results at the time of release on tasks such as reasoning, language understanding, instruction following, code, and mathematics. The model supports a context length of up to 32K tokens and was trained primarily on an English corpus.

Architecture

  • Base Model: Mamba1-based causal decoder-only architecture.
  • Task: Trained on a causal language modeling task to predict the next token.
  • Components (see the config-inspection sketch after this list):
    • 64 decoder blocks
    • Width: 4096
    • State dimension: 16
    • Context length: 32K
    • Vocabulary size: 65K
  • Training Data: Continued pretraining from Falcon-Mamba-7B on 1,500 gigatokens (GT) of web, code, STEM, and high-quality data.
  • Post-training: Fine-tuned on 1.2 million samples spanning STEM, conversational, code, and safety-related data.
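
A minimal sketch for sanity-checking the hyperparameters above against the published configuration. The field names (num_hidden_layers, hidden_size, state_size, vocab_size) are assumptions based on the FalconMamba implementation in transformers, not values guaranteed by this card:

    from transformers import AutoConfig

    # Fetch only the model configuration; no weights are downloaded.
    config = AutoConfig.from_pretrained("tiiuae/Falcon3-Mamba-7B-Base")

    print(config.num_hidden_layers)  # expected: 64 decoder blocks
    print(config.hidden_size)        # expected: 4096 (width)
    print(config.state_size)         # expected: 16 (SSM state dimension)
    print(config.vocab_size)         # expected: ~65K entries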

Training

The model was developed by the Technology Innovation Institute (TII) and released in December 2024. The continued-pretraining and post-training data described above were used to strengthen its capabilities across web, code, STEM, conversational, and safety-related domains.

Guide: Running Locally

  1. Install Dependencies: Ensure that recent releases of torch and transformers are installed; a minimal setup is sketched below.
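    The optional kernel packages here follow the upstream Mamba integration in transformers and are an assumption, not a requirement stated by the model card:

    pip install torch transformers
    # Optional: fused CUDA kernels for faster Mamba inference; without
    # them, transformers falls back to a slower pure-PyTorch path.
    pip install causal-conv1d mamba-ssm
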
  2. Load the Model and Tokenizer:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "tiiuae/Falcon3-Mamba-7B-Base"
    
    # torch_dtype="auto" loads the checkpoint in its native precision,
    # and device_map="auto" places the weights on the available GPU(s),
    # falling back to CPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
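
    A quick optional check (not in the original card) confirms where the weights landed and in which precision:

    # Optional: confirm device placement and parameter dtype.
    print(model.device, model.dtype)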
    
  3. Generate Text:
    prompt = "How many hours in one day?"
    messages = [
        {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
        {"role": "user", "content": prompt}
    ]
    # Render the conversation into a single prompt string using the
    # tokenizer's chat template.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    
    # Generate up to 1024 new tokens using the model's default
    # generation settings.
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=1024
    )
    # Note: this decodes the full sequence, including the echoed prompt.
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
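
    To print only the model's reply without the echoed prompt, a common follow-up (not part of the original card) slices off the input tokens first:

    # Keep only the tokens generated after the prompt.
    new_tokens = generated_ids[0][model_inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))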
    
  4. Hardware Recommendations: Utilize cloud GPUs for optimal performance, such as those provided by AWS, Google Cloud, or Azure.

License

Falcon3-Mamba-7B-Base is released under the TII Falcon-LLM License 2.0. For more details, refer to the Falcon LLM Terms and Conditions.
