Falcon-Mamba-7B

tiiuae

Introduction

Falcon-Mamba-7B is a causal decoder-only language model developed by the Technology Innovation Institute (TII). Designed for text-generation tasks, it primarily supports English and is built on the attention-free Mamba state-space architecture.

Architecture

Falcon-Mamba-7B is based on the Mamba architecture, using a causal language modeling objective to predict the next token in a sequence. The model comprises 64 layers, with a hidden dimension (d_model) of 4096 and a state dimension (d_state) of 16, supporting a vocabulary size of 65,024 tokens. The sequence length during training is extended up to 8,192 tokens.
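At the core of the Mamba architecture is a per-channel state-space recurrence rather than attention. The sketch below illustrates the general idea with a plain (non-selective) linear recurrence and toy dimensions; it is not the actual Falcon-Mamba selective-scan kernel, and the sizes are stand-ins for the real d_model=4096 and d_state=16.

```python
import numpy as np

# Toy stand-ins for the real hyperparameters (d_model=4096, d_state=16).
d_model, d_state, seq_len = 8, 4, 16

rng = np.random.default_rng(0)
A = -np.abs(rng.standard_normal((d_model, d_state)))  # negative log-decays per channel/state
B = rng.standard_normal((d_model, d_state)) * 0.1     # input projection
C = rng.standard_normal((d_model, d_state)) * 0.1     # output projection
x = rng.standard_normal((seq_len, d_model))           # input sequence

h = np.zeros((d_model, d_state))  # one d_state-dim hidden vector per channel
y = np.zeros_like(x)
for t in range(seq_len):
    # h_t = exp(A) * h_{t-1} + B * x_t  (discretized, elementwise per channel)
    h = np.exp(A) * h + B * x[t][:, None]
    # y_t = <C, h_t> summed over the state dimension, per channel
    y[t] = (C * h).sum(axis=1)

print(y.shape)  # (16, 8)
```

Because each step only reads the previous hidden state, generation is O(1) in sequence length per token, which is the key practical difference from attention-based decoders.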

Training

Falcon-Mamba-7B was trained on approximately 5,500 gigatokens (GT) of data, drawn mainly from the RefinedWeb dataset, a large-scale, filtered, and deduplicated web corpus. Training followed a multi-stage curriculum learning approach and was conducted on AWS SageMaker with 256 H100 80GB GPUs, using a 3D parallelism strategy and the AdamW optimizer with a warmup-stable-decay learning-rate schedule, over roughly two months.
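A warmup-stable-decay schedule ramps the learning rate up, holds it constant for most of training, then decays it at the end. The sketch below shows the shape of such a schedule; the step counts and learning-rate values are illustrative assumptions, not the actual figures used for Falcon-Mamba-7B.

```python
def wsd_lr(step, peak_lr=3e-4, warmup=1000, stable=8000, decay=1000, min_lr=3e-5):
    """Warmup-stable-decay schedule sketch (all hyperparameters hypothetical)."""
    if step < warmup:                        # linear warmup to peak
        return peak_lr * step / warmup
    if step < warmup + stable:               # hold at peak for most of training
        return peak_lr
    if step < warmup + stable + decay:       # linear decay down to min_lr
        frac = (step - warmup - stable) / decay
        return peak_lr + frac * (min_lr - peak_lr)
    return min_lr                            # stay at the floor afterwards
```

Compared with cosine decay, the long stable phase makes it easy to extend training (or branch off a decay phase) without committing to a total step count in advance.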

Guide: Running Locally

To run Falcon-Mamba-7B locally:

  1. Install Dependencies:

    • Ensure you have Python installed.
    • Install the necessary libraries using pip:
      pip install transformers accelerate
      
  2. Load the Model:

    • Use the following Python script to load and run the model:
      from transformers import AutoTokenizer, AutoModelForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
      model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

      input_text = "Question: How many hours in one day? Answer: "
      input_ids = tokenizer(input_text, return_tensors="pt").input_ids

      # Cap the generation length; generate() otherwise stops at a short default max_length.
      outputs = model.generate(input_ids, max_new_tokens=30)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      
  3. Use Cloud GPUs:

    • For improved performance, especially with larger models, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.

License

Falcon-Mamba-7B is distributed under the TII Falcon-Mamba License 2.0. For detailed terms and conditions, refer to the license document.
