Falcon-Mamba-7B
Introduction
Falcon-Mamba-7B is a causal decoder-only language model developed by the Technology Innovation Institute (TII). Designed for text generation tasks, it primarily supports English and is built on the Mamba state space architecture.
Architecture
Falcon-Mamba-7B is based on the Mamba architecture and uses a causal language modeling objective to predict the next token in a sequence. The model comprises 64 layers, with a hidden dimension (d_model) of 4096 and a state dimension (d_state) of 16, and supports a vocabulary size of 65,024 tokens. The sequence length during training was extended up to 8,192 tokens.
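These hyperparameters are recorded in the model's configuration file, so they can be checked without downloading the full weights. The following is a minimal sketch using transformers, assuming a release recent enough to include FalconMamba support:

from transformers import AutoConfig

# Download only the small config file, not the multi-gigabyte weights.
config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")

# Printing the config lists the architecture hyperparameters
# (number of layers, hidden size, state size, vocabulary size, ...).
print(config)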
Training
Falcon-Mamba-7B was trained on approximately 5,500 gigatokens (GT) of data from the RefinedWeb dataset, a large-scale, filtered, and deduplicated web corpus. Training followed a multi-stage curriculum learning approach and was conducted on AWS SageMaker using 256 H100 80GB GPUs, employing a 3D parallelism strategy and the AdamW optimizer with a warmup-stable-decay learning rate schedule, and ran over roughly two months.
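For intuition about the warmup-stable-decay schedule mentioned above, the sketch below shows only its general shape; the step counts and learning rates are arbitrary placeholders, not the values used to train Falcon-Mamba-7B:

def wsd_lr(step, warmup_steps=1000, stable_steps=100000, decay_steps=10000,
           peak_lr=3e-4, min_lr=3e-5):
    # Warmup: linear ramp from 0 up to the peak learning rate.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Stable: hold the peak learning rate for the bulk of training.
    if step < warmup_steps + stable_steps:
        return peak_lr
    # Decay: linear anneal from the peak down to the minimum learning rate.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr - (peak_lr - min_lr) * progress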
Guide: Running Locally
To run Falcon-Mamba-7B locally:
- Install Dependencies:
- Ensure you have Python installed.
- Install the necessary libraries using pip:
pip install transformers accelerate
- Load the Model:
- Use the following Python script to load and run the model:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

# Tokenize a prompt, generate a continuation, and decode it back to text.
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
- Use Cloud GPUs:
- For improved performance with a model of this size, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure (see the GPU loading sketch below).
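On a machine with a CUDA GPU (local or cloud), the loading step can be adapted to use bfloat16 weights and automatic device placement via accelerate. This is a minimal sketch; the dtype, device_map, and max_new_tokens settings are illustrative choices, not prescribed by the model card:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
# bfloat16 halves memory use; device_map="auto" (provided by accelerate)
# places the model on the available GPU automatically.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-mamba-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))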
License
Falcon-Mamba-7B is distributed under the TII Falcon-Mamba License 2.0. For detailed terms and conditions, refer to the license document.