Falcon-Mamba-7B
Introduction
Falcon-Mamba-7B is a causal decoder-only language model developed by the Technology Innovation Institute (TII). Designed for text generation tasks, it primarily supports English and is built on the Mamba state space architecture.
Architecture
Falcon-Mamba-7B is based on the Mamba architecture and uses a causal language modeling objective to predict the next token in a sequence. The model comprises 64 layers, with a hidden dimension (d_model) of 4096 and a state dimension (d_state) of 16, and supports a vocabulary size of 65,024 tokens. The sequence length during training was extended up to 8,192 tokens.
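These hyperparameters are recorded in the model's configuration file, so they can be checked without downloading the full weights. The following is a minimal sketch using transformers, assuming a release recent enough to include FalconMamba support:

from transformers import AutoConfig

# Download only the small config file, not the multi-gigabyte weights.
config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")

# Printing the config lists the architecture hyperparameters
# (number of layers, hidden size, state size, vocabulary size, ...).
print(config)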
Training
Falcon-Mamba-7B was trained on approximately 5,500 gigatokens (GT) of data from the RefinedWeb dataset, a large-scale, filtered, and deduplicated web corpus. Training followed a multi-stage curriculum learning approach and was conducted on AWS SageMaker using 256 H100 80GB GPUs, employing a 3D parallelism strategy and the AdamW optimizer with a warmup-stable-decay learning rate schedule, and ran over roughly two months.
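For intuition about the warmup-stable-decay schedule mentioned above, the sketch below shows only its general shape; the step counts and learning rates are arbitrary placeholders, not the values used to train Falcon-Mamba-7B:

def wsd_lr(step, warmup_steps=1000, stable_steps=100000, decay_steps=10000,
           peak_lr=3e-4, min_lr=3e-5):
    # Warmup: linear ramp from 0 up to the peak learning rate.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Stable: hold the peak learning rate for the bulk of training.
    if step < warmup_steps + stable_steps:
        return peak_lr
    # Decay: linear anneal from the peak down to the minimum learning rate.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr - (peak_lr - min_lr) * progress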
Guide: Running Locally
To run Falcon-Mamba-7B locally:
- Install Dependencies:
- Ensure you have Python installed.
- Install the necessary libraries using pip:
pip install transformers accelerate
- Load the Model:
- Use the following Python script to load and run the model:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

# Tokenize a prompt, generate a continuation, and decode it back to text.
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
- Use Cloud GPUs:
- For improved performance with a model of this size, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure (see the GPU loading sketch below).
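On a machine with a CUDA GPU (local or cloud), the loading step can be adapted to use bfloat16 weights and automatic device placement via accelerate. This is a minimal sketch; the dtype, device_map, and max_new_tokens settings are illustrative choices, not prescribed by the model card:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
# bfloat16 halves memory use; device_map="auto" (provided by accelerate)
# places the model on the available GPU automatically.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-mamba-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))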
License
Falcon-Mamba-7B is distributed under the TII Falcon-Mamba License 2.0. For detailed terms and conditions, refer to the license document.