Multi-Token Prediction (facebook)
Introduction
The Multi-Token Prediction project by Meta releases language models trained to predict several future tokens at each position instead of only the next one, with the aim of improving the efficiency and performance of large language models. The approach is described in the research paper "Better & Faster Large Language Models via Multi-token Prediction" and is an alternative to standard single-token (next-token) prediction.
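The sketch below shows what "predicting multiple tokens" means for a toy token sequence. It assumes the paper's setup: the model has n output heads, and at each position head i is trained to predict the token i steps ahead. The function name is hypothetical and not taken from the project's code.

```python
from typing import List, Optional


def multi_token_targets(tokens: List[int], n: int) -> List[List[Optional[int]]]:
    """For each position t, list the tokens that heads 1..n are trained to predict.

    Head i (1-indexed) at position t predicts tokens[t + i]. Offsets that run
    past the end of the sequence get None, meaning that head has no target there.
    """
    targets = []
    for t in range(len(tokens)):
        row = []
        for i in range(1, n + 1):
            j = t + i
            row.append(tokens[j] if j < len(tokens) else None)
        targets.append(row)
    return targets


# With n=1 this reduces to ordinary next-token prediction.
print(multi_token_targets([10, 11, 12, 13, 14], n=4))
```

Row t of the output holds the n future tokens that position t is trained on. The first row is [11, 12, 13, 14], and later rows fill with None as the sequence ends.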
Architecture
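The release includes four models, each with 7 billion parameters and trained on code tokens. Here n is the number of future tokens the model is trained to predict at each position, so n=1 is standard next-token prediction: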
- Baseline model (n=1) on 200 billion tokens.
- Multi-token prediction model (n=4) on 200 billion tokens.
- Baseline model (n=1) on 1 trillion tokens.
- Multi-token prediction model (n=4) on 1 trillion tokens.
These models use the standard Llama 2 SentencePiece tokenizer. Their PyTorch state_dicts are compatible with the Llama format, so the checkpoints can also be used for standard next-token autoregressive inference.
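Because the checkpoints follow the Llama layout, one way to use a multi-token checkpoint for ordinary next-token inference is to drop the parameters of the additional prediction heads and keep the rest. This minimal sketch assumes those parameters share a common key prefix. `EXTRA_HEAD_PREFIX` is a hypothetical placeholder, so inspect the real checkpoint keys before relying on it. The helper works on any mapping, including a loaded PyTorch state_dict.

```python
from typing import Dict, TypeVar

V = TypeVar("V")

# Hypothetical placeholder: check the actual key names in the checkpoint
# (for example by printing state_dict.keys()) before using this.
EXTRA_HEAD_PREFIX = "extra_heads."


def strip_extra_heads(state_dict: Dict[str, V], prefix: str = EXTRA_HEAD_PREFIX) -> Dict[str, V]:
    """Return a copy of state_dict without the entries whose key starts with `prefix`."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}


# Toy example: placeholder integers stand in for tensors.
toy = {
    "tok_embeddings.weight": 0,
    "layers.0.attention.wq.weight": 1,
    "output.weight": 2,
    "extra_heads.0.attention.wq.weight": 3,
}
print(sorted(strip_extra_heads(toy)))
# ['layers.0.attention.wq.weight', 'output.weight', 'tok_embeddings.weight']
```

With PyTorch, the filtered dictionary could then be passed to `load_state_dict` on a standard Llama model of matching size.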
Training
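Each multi-token model (n=4) is paired with a baseline (n=1) of the same size trained on the same number of tokens, at two data scales (200 billion and 1 trillion tokens). Users can therefore compare single-token and multi-token prediction directly and choose the configuration that suits their task. In the multi-token models, a shared transformer trunk feeds n independent output heads. At every position, head i is trained to predict the token i steps ahead, and the training loss is the sum of the heads' cross-entropy losses.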
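Here is a minimal pure-Python sketch of a multi-token training loss. It assumes, as in the paper, one output head per predicted offset and a loss that sums the per-head cross-entropy terms. Each head's term is averaged over the positions that have a target, which is a normalization choice made for this sketch. The function names are illustrative. A real implementation would work on batched tensors rather than nested lists.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multi_token_loss(head_logits: List[List[List[float]]], tokens: List[int]) -> float:
    """Sum over heads of each head's mean cross-entropy.

    head_logits[i][t] holds the logits of head i+1 at position t (one list per
    position, len(tokens) positions). That head is trained to predict
    tokens[t + i + 1], so positions without such a token are skipped.
    """
    total = 0.0
    for i, per_position in enumerate(head_logits):
        offset = i + 1
        losses = [
            cross_entropy(per_position[t], tokens[t + offset])
            for t in range(len(tokens) - offset)
        ]
        if losses:
            total += sum(losses) / len(losses)
    return total
```

With a single head (n=1), `multi_token_loss` reduces to the usual next-token loss. With uniform logits over a vocabulary of size V, each head contributes log V.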
Guide: Running Locally
To run these models locally, follow these steps:
- Installation: Install necessary Python packages:
pip install torch fairscale fire sentencepiece
- Execution: Use the following command to run a sample completion task:
torchrun --nproc_per_node 1 example_completion.py --ckpt_dir 7B_200B_4/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 2
Replace 7B_200B_4 with the path to your chosen model checkpoint directory.
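To script runs over different checkpoint directories, you can assemble the completion command programmatically. The helper below is hypothetical and not part of the project. It only reproduces the flags of the command shown above and adds none of its own.

```python
import shlex
from typing import List


def completion_command(ckpt_dir: str, tokenizer_path: str = "tokenizer.model",
                       max_seq_len: int = 128, max_batch_size: int = 2,
                       nproc_per_node: int = 1) -> List[str]:
    """Build the torchrun argument list for the sample completion script."""
    return [
        "torchrun", "--nproc_per_node", str(nproc_per_node),
        "example_completion.py",
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", tokenizer_path,
        "--max_seq_len", str(max_seq_len),
        "--max_batch_size", str(max_batch_size),
    ]


cmd = completion_command("7B_200B_4/")
print(shlex.join(cmd))
# To launch it (requires the example script, a checkpoint, and a GPU):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list to `subprocess.run` avoids shell quoting problems when checkpoint paths contain spaces.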
Running a 7B model requires a GPU with enough memory to hold its weights. If no suitable GPU is available locally, you can use cloud GPU services such as AWS, Google Cloud, or Azure.
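As a rough sizing aid, the memory needed for the weights alone is the parameter count times the bytes per parameter. This sketch assumes exactly 7.0e9 parameters. It ignores activations, the KV cache, and any extra prediction heads, so actual GPU memory use will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for dtype, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_gib(7e9, nbytes):.1f} GiB")
# float32: 26.1 GiB
# bfloat16: 13.0 GiB
```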
License
The models and accompanying materials are released under the Multi-token Prediction Research License. This license permits non-commercial research use, including reproduction, distribution, and modification of the materials, provided that these are not used for commercial advantage. Compliance with applicable laws and adherence to the LLaMA Acceptable Use Policy are mandatory. The license includes limitations on liability and a disclaimer of warranty, emphasizing that the materials are provided "as is" without express or implied warranties.