Llama 3.2 1B Instruct llamafile
Mozilla / Llama-3.2-1B-Instruct-llamafile
Introduction
Llama-3.2-1B-Instruct-llamafile packages Meta's Llama 3.2 1B Instruct model, released on September 25, 2024, in Mozilla's llamafile format: a single executable that runs on most computers with 4GB+ of RAM. The model is a lightweight large language model optimized for multilingual dialogue and generative tasks.
Architecture
Llama 3.2 is an auto-regressive language model using an optimized transformer architecture. It employs supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for improved alignment with human preferences. The model officially supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It uses Grouped-Query Attention (GQA), in which groups of query heads share a smaller set of key/value heads, for enhanced inference scalability.
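The benefit of GQA is a smaller KV cache: several query heads attend through the same key/value head, so far fewer key/value tensors are stored per token. A minimal NumPy sketch of the idea (illustrative only, not Meta's implementation; the 32-query-head / 8-KV-head split shown here is an assumption for the example):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Toy Grouped-Query Attention: each group of query heads
    reuses one shared key/value head.

    q: (n_q_heads, seq, d)    k, v: (n_kv_heads, seq, d)
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads   # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group               # which shared KV head to use
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # numerically stable softmax over the key axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# 32 query heads attending through only 8 KV heads: the KV cache
# is 4x smaller than with full multi-head attention.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 16, 64))
k = rng.standard_normal((8, 16, 64))
v = rng.standard_normal((8, 16, 64))
print(gqa_attention(q, k, v).shape)  # (32, 16, 64)
```

The output keeps the full query-head shape; only the stored keys and values shrink, which is what makes GQA attractive for inference.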
Training
The Llama 3.2 models were pretrained on up to 9 trillion tokens from publicly available sources, using custom training libraries and infrastructure for 916k GPU hours on H100-80GB hardware; the training data cutoff is December 2023. Knowledge distillation was used to recover performance after pruning, followed by alignment rounds of supervised fine-tuning, rejection sampling, and Direct Preference Optimization (DPO).
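Of the alignment steps above, DPO is the one with a closed-form per-pair loss: it pushes the policy to assign a higher implicit reward to the preferred response than to the rejected one, measured as log-ratios against a frozen reference model. A small sketch of that loss (illustrative only; the log-probabilities and the beta value are assumed inputs, not values from Meta's training run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # Implicit reward = beta * log-ratio against the reference policy.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # -log sigmoid(margin): minimized by widening the reward margin.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy already prefers the chosen response more strongly
# than the reference does, the loss drops below log(2).
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

When the policy matches the reference exactly the margin is zero and the loss is log(2), so training only moves the policy where preferences disagree with the reference.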
Guide: Running Locally
Basic Steps
- Download the Model:
wget https://huggingface.co/Mozilla/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct.Q6_K.llamafile
- Make the File Executable:
chmod +x Llama-3.2-1B-Instruct.Q6_K.llamafile
- Run the Model:
./Llama-3.2-1B-Instruct.Q6_K.llamafile
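Once running, the llamafile serves a local web UI and an OpenAI-compatible completion API, by default on port 8080 (check the console output of your instance, as the address is an assumption here). A minimal Python client sketch using only the standard library; the model name field is illustrative:

```python
import json
import urllib.request

# Assumed default address of a running llamafile instance.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt, model="Llama-3.2-1B-Instruct"):
    """Build an OpenAI-style chat-completion request body."""
    body = {
        "model": model,  # illustrative; llamafile serves its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(body).encode("utf-8")

def ask(prompt):
    """Send a prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

# Inspect the request payload; with a llamafile running locally,
# call ask("Hello!") instead to get a live completion.
print(json.loads(build_chat_request("Hello!").decode("utf-8")))
```

Because the API follows the OpenAI chat-completions shape, existing OpenAI client libraries can also be pointed at the local base URL.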
Cloud GPUs
For enhanced performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs or Google Cloud's AI Platform.
License
The Llama 3.2 model is governed by the Llama 3.2 Community License, a custom commercial license agreement. Users must comply with the accompanying Acceptable Use Policy and ensure responsible deployment practices. For more details, refer to the LICENSE file.