Llama 3.2 1B Instruct llamafile
Mozilla / Llama-3.2-1B-Instruct-llamafile
Introduction
Llama-3.2-1B-Instruct-llamafile packages Meta's Llama 3.2 1B Instruct model, released on September 25, 2024, in Mozilla's llamafile format: a single executable that runs on most computers with 4GB+ of RAM. The model is a lightweight large language model optimized for multilingual dialogue and generative tasks.
Architecture
Llama 3.2 is an auto-regressive language model using an optimized transformer architecture. It employs supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for improved alignment with human preferences. The model officially supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It uses Grouped-Query Attention (GQA), in which groups of query heads share a smaller set of key/value heads, for enhanced inference scalability.
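The benefit of GQA is a smaller KV cache: several query heads attend through the same key/value head, so far fewer key/value tensors are stored per token. A minimal NumPy sketch of the idea (illustrative only, not Meta's implementation; the 32-query-head / 8-KV-head split shown here is an assumption for the example):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Toy Grouped-Query Attention: each group of query heads
    reuses one shared key/value head.

    q: (n_q_heads, seq, d)    k, v: (n_kv_heads, seq, d)
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads   # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group               # which shared KV head to use
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # numerically stable softmax over the key axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# 32 query heads attending through only 8 KV heads: the KV cache
# is 4x smaller than with full multi-head attention.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 16, 64))
k = rng.standard_normal((8, 16, 64))
v = rng.standard_normal((8, 16, 64))
print(gqa_attention(q, k, v).shape)  # (32, 16, 64)
```

The output keeps the full query-head shape; only the stored keys and values shrink, which is what makes GQA attractive for inference.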
Training
The Llama 3.2 models were pretrained on up to 9 trillion tokens from publicly available sources, using custom training libraries and infrastructure for 916k GPU hours on H100-80GB hardware; the training data cutoff is December 2023. Knowledge distillation was used to recover performance after pruning, followed by alignment rounds of supervised fine-tuning, rejection sampling, and Direct Preference Optimization (DPO).
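Of the alignment steps above, DPO is the one with a closed-form per-pair loss: it pushes the policy to assign a higher implicit reward to the preferred response than to the rejected one, measured as log-ratios against a frozen reference model. A small sketch of that loss (illustrative only; the log-probabilities and the beta value are assumed inputs, not values from Meta's training run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # Implicit reward = beta * log-ratio against the reference policy.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # -log sigmoid(margin): minimized by widening the reward margin.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy already prefers the chosen response more strongly
# than the reference does, the loss drops below log(2).
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

When the policy matches the reference exactly the margin is zero and the loss is log(2), so training only moves the policy where preferences disagree with the reference.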
Guide: Running Locally
Basic Steps
- Download the Model:
wget https://huggingface.co/Mozilla/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct.Q6_K.llamafile
- Make the File Executable:
chmod +x Llama-3.2-1B-Instruct.Q6_K.llamafile
- Run the Model:
./Llama-3.2-1B-Instruct.Q6_K.llamafile
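Once running, the llamafile serves a local web UI and an OpenAI-compatible completion API, by default on port 8080 (check the console output of your instance, as the address is an assumption here). A minimal Python client sketch using only the standard library; the model name field is illustrative:

```python
import json
import urllib.request

# Assumed default address of a running llamafile instance.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt, model="Llama-3.2-1B-Instruct"):
    """Build an OpenAI-style chat-completion request body."""
    body = {
        "model": model,  # illustrative; llamafile serves its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(body).encode("utf-8")

def ask(prompt):
    """Send a prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

# Inspect the request payload; with a llamafile running locally,
# call ask("Hello!") instead to get a live completion.
print(json.loads(build_chat_request("Hello!").decode("utf-8")))
```

Because the API follows the OpenAI chat-completions shape, existing OpenAI client libraries can also be pointed at the local base URL.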
Cloud GPUs
For enhanced performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs or Google Cloud's AI Platform.
License
The Llama 3.2 model is governed by the Llama 3.2 Community License, a custom commercial license agreement. Users must comply with the accompanying Acceptable Use Policy and ensure responsible deployment practices. For more details, refer to the LICENSE file.