Meta-Llama-3.1-8B-Instruct-GGUF
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
Introduction
Meta-Llama-3.1-8B-Instruct-GGUF is a collection of quantized GGUF versions of Meta's Llama 3.1 8B Instruct model, designed for text generation in eight languages. The quantized files trade model size against output quality, enabling efficient inference and deployment on a wide range of hardware.
Architecture
The underlying Meta-Llama-3.1-8B-Instruct model is built on PyTorch; this repository packages it in the GGUF format used by llama.cpp and compatible runtimes, making it suitable for local conversational AI tasks. The quantizations were made with llama.cpp's imatrix (importance matrix) option, which calibrates the quantization against sample text to balance performance and resource usage.
Training
The model was quantized using llama.cpp, with a calibration dataset from bartowski's GitHub Gist used to compute the importance matrix. The resulting files run efficiently on a range of hardware configurations, including ARM chips and AVX2/AVX512 CPUs.
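For readers who want to reproduce a quantization themselves, the commands below are a rough sketch of llama.cpp's imatrix pipeline. The binary names and flags match recent llama.cpp releases and may differ in older builds, and calibration.txt is a placeholder for the actual calibration dataset linked from the Gist:

  # Build an importance matrix from sample text (calibration.txt is a
  # stand-in for the real calibration dataset)
  ./llama-imatrix -m Meta-Llama-3.1-8B-Instruct-f16.gguf -f calibration.txt -o imatrix.dat

  # Quantize the full-precision GGUF down to Q4_K_M, guided by the imatrix
  ./llama-quantize --imatrix imatrix.dat Meta-Llama-3.1-8B-Instruct-f16.gguf \
      Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf Q4_K_M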
Guide: Running Locally
- Install dependencies: Ensure you have the huggingface_hub CLI installed:
  pip install -U "huggingface_hub[cli]"
- Download the model: Use the CLI to download the desired quantized file:
  huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF --include "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf" --local-dir ./
- Configuration: Choose a quantized file that fits your hardware; larger quants (such as Q6_K) retain more quality, while smaller ones (such as Q4_K_M) need less RAM. If local hardware is insufficient, cloud GPU providers such as AWS, Google Cloud, or Azure offer better performance. A minimal inference sketch follows this list.
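Once a file is downloaded, it can be run with any llama.cpp-compatible runtime. The command below is a minimal sketch assuming a llama.cpp build whose llama-cli binary is on the PATH; flag names can vary between releases:

  # Start an interactive chat with the downloaded quant
  # (-m selects the model file, -cnv enables conversation mode,
  #  -ngl 99 offloads all layers to the GPU when one is available)
  ./llama-cli -m ./Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -cnv -ngl 99

Conversation mode typically applies the chat template embedded in the GGUF metadata, so prompts do not need to be formatted by hand.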
License
The Meta-Llama-3.1-8B-Instruct-GGUF model is distributed under the Llama 3.1 Community License, which grants a non-exclusive, royalty-free right to use, reproduce, and distribute the model. Redistribution requires providing a copy of the license and appropriate attribution. Commercial use is permitted, but services exceeding the license's user threshold (700 million monthly active users) must request a separate license from Meta. Full details are available in the license agreement.