Llama 3.3 70B Instruct GGUF

bartowski

Introduction

Llama-3.3-70B-Instruct-GGUF is a set of GGUF quantizations of Meta's Llama 3.3 70B Instruct, a large language model designed for text generation. It supports eight languages and is offered in a range of quantization formats so it can run efficiently across different hardware setups.

Architecture

The model is based on Meta's Llama 3.3 architecture. As distributed by Meta, the Llama materials comprise the model code, trained model weights, inference-enabling code, and training-enabling code. The quantized weights in this repository are stored in the GGUF format and can be deployed via llama.cpp or other GGUF-compatible inference endpoints.

Training

The original model was trained by Meta; the GGUF files in this repository were then quantized from those weights using llama.cpp, specifically its imatrix (importance matrix) option. Quantization reduces model size while largely preserving output quality, allowing deployment in resource-constrained environments. A sketch of this workflow follows.
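
As a rough sketch of that workflow (the file names below are illustrative, not the exact ones used for this repository), an imatrix quantization with llama.cpp looks like:

    # Compute an importance matrix from a calibration text file
    ./llama-imatrix -m Llama-3.3-70B-Instruct-f16.gguf -f calibration.txt -o imatrix.dat

    # Quantize the full-precision GGUF, guided by the importance matrix
    ./llama-quantize --imatrix imatrix.dat Llama-3.3-70B-Instruct-f16.gguf Llama-3.3-70B-Instruct-Q4_K_M.gguf Q4_K_M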

Guide: Running Locally

  1. Install Hugging Face CLI:
    Use the command pip install -U "huggingface_hub[cli]" to install the necessary tools for downloading the model.

  2. Download the Model:
    Use huggingface-cli download bartowski/Llama-3.3-70B-Instruct-GGUF --include "filename" --local-dir ./ to download specific quantized versions of the model.
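
    For example, to fetch the Q4_K_M build (the exact file name is an assumption based on the repository's usual naming scheme; quantizations larger than about 50 GB are split into a folder and need a pattern such as --include "Llama-3.3-70B-Instruct-Q8_0/*" instead):

      huggingface-cli download bartowski/Llama-3.3-70B-Instruct-GGUF --include "Llama-3.3-70B-Instruct-Q4_K_M.gguf" --local-dir ./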

  3. Choose the Right Quantization:

    • Consider your hardware's RAM or VRAM capacity when selecting a quantization; the file should be a few gigabytes smaller than your available memory (see the sizing sketch after this list).
    • Use I-quants (e.g. IQ3_M) for better quality at a given size in size-constrained scenarios, but be aware that they can be slower or incompatible with certain hardware and software builds.
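
    As a rough sizing rule (the bits-per-weight figure here is an approximation for Q4_K_M), on-disk size is about parameters × bits-per-weight / 8, and you should leave headroom beyond that for the KV cache and context:

      # ~70B parameters at ~4.8 bits per weight for Q4_K_M:
      echo "70 * 4.8 / 8" | bc -l    # ≈ 42 GB on disk
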
  4. Run with Optimal Hardware:

    • Cloud GPUs: Consider cloud services with NVIDIA or AMD GPUs for optimal performance, leveraging GPU VRAM to fit the entire model.
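
    Once a file is downloaded, a minimal llama.cpp invocation looks like this (the file name and -ngl value are illustrative; -ngl sets how many layers are offloaded to the GPU):

      ./llama-cli -m ./Llama-3.3-70B-Instruct-Q4_K_M.gguf -ngl 99 -c 4096 -p "Write a haiku about quantization."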

License

The model is distributed under the Llama 3.3 Community License. Users are granted a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, distribute, and modify the model. Compliance with Meta's Acceptable Use Policy is mandatory, and commercial use may require additional licensing.
