Llama 3.2 3B

meta-llama

Introduction

Llama 3.2 is a collection of multilingual large language models developed by Meta. It is optimized for text-generation tasks in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models are available in 1B and 3B sizes and are designed for applications such as dialogue systems, summarization, and agentic retrieval.

Architecture

Llama 3.2 utilizes an optimized transformer architecture in an auto-regressive language model setup. The instruction-tuned variants are aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). For the quantized checkpoints, the quantization scheme uses 4-bit groupwise quantization for weights and 8-bit dynamic (per-token) quantization for activations in linear layers, which improves inference performance and reduces model size.
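
The quantization scheme described above can be illustrated with a minimal PyTorch sketch. The group size of 32, the symmetric rounding, and the per-token activation scaling below are assumptions made for illustration only; they are not Meta's exact kernels or parameters.

  import torch

  def quantize_weights_int4_groupwise(w: torch.Tensor, group_size: int = 32):
      # Symmetric 4-bit groupwise quantization of a 2-D weight matrix.
      # Assumes in_features is divisible by group_size.
      out_features, in_features = w.shape
      groups = w.reshape(out_features, in_features // group_size, group_size)
      # One scale per group, chosen so the largest magnitude maps to the int4 maximum (7).
      scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
      q = torch.clamp(torch.round(groups / scales), -8, 7).to(torch.int8)
      return q, scales

  def quantize_activations_int8_dynamic(x: torch.Tensor):
      # Dynamic (computed at inference time) per-token 8-bit quantization of activations.
      scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
      q = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
      return q, scales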

Training

Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources, with a knowledge cutoff of December 2023. Training used Meta's custom GPU clusters and production infrastructure, consuming a cumulative 916k GPU hours on H100-80GB hardware; because Meta maintains net-zero greenhouse gas emissions across its global operations, the market-based greenhouse gas emissions attributable to training were zero. The training process also included knowledge distillation from larger Llama models, rejection sampling, and direct preference optimization to refine model performance.
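
As an illustration of the last of those techniques, the snippet below sketches the standard direct preference optimization (DPO) objective, which scores preferred ("chosen") completions above rejected ones relative to a frozen reference model. The beta value and the assumption of pre-computed sequence log-probabilities are illustrative; this is the generic DPO loss, not Meta's exact training recipe.

  import torch
  import torch.nn.functional as F

  def dpo_loss(policy_chosen_logps, policy_rejected_logps,
               ref_chosen_logps, ref_rejected_logps, beta=0.1):
      # Log-probability ratios of the trained policy against the frozen reference model.
      chosen_logratio = policy_chosen_logps - ref_chosen_logps
      rejected_logratio = policy_rejected_logps - ref_rejected_logps
      # The loss widens the margin between preferred and rejected completions.
      margin = beta * (chosen_logratio - rejected_logratio)
      return -F.logsigmoid(margin).mean()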

Guide: Running Locally

  1. Install Prerequisites:

    • Ensure Python and pip are installed.
    • Install the Hugging Face Transformers library:
      pip install --upgrade transformers
      
  2. Load the Model:

    • Use the Transformers pipeline for text generation (a sketch with explicit generation parameters follows this list):
      import torch
      from transformers import pipeline
      
      model_id = "meta-llama/Llama-3.2-3B"
      # device_map="auto" places the model on an available GPU; bfloat16 halves memory relative to float32
      pipe = pipeline("text-generation", model=model_id, torch_dtype=torch.bfloat16, device_map="auto")
      result = pipe("The key to life is")
      print(result[0]["generated_text"])
      
  3. Download Checkpoints:

    • Use the Hugging Face CLI to download the original Meta-format checkpoints (as opposed to the Transformers-format weights loaded in step 2):
      huggingface-cli download meta-llama/Llama-3.2-3B --include "original/*" --local-dir Llama-3.2-3B
      
  4. Hardware Recommendations:

    • The 3B model needs roughly 7 GB of GPU memory for the weights alone in bfloat16; for optimal performance, consider cloud GPUs such as NVIDIA's A100 or H100 (the sketch after this list includes a quick availability check).
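
Expanding on steps 2 and 4, the sketch below checks that a GPU is visible and then passes explicit generation parameters to the pipeline. The sampling values (max_new_tokens, temperature, top_p) are illustrative choices rather than settings recommended by Meta, and the snippet assumes access to the gated meta-llama/Llama-3.2-3B repository has been granted on Hugging Face.

  import torch
  from transformers import pipeline

  # Confirm a GPU is visible before loading; the 3B weights alone occupy roughly 7 GB in bfloat16.
  print("CUDA available:", torch.cuda.is_available())

  pipe = pipeline(
      "text-generation",
      model="meta-llama/Llama-3.2-3B",
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )

  # Sampling parameters are illustrative, not recommended defaults.
  outputs = pipe(
      "The key to life is",
      max_new_tokens=64,
      do_sample=True,
      temperature=0.7,
      top_p=0.9,
  )
  print(outputs[0]["generated_text"])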

License

Llama 3.2 is distributed under the Llama 3.2 Community License, which grants a non-exclusive, worldwide, non-transferable, royalty-free limited license. Users must comply with the Acceptable Use Policy and display attribution such as "Built with Llama" when distributing products that incorporate Llama Materials. Entities with more than 700 million monthly active users must request a separate license from Meta, which Meta may grant at its discretion. The full license and Acceptable Use Policy are available on Meta's documentation pages.
