Llama 3.2 1B Instruct

meta-llama

Introduction

Llama 3.2 is a collection of multilingual large language models (LLMs) developed by Meta, optimized for multilingual dialogue use cases such as agentic retrieval and summarization. The instruction-tuned, text-only models are available in 1B and 3B parameter sizes and are designed to outperform many open-source and closed chat models on common industry benchmarks.

Architecture

Llama 3.2 uses an auto-regressive transformer architecture. The instruction-tuned variants are aligned with human preferences for helpfulness and safety through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Officially supported languages are English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models use Grouped-Query Attention (GQA) for improved inference scalability.
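
One quick way to see the GQA setup is to read the checkpoint's published configuration through Transformers. The sketch below simply prints the relevant fields; it assumes you have accepted the license and authenticated, since the repository is gated.

    from transformers import AutoConfig

    # Fetch the configuration shipped with the checkpoint
    # (the repo is gated, so authenticate with `huggingface-cli login` first).
    config = AutoConfig.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

    # With Grouped-Query Attention, several query heads share one key/value head,
    # so num_key_value_heads is smaller than num_attention_heads.
    print("query heads:       ", config.num_attention_heads)
    print("key/value heads:   ", config.num_key_value_heads)
    print("hidden size:       ", config.hidden_size)
    print("transformer layers:", config.num_hidden_layers)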

Training

The 1B and 3B models were pretrained on up to 9 trillion tokens of publicly available data, with logits from the larger Llama 3.1 models incorporated as distillation targets during pretraining. Post-training combined supervised fine-tuning and preference optimization. The separately released quantized variants use a 4-bit groupwise scheme for weights and 8-bit quantization for activations.
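
Meta's quantized checkpoints are produced with their own tooling, but the basic arithmetic of groupwise 4-bit weight quantization can be illustrated in a few lines. The sketch below is only an illustration of the idea, not Meta's implementation; the group size and the symmetric rounding scheme here are assumptions.

    import torch

    def quantize_4bit_groupwise(weight: torch.Tensor, group_size: int = 32):
        """Symmetric 4-bit groupwise quantization of a 2-D weight (illustrative only).
        Each contiguous group of `group_size` values in a row shares one scale."""
        out_features, in_features = weight.shape
        w = weight.reshape(out_features, in_features // group_size, group_size)
        # One scale per group: map the group's max magnitude onto the int4 limit (7).
        scales = (w.abs().amax(dim=-1, keepdim=True) / 7.0).clamp_min(1e-8)
        q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
        return q, scales

    def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
        # Reverse the mapping and flatten the groups back into rows.
        return (q.float() * scales).reshape(q.shape[0], -1)

    w = torch.randn(64, 128)
    q, s = quantize_4bit_groupwise(w)
    print("max reconstruction error:", (w - dequantize(q, s)).abs().max().item())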

Guide: Running Locally

  1. Install Dependencies: Make sure Python is installed, then install or upgrade the Transformers library with pip (Llama 3.2 requires transformers >= 4.43.0):

    pip install --upgrade transformers
    
  2. Set Up the Model: Use the Transformers pipeline for text generation:

    import torch
    from transformers import pipeline
    
    model_id = "meta-llama/Llama-3.2-1B-Instruct"
    # Load the model in bfloat16 and let device_map="auto" place it on the available device(s).
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    
  3. Generate Text: Pass a list of chat messages to the pipeline and print the assistant's reply (a lower-level alternative using the chat template directly is sketched after this list):

    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipe(
        messages,
        max_new_tokens=256,
    )
    # The pipeline returns the full conversation; the last message is the assistant's reply.
    print(outputs[0]["generated_text"][-1])
    
  4. Hardware Suggestions: For optimal throughput, use a cloud GPU such as an NVIDIA H100 or similar; the 1B model is small enough to also run comfortably on consumer GPUs in bfloat16.
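
The pipeline call above handles chat templating and decoding internally. If you want more control over generation, the same conversation can be run through the tokenizer and model directly; the following is a minimal sketch that assumes the same model ID and a device with bfloat16 support.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.2-1B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]

    # Apply the model's chat template and move the prompt tokens to the model's device.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Generate up to 256 new tokens for the assistant's turn.
    output_ids = model.generate(input_ids, max_new_tokens=256)

    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))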

License

Llama 3.2 is distributed under the Llama 3.2 Community License, a custom commercial license agreement that grants non-exclusive, worldwide rights to use, distribute, and modify the models. Users must comply with Meta's Acceptable Use Policy and the other terms of the license agreement, which include attribution requirements and usage restrictions.
