Llama 3.2 1B

meta-llama

Introduction

The Llama 3.2 collection comprises multilingual large language models (LLMs) developed by Meta. These models, available in 1B and 3B sizes, are designed for multilingual dialogue use cases such as agentic retrieval and summarization. They outperform many of the available open-source and closed chat models on common industry benchmarks.

Architecture

Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture. It employs supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences. The models officially support eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Training

The training process for Llama 3.2 involved extensive use of Meta's custom GPU cluster and production infrastructure. The models were pretrained on up to 9 trillion tokens of data from publicly available sources. Knowledge distillation techniques were applied to recover performance after pruning. The models then underwent several rounds of alignment using supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO) to produce the final chat models.
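As a sketch of one of these alignment objectives, the following computes the standard DPO loss for a single preference pair. The log-probability values here are hypothetical, and `beta` is the usual temperature on the policy-vs-reference log-ratio margin:

```python
import math

# Minimal sketch of the Direct Preference Optimization (DPO) objective.
# Inputs are summed log-probabilities of a chosen (preferred) and rejected
# response under the trainable policy and a frozen reference model. DPO
# pushes the policy to widen the chosen/rejected margin relative to the
# reference, via -log(sigmoid(beta * margin)).
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    chosen_logratio = pi_chosen - ref_chosen        # policy vs. reference, chosen
    rejected_logratio = pi_rejected - ref_rejected  # policy vs. reference, rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical log-probabilities, purely for illustration:
loss = dpo_loss(pi_chosen=-12.0, pi_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0, beta=0.1)
print(round(loss, 4))  # → 0.5981
```

The loss shrinks as the policy assigns relatively more probability to the preferred response than the reference model does.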

Guide: Running Locally

  1. Install Transformers: Ensure your Transformers library is updated to version 4.43.0 or later. Use the command:
    pip install --upgrade transformers
    
  2. Use the Transformers pipeline: Import the necessary modules and set up the pipeline for text generation.
    import torch
    from transformers import pipeline
    
    model_id = "meta-llama/Llama-3.2-1B"  # gated repo: accept the license on the Hub first
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,  # halves memory vs. float32 on supported GPUs
        device_map="auto",           # place the model on GPU if one is available
    )
    output = pipe("The key to life is", max_new_tokens=50)
    print(output[0]["generated_text"])
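The pipeline returns a list of dicts whose `generated_text` field includes the prompt. A minimal sketch of pulling out just the continuation — using a stand-in result here, since running the model requires downloading the gated weights:

```python
# Stand-in for the pipeline's return value; real generations will differ.
output = [{"generated_text": "The key to life is balance and curiosity."}]

prompt = "The key to life is"
generated = output[0]["generated_text"]   # prompt + continuation
continuation = generated[len(prompt):].strip()
print(continuation)  # just the model's continuation, without the prompt
```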
    
  3. Download Original Checkpoints: Use the Hugging Face CLI to download the original model checkpoints if needed:
    huggingface-cli download meta-llama/Llama-3.2-1B --include "original/*" --local-dir Llama-3.2-1B
    
  4. Hardware Recommendations: For optimal performance, consider using cloud GPUs such as NVIDIA V100 or A100 to support model inference and training activities. Note that bfloat16, as used in the pipeline example above, requires an Ampere-or-newer GPU such as the A100; on older cards like the V100, use torch.float16 instead.
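To size the GPU, a back-of-the-envelope estimate of the memory needed for the weights alone is useful (activations and the KV cache add overhead on top). The ~1.24B parameter count used here is approximate:

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

n_params = 1.24e9  # approximate parameter count of the 1B model
for name, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_memory_gb(n_params, nbytes):.1f} GB")
```

At bfloat16 the weights alone fit in roughly 2.3 GB, which is why even modest GPUs can serve the 1B model for inference.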

License

Llama 3.2 is governed by the Llama 3.2 Community License, which grants a non-exclusive, worldwide, non-transferable, royalty-free license to use the Llama Materials. Redistribution requires adherence to specific conditions, including providing a copy of the license and displaying the "Built with Llama" notice. Compliance with applicable laws and the Acceptable Use Policy is mandatory. Commercial use in products or services with more than 700 million monthly active users requires a separate license from Meta. The license disclaims warranties and limits Meta's liability for certain types of damages. The agreement is governed by California law.