Llama 3.3 70B Instruct

meta-llama

Introduction

The Meta Llama 3.3 model is a multilingual large language model (LLM) designed for text generation. It has been instruction-tuned and optimized for multilingual dialogue, outperforming many available models on industry benchmarks. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Architecture

Llama 3.3 utilizes an auto-regressive language model with an optimized transformer architecture. The model incorporates supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Training

The model was pretrained on approximately 15 trillion tokens from publicly available sources and fine-tuned using over 25 million synthetically generated examples. Training involved 39.3 million GPU hours on H100-80GB hardware, with estimated total greenhouse gas emissions of 11,390 tons CO2eq.

Model Stats

  • Model Size: 70 billion parameters
  • Token Count: Over 15 trillion tokens
  • Context Length: 128K tokens
  • Supported Languages: 8 (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai)
  • Training Data Cutoff: December 2023
  • Release Date: December 6, 2024
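The parameter count above translates directly into a serving-hardware requirement. A back-of-the-envelope sketch (the helper `weight_memory_gb` is hypothetical; real usage is higher once the KV cache and activations are counted):

```python
# Rough memory estimate for holding Llama 3.3 70B's weights in GPU memory.
# Assumes bf16 weights (2 bytes per parameter); KV cache and activations
# add to this, so treat the result as a lower bound.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to store the weights, in GB."""
    return num_params * bytes_per_param / 1e9

print(f"{weight_memory_gb(70e9):.0f} GB")                      # bf16 weights
print(f"{weight_memory_gb(70e9, bytes_per_param=1):.0f} GB")   # int8-quantized weights
```

At bf16 this comes to roughly 140 GB of weights alone, which is why multi-GPU setups or quantization are typically needed for local inference.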

Guide: Running Locally

To run Llama 3.3 locally, follow these steps:

  1. Install Transformers: Make sure you have the latest version of the Transformers library (>=4.45.0) by running pip install --upgrade transformers.
  2. Set Up Model: Use the Transformers pipeline or the Auto classes with the generate() function.
  3. Example Code:
    import torch
    import transformers
    
    model_id = "meta-llama/Llama-3.3-70B-Instruct"
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    
    outputs = pipeline(messages, max_new_tokens=256)
    # With chat-style input, "generated_text" holds the full conversation;
    # the last entry is the assistant's reply.
    print(outputs[0]["generated_text"][-1])
    
  4. Cloud GPUs: For optimal performance, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
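Step 2 mentions the Auto classes: before tokenization, the pipeline applies the model's chat template to the `messages` list to build a single prompt string. A rough sketch of what that template produces (the exact special tokens below are an assumption based on the Llama 3 prompt format; in real code call `tokenizer.apply_chat_template` rather than hand-building prompts):

```python
# Sketch of the prompt string a Llama 3-style chat template roughly produces.
# The special tokens are an assumption; prefer tokenizer.apply_chat_template.
def format_chat(messages):
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
print(format_chat(messages))
```

Seeing the flattened prompt makes it clear why the `messages` structure matters: each role becomes a delimited block, and generation stops at the end-of-turn token.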

License

Llama 3.3 is available under the Llama 3.3 Community License Agreement. It grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, and modify the Llama Materials. Redistribution and use must adhere to specific guidelines, including displaying “Built with Llama” and following the Acceptable Use Policy. For more details, refer to the license documentation.
