Llama 3.3 70B Instruct


Introduction

Llama-3.3-70B-Instruct is a multilingual large language model (LLM) developed by Meta. It is a pretrained and instruction-tuned generative model optimized for multilingual dialogue and various natural language generation tasks. This model supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Architecture

Llama 3.3 is an auto-regressive language model built on an optimized transformer architecture. It is aligned with human preferences for helpfulness and safety using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model supports tool-use formats and uses grouped-query attention (GQA) for improved inference scalability.
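To make the GQA idea concrete, the sketch below shows the core mechanism in plain NumPy: several query heads share one key/value head, which shrinks the KV cache during inference. The head counts and dimensions here are illustrative only, not Llama 3.3's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative shapes (not the real model's): 8 query heads, 2 KV heads,
# so 4 query heads share each key/value head.
batch, seq_len = 2, 8
n_q_heads, n_kv_heads, head_dim = 8, 2, 16

rng = np.random.default_rng(0)
q = rng.standard_normal((batch, n_q_heads, seq_len, head_dim))
k = rng.standard_normal((batch, n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((batch, n_kv_heads, seq_len, head_dim))

# Repeat each KV head so every query head has a matching key/value head.
group = n_q_heads // n_kv_heads
k = np.repeat(k, group, axis=1)
v = np.repeat(v, group, axis=1)

# Standard scaled dot-product attention over the expanded heads.
scores = q @ k.swapaxes(-2, -1) / np.sqrt(head_dim)
out = softmax(scores) @ v
print(out.shape)  # (2, 8, 8, 16)
```

Only the 2 KV heads need to be cached per layer, which is why GQA reduces inference memory compared with full multi-head attention.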

Training

The Llama 3.3 model was pretrained on approximately 15 trillion tokens of publicly available data, and fine-tuning involved over 25 million synthetically generated examples. Training consumed 39.3 million GPU hours on H100-80GB hardware in Meta's custom-built GPU cluster. Meta maintains net-zero greenhouse gas emissions across its global operations, offsetting the energy used for training.

Guide: Running Locally

  1. Install Transformers: Ensure you have transformers version >= 4.43.0. Use pip install --upgrade transformers to update.

  2. Set Up Environment: Use the following snippet for running with PyTorch:

    import transformers
    import torch
    
    model_id = "meta-llama/Llama-3.3-70B-Instruct"
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    
  3. Inference: Define messages and generate outputs:

    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipeline(messages, max_new_tokens=256)
    print(outputs[0]["generated_text"][-1])
    
  4. Cloud GPUs: A 70B model requires substantial GPU memory, so for experimentation use cloud GPUs (for example, a Tesla T4 on Google Colab with a quantized variant of the model). The provided Unsloth notebooks enable faster fine-tuning and reduced memory usage.
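For memory-constrained setups, a common approach is 4-bit quantized loading via bitsandbytes. This is a hedged sketch using transformers' BitsAndBytesConfig, not the only way to run the model; it requires the bitsandbytes package, access to the gated model weights, and GPU memory that depends on your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.3-70B-Instruct"

# 4-bit NF4 quantization with bf16 compute, trading some precision
# for a much smaller memory footprint.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # shard layers across available GPUs
)
```

Once loaded this way, the model can be used with the same chat-message format shown in step 3.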

License

Llama 3.3 is distributed under the Llama 3.3 Community License Agreement, which permits commercial and research use within the outlined restrictions; see the license agreement for full details.
