Llama-3.3-70B-Instruct-GGUF

unsloth

Introduction

Llama-3.3-70B-Instruct-GGUF is a multilingual large language model developed by Meta, distributed here in the GGUF format. It is pretrained and instruction-tuned for multilingual dialogue and outperforms many open-source and closed chat models on common industry benchmarks. The model is designed for conversational and instruction-following use.

Architecture

Llama 3.3 is an auto-regressive language model built on an optimized transformer architecture. It is aligned with human preferences through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). The model supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Training

Llama 3.3 was pretrained on approximately 15 trillion tokens from publicly available sources and fine-tuned on additional instruction datasets and synthetically generated examples. Training consumed 39.3 million GPU hours on Meta's custom-built GPU cluster. Because Meta maintains net-zero greenhouse gas emissions, the market-based emissions attributable to training are effectively zero.
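As a rough sanity check on the compute figure, the GPU-hours can be translated into wall-clock time. The cluster size below is an illustrative assumption (Meta's Llama 3 training reportedly used on the order of 16,000 H100 GPUs), not a number from this card:

```python
# Back-of-the-envelope: convert total GPU-hours into wall-clock training time.
# The 16,000-GPU concurrent cluster size is an assumption for illustration.
gpu_hours = 39.3e6      # total GPU-hours reported for training
cluster_gpus = 16_000   # assumed number of GPUs running concurrently

wall_clock_hours = gpu_hours / cluster_gpus
wall_clock_days = wall_clock_hours / 24
print(f"~{wall_clock_hours:.0f} hours, i.e. roughly {wall_clock_days:.0f} days")
```

Under that assumption, training works out to a few months of wall-clock time, which matches the scale one would expect for a 70B-parameter model.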

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python and pip installed. Run `pip install --upgrade transformers torch accelerate` to install the necessary libraries (accelerate is required for `device_map="auto"`).
  2. Model Setup: Use the transformers library to load and run the model. Here's a code snippet to get started:
    import transformers
    import torch
    
    model_id = "meta-llama/Llama-3.3-70B-Instruct"
    
    # bfloat16 halves memory relative to float32; device_map="auto"
    # shards the model across available GPUs (requires accelerate).
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    
    # Chat-formatted input: a system prompt plus a user turn.
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipeline(
        messages,
        max_new_tokens=256,
    )
    # The pipeline returns the full conversation; print the final (assistant) message.
    print(outputs[0]["generated_text"][-1])
    
  3. Using Cloud GPUs: For better performance, use cloud services such as Google Colab. Note that a 70B-parameter model is large even when quantized (roughly 40 GB at 4-bit), so a single Tesla T4 (16 GB) will generally require heavy offloading; an A100-class GPU or a multi-GPU setup is more practical. Example notebooks for various Llama models are available and can be used to start quickly.
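Since this release ships GGUF weights, it can also be run with llama-cpp-python rather than transformers. The sketch below is a minimal example; the repo id and quantization filename pattern are assumptions, so check the repository for the exact filenames before running:

```python
from llama_cpp import Llama

# Download a quantized GGUF file from the Hub and load it.
# repo_id and filename glob are assumptions; verify them against the actual repo.
llm = Llama.from_pretrained(
    repo_id="unsloth/Llama-3.3-70B-Instruct-GGUF",
    filename="*Q4_K_M*",  # glob matching the desired quantization file
    n_gpu_layers=-1,      # offload all layers to GPU if memory allows
    n_ctx=8192,           # context window for this session
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Compared with the transformers snippet above, this path runs the quantized GGUF file directly, trading some output quality for a much smaller memory footprint.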

License

The Llama 3.3 model is distributed under the Llama 3.3 Community License Agreement, included with the model. The license permits commercial and research use, provided that users comply with its terms.
