Llama Guard 3 8B

meta-llama

Introduction

Llama Guard 3 is a Llama-3.1-8B pretrained model fine-tuned for content safety classification. It classifies content in both LLM inputs (prompt classification) and LLM responses (response classification), indicating whether the content is safe or unsafe and, when unsafe, listing the violated content categories. The model supports multilingual content moderation and is optimized to moderate safety and security risks in search and code interpreter tool calls.
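As an illustration of the output format, a benign exchange yields the single verdict line "safe", while a flagged one yields "unsafe" followed by the violated category codes on the next line (the specific code shown here is hypothetical):

unsafe
S1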

Architecture

Llama Guard 3 builds upon the Llama-3.1-8B architecture, focusing on content safety with alignment to the MLCommons standardized hazards taxonomy. It supports content moderation in eight languages. Classification is generative: the model emits text stating whether the content is safe or unsafe, and the probability assigned to the first generated token can be read as an "unsafe" score for threshold-based filtering. The model is designed to improve system-level safety performance when deployed alongside Llama 3.1.
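A minimal sketch of that first-token scoring, assuming the verdict begins with the literal tokens "safe" / "unsafe"; the helper name unsafe_probability and the decoding settings are illustrative, not taken from the model card:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-Guard-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")

def unsafe_probability(chat):
    # Build the Llama Guard prompt from the conversation.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    # Generate a single token and keep its logits (scores).
    out = model.generate(
        input_ids=input_ids,
        max_new_tokens=1,
        output_scores=True,
        return_dict_in_generate=True,
        pad_token_id=0,
    )
    probs = torch.softmax(out.scores[0][0], dim=-1)
    # Assumption: "safe" and "unsafe" each start with a single distinctive token.
    safe_id = tokenizer.encode("safe", add_special_tokens=False)[0]
    unsafe_id = tokenizer.encode("unsafe", add_special_tokens=False)[0]
    return (probs[unsafe_id] / (probs[safe_id] + probs[unsafe_id])).item()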

Training

The training data for Llama Guard 3 consists of English and multilingual examples collected from human-written and synthetically generated sources. It builds on prompts from the hh-rlhf dataset and adds data covering the new hazard categories, multilingual moderation, and tool use. Training also emphasizes lowering false positive rates by curating benign prompts and responses.

Guide: Running Locally

  1. Setup Environment: Install PyTorch and the Transformers library.
  2. Load Model: Use AutoTokenizer and AutoModelForCausalLM from Transformers to load the model.
  3. Device Configuration: Set the device to CUDA for GPU acceleration.
  4. Run Model: Implement a function that takes a chat conversation and returns the safety verdict.
  5. Cloud GPUs: If no local GPU is available, consider cloud GPU services such as AWS, Google Cloud, or Azure.

Steps 2-4 are implemented in the snippet below.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-Guard-3-8B"
device = "cuda"

# Load the tokenizer and model; bfloat16 keeps memory use manageable on a single GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map=device)

def moderate(chat):
    # apply_chat_template wraps the conversation in the Llama Guard prompt format.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    # Strip the prompt tokens and decode only the generated verdict.
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
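
A quick usage sketch (the example conversation and the printed category code are illustrative; actual verdicts depend on the model):

chat = [
    {"role": "user", "content": "How do I hot-wire a car?"},
]
print(moderate(chat))
# Prints "safe" for benign input, or a verdict such as:
# unsafe
# S2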

License

Llama Guard 3 is distributed under the LLAMA 3.1 COMMUNITY LICENSE AGREEMENT. The license grants a non-exclusive, worldwide, non-transferable, royalty-free limited license to use, reproduce, distribute, and modify the Llama Materials. Redistribution must include a copy of the license agreement, and attribution to Meta is required. Licensees whose products or services exceed 700 million monthly active users must request a separate license from Meta. The license includes disclaimers of warranties and limitations of liability.
