Llama Guard 3 8B
Introduction
Llama Guard 3 is a Llama-3.1-8B pretrained model fine-tuned for content safety classification. It classifies content in both the inputs (prompt classification) and responses (response classification) of large language models (LLMs), indicating whether content is safe or unsafe and, when unsafe, which content categories are violated. The model supports multilingual content moderation and is optimized for safety and security around search and code interpreter tool calls.
Architecture
Llama Guard 3 builds upon the Llama-3.1-8B architecture, focusing on content safety with alignment to the MLCommons standardized hazards taxonomy. It supports content moderation in eight languages. The model acts as a generative classifier: it outputs a text verdict (safe, or unsafe plus the violated categories), and the probability assigned to the verdict token can serve as a classification score. It is designed to improve system-level safety performance when deployed with Llama 3.1.
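As a rough illustration of the probability-based scoring, the Python sketch below is an assumption, not official Meta code. It reads the next-token distribution and takes the mass on the "unsafe" token as an approximate unsafe score, assuming tokenizer and model are loaded as in the guide further down and that "unsafe" encodes to a single token. Depending on the chat template, the verdict token may be preceded by whitespace tokens, so production code should locate the verdict position explicitly rather than assume it comes first.

import torch

# Hypothetical helper: approximate P(unsafe) from the next-token distribution.
# Assumes tokenizer/model/device are set up as in the guide below.
def unsafe_score(chat, tokenizer, model, device="cuda"):
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits.float(), dim=-1)
    # Assumes "unsafe" encodes to a single token; verify for your tokenizer.
    unsafe_id = tokenizer.encode("unsafe", add_special_tokens=False)[0]
    return probs[unsafe_id].item()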
Training
The training data for Llama Guard 3 consists of English and multilingual data collected from human and synthetically generated sources. It uses prompts from the hh-rlhf dataset and integrates data for new categories, multilingual capabilities, and tool use. The training emphasizes reducing false positive rates by curating benign prompts and responses.
Guide: Running Locally
- Setup Environment: Install PyTorch and the Transformers library.
- Load Model: Use AutoTokenizer and AutoModelForCausalLM from Transformers to load the model.
- Device Configuration: Set the device to CUDA for GPU acceleration.
- Run Model: Implement a function that passes chat data to the model and returns the generated moderation verdict, as in the snippet below.
- Cloud GPUs: Consider using cloud GPU services like AWS, Google Cloud, or Azure for efficient computation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-Guard-3-8B"
device = "cuda"

# Load the tokenizer and the model in bfloat16 on the GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map=device)

def moderate(chat):
    # Format the conversation with Llama Guard's chat template.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    # Generate the moderation verdict.
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    # Decode only the newly generated tokens, not the prompt.
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
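A quick usage sketch: pass a conversation as a list of role/content messages. The model returns "safe", or "unsafe" followed by the violated category codes (for example, S1); the conversation below is illustrative.

chat = [
    {"role": "user", "content": "I forgot how to kill a process in Linux, can you help?"},
    {"role": "assistant", "content": "Sure! To kill a process in Linux, you can use the kill command followed by the process ID (PID)."},
]
print(moderate(chat))  # expected verdict for this exchange: "safe"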
License
Llama Guard 3 is distributed under the LLAMA 3.1 COMMUNITY LICENSE AGREEMENT. The license grants a non-exclusive, worldwide, non-transferable, royalty-free limited license to use, reproduce, distribute, and modify the Llama Materials. Redistribution must include a copy of the license agreement, and attribution to Meta is required. Entities whose products or services exceed 700 million monthly active users must request a separate license from Meta. The license includes disclaimers of warranties and limitations of liability.