Llama 3.3 70B Instruct
Introduction
The Meta Llama 3.3 model is a multilingual large language model (LLM) designed for text generation. It has been instruction-tuned and optimized for multilingual dialogue, outperforming many available models on industry benchmarks. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Architecture
Llama 3.3 utilizes an auto-regressive language model with an optimized transformer architecture. The model incorporates supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Training
The model was pretrained on approximately 15 trillion tokens from publicly available sources and fine-tuned using over 25 million synthetically generated examples. Training involved 39.3 million GPU hours on H100-80GB hardware, with estimated total greenhouse gas emissions of 11,390 tons CO2eq.
Model Stats
- Model Size: 70 billion parameters
- Token Count: Over 15 trillion tokens
- Context Length: 128K tokens
- Supported Languages: 8 (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai)
- Training Data Cutoff: December 2023
- Release Date: December 6, 2024
Guide: Running Locally
To run Llama 3.3 locally, follow these steps:
- Install Transformers: Make sure you have a recent version of the Transformers library (`>=4.45.0`) by running `pip install --upgrade transformers`.
- Set Up Model: Use the Transformers `pipeline` API or the Auto classes with the `generate()` function.
- Example Code:

```python
import torch
import transformers

model_id = "meta-llama/Llama-3.3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1])
```
- Cloud GPUs: For optimal performance, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
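The steps above also mention the Auto classes as an alternative to the pipeline. A minimal sketch of that path might look like the following; it assumes you have accepted the license for the gated `meta-llama/Llama-3.3-70B-Instruct` repository and have enough GPU memory to load the 70B weights in bfloat16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.3-70B-Instruct"

# Requires access to the gated meta-llama repo (HF access token).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Format the conversation with the model's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The `device_map="auto"` setting lets Accelerate shard the model across available GPUs, which is typically necessary at this parameter count.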
License
Llama 3.3 is available under the Llama 3.3 Community License Agreement. It grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, and modify the Llama Materials. Redistribution and use must adhere to specific guidelines, including displaying “Built with Llama” and following the Acceptable Use Policy. For more details, refer to the license documentation.