Llama 2 70B Chat HF
meta-llama/Llama-2-70b-chat-hf
Introduction
Llama 2 is a collection of pretrained and fine-tuned generative text models developed by Meta, ranging from 7 billion to 70 billion parameters. The Llama-2-70B-Chat model is optimized for dialogue use cases and is available in the Hugging Face Transformers format. It is designed for commercial and research use in English, offering assistant-like chat capabilities.
Architecture
Llama 2 is an auto-regressive language model built on an optimized transformer architecture. The chat variants are aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The family is released in 7B, 13B, and 70B parameter sizes, each available as a pretrained base model and a dialogue-tuned chat model.
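The chat variants were fine-tuned on a specific prompt template: each user turn is wrapped in [INST] ... [/INST] markers, with an optional system prompt enclosed in <<SYS>> tags inside the first turn. A minimal sketch of building a single-turn prompt by hand (the system and user strings are illustrative placeholders):

```python
# Sketch of the Llama 2 chat prompt template for a single user turn.
# The tokenizer prepends the <s> BOS token itself, so it is omitted here.
system_prompt = "You are a helpful assistant."       # placeholder
user_message = "Hello, how can I assist you today?"  # placeholder

prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
```

Recent versions of Transformers can also construct this format automatically via tokenizer.apply_chat_template.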
Training
The Llama 2 models were pretrained on 2 trillion tokens from publicly available sources, with fine-tuning drawing on over one million new human-annotated examples. Training used Meta's Research Super Cluster and third-party cloud compute, consuming a cumulative 3.3 million GPU hours on A100-80GB hardware; 100% of the estimated carbon emissions were offset by Meta's sustainability program.
Guide: Running Locally
- Access and License: Visit Meta's website and accept the license agreement; this is required before you can download the model weights and tokenizer.
- Environment Setup: Ensure you have Python installed, then use `pip` to install `transformers` and `torch`:

  ```bash
  pip install transformers torch
  ```
- Download the Model: Use the Hugging Face Transformers library to download and load the model.

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Downloading the 70B weights requires having accepted the license
  # and being authenticated with a Hugging Face account.
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
  model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
  ```
- Inference: Tokenize a prompt and generate a response.

  ```python
  # Encode the prompt, generate, and decode the output.
  # max_new_tokens bounds the response length; the default is very short.
  inputs = tokenizer("Hello, how can I assist you today?", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
- Cloud GPU Suggestion: A model as large as Llama-2-70B will not fit on a typical consumer GPU, so consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure; a sketch of memory-efficient loading follows this list.
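In fp16 the 70B weights alone occupy roughly 140 GB, more than any single consumer GPU holds, so loading the model typically requires sharding it across devices or quantizing it. A minimal sketch, assuming the accelerate and bitsandbytes packages are installed alongside transformers, that loads the model in 4-bit precision and lets Transformers place the shards automatically:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-chat-hf"

# 4-bit quantization cuts the memory footprint to roughly a quarter of fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across available GPUs (and CPU if needed)
)
```

Dropping quantization_config while keeping device_map="auto" shards the full-precision weights instead, at a much higher memory cost.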
License
The use of Llama 2 is governed by the LLAMA 2 Community License Agreement, which grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license. Redistribution must include the license agreement, and usage must comply with the Acceptable Use Policy and applicable laws. The license prohibits using Llama 2 outputs to improve other large language models.