Model: meta-llama/Llama-2-7b-hf

Llama 2 Technical Documentation Summary

Introduction

Llama 2 is a series of generative text models developed by Meta, ranging from 7 billion to 70 billion parameters. These models are optimized for various natural language generation tasks, with the fine-tuned versions specifically designed for chat applications. They are available for both research and commercial use in English.

Architecture

Llama 2 models use an auto-regressive transformer architecture; the chat variants are further optimized through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The 70 billion parameter version also employs Grouped-Query Attention (GQA), in which several query heads share each key/value head, for improved inference scalability.
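To make the GQA idea concrete, here is a minimal, illustrative sketch (not Meta's implementation) in NumPy. The head counts and dimensions are hypothetical; the point is only that each group of query heads attends using a single shared key/value head, shrinking the KV cache during inference.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """Illustrative grouped-query attention for one sequence.

    q:    (n_q_heads, seq_len, d)   one projection per query head
    k, v: (n_kv_heads, seq_len, d)  fewer heads, shared across query groups
    """
    group = n_q_heads // n_kv_heads          # query heads per KV head
    d = q.shape[-1]
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # this head's shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

With n_kv_heads equal to n_q_heads this reduces to standard multi-head attention; Llama 2 70B uses fewer KV heads than query heads so the cached keys and values are correspondingly smaller.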

Training

Llama 2 was pretrained on 2 trillion tokens from publicly available sources, with additional fine-tuning on over a million human-annotated examples. The model training utilized Meta's Research Super Cluster and third-party cloud compute, consuming 3.3 million GPU hours, with carbon emissions fully offset by Meta's sustainability program.

Guide: Running Locally

  1. Setup Environment: Ensure you have Python and PyTorch installed. Use a virtual environment for isolation.
  2. Install Dependencies: Use the Hugging Face Transformers library. Install it via pip install transformers.
  3. Download Model: Access the model from Hugging Face after accepting the license.
  4. Run Inference: Load the model using Transformers and run text generation tasks.
  5. Utilize Cloud GPUs: For optimal performance, consider using cloud-based GPUs like AWS EC2, Google Cloud, or Azure.
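The steps above can be sketched as a short script, assuming you have installed transformers and torch and have been granted access to the gated meta-llama/Llama-2-7b-hf repository on Hugging Face (the prompt text here is arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated model: requires accepting Meta's license on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # place weights on GPU(s) if available
    torch_dtype="auto",    # use the checkpoint's native precision
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On CPU the 7B model is slow and memory-hungry; a GPU with roughly 16 GB of memory (or a quantized variant) is a more practical target, which is why cloud GPUs are suggested above.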

License

Llama 2 is released under a custom commercial license by Meta. Users must agree to the terms, which include restrictions on use, distribution, and modification. The license stipulates compliance with applicable laws and prohibits using the model to improve competing language models. For full terms, visit Meta's licensing page.
