Llama 2 13b hf (meta-llama)
Introduction
Llama 2 is a suite of large language models (LLMs) developed by Meta, available in sizes ranging from 7 billion to 70 billion parameters. These models are designed for generative text tasks and include both pretrained and fine-tuned versions optimized for dialogue.
Architecture
Llama 2 models utilize an auto-regressive transformer architecture. The fine-tuned versions, known as Llama-2-Chat, employ supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to enhance their alignment with human preferences for helpfulness and safety.
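As an illustration of how the chat variants are used in practice, the sketch below (Python, with placeholder strings) builds the single-turn dialogue prompt format described in Meta's reference code; the plain Llama-2-13b-hf base model, by contrast, simply continues raw text and needs no special wrapping.

    # Sketch of the single-turn prompt template used by the Llama-2-Chat
    # variants. The system prompt and user message below are illustrative
    # placeholders; the BOS token is added by the tokenizer.
    system_prompt = "You are a helpful, respectful and honest assistant."
    user_message = "Summarize what RLHF does in one sentence."

    chat_prompt = (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )
    print(chat_prompt)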
Training
The Llama 2 models were pretrained on 2 trillion tokens of publicly available data and fine-tuned using additional datasets, including over one million human-annotated examples. Training was conducted using Meta's Research Super Cluster and production clusters, with extensive use of cloud-based computation resources.
Guide: Running Locally
- Environment Setup: Ensure you have Python and PyTorch installed. Clone the Llama 2 repository from Hugging Face.
- Model Download: Access the model weights and tokenizer by accepting the Meta license on the model page.
- Installation: Install the necessary dependencies using pip install -r requirements.txt.
- Execution: Load the model using Hugging Face Transformers and run inference on your input text, as sketched below.
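The following minimal sketch loads the model with the Transformers library and generates a short completion. It assumes the transformers, accelerate, and torch packages are installed and that access to the gated meta-llama/Llama-2-13b-hf weights has already been granted; the prompt and generation settings are illustrative only.

    # Minimal inference sketch with Hugging Face Transformers.
    # Assumes transformers, accelerate, and torch are installed and that the
    # gated meta-llama/Llama-2-13b-hf weights are accessible (license accepted).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-13b-hf"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to fit the 13B weights in GPU memory
        device_map="auto",          # requires accelerate; spreads layers across devices
    )

    prompt = "Large language models are"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Auto-regressive generation: the model predicts one token at a time.
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

In float16 the 13B checkpoint needs roughly 26 GB of accelerator memory, so smaller GPUs typically require offloading or quantization (for example 8-bit or 4-bit loading via bitsandbytes).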
Cloud GPUs: For optimal performance, it is recommended to use cloud-based GPU services such as AWS EC2, Google Cloud, or Azure with NVIDIA A100 GPUs.
License
Llama 2 is released under a custom license that requires users to accept terms before accessing or distributing the models. The license grants a non-exclusive, worldwide, royalty-free limited right to use, reproduce, and distribute the models, with restrictions on commercial use for entities with over 700 million monthly active users. For more details, refer to the Llama 2 Community License Agreement.