Llama-2-70B-Chat-HF

meta-llama

Introduction

Llama 2 is a collection of pretrained and fine-tuned generative text models developed by Meta, ranging from 7 billion to 70 billion parameters. The Llama-2-70B-Chat model is optimized for dialogue use cases and is available in the Hugging Face Transformers format. It is designed for commercial and research use in English, offering assistant-like chat capabilities.

Architecture

Llama 2 is an auto-regressive language model built on an optimized transformer architecture. The chat variants are tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The family comes in 7B, 13B, and 70B parameter sizes, each available as a pretrained base model and a dialogue-tuned chat model.

Training

The Llama 2 models were pretrained on 2 trillion tokens from publicly available sources, with fine-tuning involving over one million new human-annotated examples. The training utilized Meta's Research Super Cluster and third-party cloud compute, with a cumulative 3.3 million GPU hours on A100-80GB hardware. The estimated carbon emissions were offset by Meta's sustainability program.

Guide: Running Locally

  1. Access and License: Visit Meta's website and the Hugging Face model page to accept the license agreement. This is required before you can download the model weights and tokenizer.
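
     Because the repository is gated on Hugging Face, you also need to authenticate with an account that has been granted access. A minimal sketch using the huggingface_hub CLI (assuming the huggingface_hub package is installed; your token workflow may differ):

    huggingface-cli login  # paste a Hugging Face access token when prompted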

  2. Environment Setup: Ensure you have Python and the necessary libraries installed. You can use pip to install transformers, torch, and accelerate (accelerate enables the multi-GPU device placement used below).

    pip install transformers torch accelerate
    
  3. Download the Model: Use the Hugging Face Transformers library to download and load the model.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-70b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # fp16 weights are roughly 140 GB; device_map="auto" shards them across available GPUs
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
    
  4. Inference: Tokenize a prompt and generate a response. Note that the chat variants were trained on a specific prompt format; see the sketch after this list.

    inputs = tokenizer("Hello, how can I assist you today?", return_tensors="pt").to(model.device)
    # without max_new_tokens, generate() falls back to a short default length
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  5. Cloud GPU Suggestion: For optimal performance, especially with Llama-2-70B (roughly 140 GB of GPU memory in fp16, e.g., two A100 80GB cards), consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
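
Prompt format: as noted in step 4, Llama-2-Chat was trained on a specific dialogue format ([INST] ... [/INST] turns with an optional <<SYS>> system block). The sketch below builds that format with the tokenizer's built-in chat template; it assumes a transformers version with chat-template support and reuses the tokenizer and model objects loaded in step 3.

    # build the [INST]-style prompt from a list of role/content messages
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What can you help me with?"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))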

License

The use of Llama 2 is governed by the LLAMA 2 Community License Agreement, which grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license. Redistribution must include the license agreement, and usage must comply with the Acceptable Use Policy and applicable laws. The license prohibits using Llama 2 outputs to improve other large language models.
