Llama 2 7B Chat GGML

TheBloke

LLAMA 2 7B CHAT - GGML

Introduction

Llama 2 7B Chat is a fine-tuned generative text model by Meta, optimized for dialogue use cases. It is part of the Llama 2 family, which includes models ranging from 7 billion to 70 billion parameters. The model is designed for commercial and research use, particularly for assistant-like chat applications.

Architecture

Llama 2 is an auto-regressive language model built on an optimized transformer architecture. The chat-tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The model takes text as input and generates text as output, and is available in three parameter sizes (7B, 13B, 70B).
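
To make "auto-regressive" concrete, here is a minimal sketch of a greedy decoding loop. The scoring table is a hypothetical, hand-written stand-in for the model (it is not Llama 2 and only looks at the last token, whereas a transformer conditions on the whole context); the names BIGRAM_SCORES, next_token, and generate are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical next-token scores. A real model computes these from the
# full context with a transformer; this toy table only uses the last token.
BIGRAM_SCORES: Dict[str, Dict[str, float]] = {
    "<s>": {"llamas": 0.9, "are": 0.1},
    "llamas": {"are": 0.8, "</s>": 0.2},
    "are": {"friendly": 0.7, "llamas": 0.3},
    "friendly": {"</s>": 1.0},
}


def next_token(context: List[str]) -> str:
    """Greedy choice: return the highest-scoring continuation of the context."""
    scores = BIGRAM_SCORES[context[-1]]
    return max(scores, key=scores.get)


def generate(max_new_tokens: int = 10) -> List[str]:
    """Generate one token at a time, feeding each output back in as input."""
    tokens = ["<s>"]
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        tokens.append(token)  # the new token becomes part of the next context
        if token == "</s>":   # stop at the end-of-sequence marker
            break
    return tokens


print(generate())  # ['<s>', 'llamas', 'are', 'friendly', '</s>']
```

The loop is the auto-regressive part: each generated token is appended to the context before the next one is predicted. Real runtimes usually sample from the scores (for example with a temperature setting) rather than always taking the maximum.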

Training

Llama 2 was pretrained on 2 trillion tokens from publicly available sources; fine-tuning used over one million new human-annotated examples. Pretraining ran on Meta's infrastructure and consumed a cumulative 3.3 million GPU hours on A100-80GB hardware across the model family, with all emissions offset by Meta’s sustainability program.

Guide: Running Locally

  1. Requirements: A system with llama.cpp, or another runtime that can load these files such as text-generation-webui, built or installed.
  2. Model Files: Download the quantized model files from the appropriate repository. GGUF is the recommended format; GGML is its predecessor, and recent llama.cpp builds load GGUF only, so GGML files require an older build.
  3. Setup: Load and run the model with the llama.cpp command-line interface or text-generation-webui. Set the number of CPU threads with -t and the number of layers offloaded to the GPU with -ngl to match your hardware.
  4. Command Example:
    ./main -t 10 -ngl 32 -m llama-2-7b-chat.ggmlv3.q4_K_M.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful assistant...\n<</SYS>>\nWrite a story about llamas[/INST]"
    
  5. Cloud GPUs: If local hardware is too slow, consider renting cloud GPUs from providers such as AWS, Google Cloud, or Azure.
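
The command in step 4 can also be assembled programmatically, which avoids shell-quoting problems with the multi-line prompt. The following is a sketch only: the helper names build_llama2_prompt and build_main_command are hypothetical, the system prompt text is a placeholder, and the prompt whitespace simply mirrors the command above (Meta's reference formatting may differ slightly). Only flags that appear in the command above are used.

```python
import shlex
from typing import List


def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system and user message in the [INST] / <<SYS>> layout
    used by the command in step 4 (hypothetical helper)."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{user}[/INST]"


def build_main_command(
    model_path: str,
    prompt: str,
    threads: int = 10,       # -t: CPU threads
    gpu_layers: int = 32,    # -ngl: layers offloaded to the GPU
    context: int = 2048,     # -c: context size
    temp: float = 0.7,
    repeat_penalty: float = 1.1,
) -> List[str]:
    """Return an argv list for the llama.cpp main binary (hypothetical helper)."""
    return [
        "./main",
        "-t", str(threads),
        "-ngl", str(gpu_layers),
        "-m", model_path,
        "--color",
        "-c", str(context),
        "--temp", str(temp),
        "--repeat_penalty", str(repeat_penalty),
        "-n", "-1",
        "-p", prompt,
    ]


prompt = build_llama2_prompt(
    "You are a helpful assistant.",  # placeholder system prompt
    "Write a story about llamas",
)
command = build_main_command("llama-2-7b-chat.ggmlv3.q4_K_M.bin", prompt)

# Print a shell-safe rendering for inspection. To actually run it, pass the
# list to subprocess.run(command, check=True) from the llama.cpp directory.
print(shlex.join(command))
```

Passing the arguments as a list means the prompt reaches the binary with real newline characters and no extra quoting. On a machine without a GPU build, setting gpu_layers to 0 keeps all layers on the CPU.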

License

Use of Llama 2 models is governed by a custom commercial license from Meta. To download the model weights and tokenizer, users must first accept the license agreement on Meta's website.
