Llama 2 7B Chat GGML
TheBloke / Llama 2 7B Chat - GGML
Introduction
Llama 2 7B Chat is a fine-tuned generative text model by Meta, optimized for dialogue use cases. It is part of the Llama 2 family, which includes models ranging from 7 billion to 70 billion parameters. The model is designed for commercial and research use, particularly for assistant-like chat applications.
Architecture
Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The chat variants are tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model takes text as input and generates text as output, and is available in several parameter sizes (7B, 13B, 70B).
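As a rough illustration of what "auto-regressive" means here, the sketch below generates one token at a time and feeds each new token back in as context for the next step. It is a toy, not Llama 2's implementation: the bigram table, the toy_next_token function, and the <eos> stop marker are all hypothetical stand-ins for the transformer's next-token prediction.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": maps the last token to the next one.
# A real model would score every vocabulary entry given the full context.
BIGRAMS: Dict[str, str] = {
    "llamas": "are",
    "are": "friendly",
    "friendly": "<eos>",
}


def toy_next_token(tokens: List[str]) -> str:
    """Return the next token given the tokens produced so far."""
    return BIGRAMS.get(tokens[-1], "<eos>")


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
    stop: str = "<eos>",
) -> List[str]:
    """Auto-regressive loop: each output token is appended to the context."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == stop:
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(generate(toy_next_token, ["llamas"], max_new_tokens=10))
```

The loop stops either when the stand-in model emits the stop marker or when the token budget runs out, which mirrors how a generation limit bounds the output length in practice.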
Training
Llama 2 was pretrained on 2 trillion tokens from publicly available sources, with fine-tuning using over one million new human-annotated examples. The pretraining utilized Meta's infrastructure, comprising 3.3 million GPU hours on A100-80GB hardware, with all emissions offset by Meta’s sustainability program.
Guide: Running Locally
- Requirements: Ensure you have access to a compatible system with the necessary libraries, such as llama.cpp, for running the model.
- Model Files: Obtain the quantized GGML or recommended GGUF format model files from the appropriate repository.
- Setup: Use the command-line interface of llama.cpp or text-generation-webui to load and run the model. Adjust parameters like CPU threads with -t and GPU layers with -ngl based on your hardware setup.
- Command Example:
./main -t 10 -ngl 32 -m llama-2-7b-chat.ggmlv3.q4_K_M.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful assistant...\n<</SYS>>\nWrite a story about llamas[/INST]"
- Cloud GPUs: For enhanced performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
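The command example above can also be assembled from a script, which sidesteps shell quoting of the prompt. The following is a minimal sketch, assuming the llama.cpp binary is at ./main and the model file sits in the working directory; the helper names build_prompt and build_command are illustrative, not part of llama.cpp. The flags and default values are taken directly from the command example.

```python
import shlex
from typing import List


def build_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user message in the Llama 2 chat template."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{user}[/INST]"


def build_command(
    model_path: str,
    prompt: str,
    threads: int = 10,      # -t: CPU threads, tune to your machine
    gpu_layers: int = 32,   # -ngl: layers offloaded to the GPU
    binary: str = "./main",  # assumed location of the llama.cpp binary
) -> List[str]:
    """Build the argument list matching the command example in this guide."""
    return [
        binary,
        "-t", str(threads),
        "-ngl", str(gpu_layers),
        "-m", model_path,
        "--color",
        "-c", "2048",
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", prompt,
    ]


if __name__ == "__main__":
    prompt = build_prompt(
        "You are a helpful assistant.",
        "Write a story about llamas",
    )
    cmd = build_command("llama-2-7b-chat.ggmlv3.q4_K_M.bin", prompt)
    # Print a shell-quoted form for inspection; to actually run it, pass
    # the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list means the newlines in the prompt reach the program unchanged, with no dependence on how the shell or the binary treats backslash escapes.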
License
Use of the Llama 2 models is governed by a custom commercial license from Meta. To download the model weights and tokenizer, users must accept the license agreement available on Meta's website.