Meta Llama 3 8B Instruct
Introduction
Meta Llama 3 is a series of large language models (LLMs) developed by Meta, designed for text generation and optimized for dialogue use cases. The models come in two sizes, 8B and 70B parameters, and are available in pre-trained and instruction-tuned variants. These models are optimized for helpfulness and safety, outperforming many existing open-source chat models.
Architecture
Llama 3 is built as an auto-regressive language model utilizing an optimized transformer architecture. Supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) are used to align the models with human preferences, enhancing them for dialogue-based applications. The models employ Grouped-Query Attention (GQA) to improve inference scalability.
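To make the GQA mechanism concrete, here is a minimal, self-contained sketch of grouped-query attention in PyTorch. It is an illustration rather than Meta's implementation, and the head counts and dimensions are made up for readability:

```python
import torch

# Grouped-Query Attention (GQA): several query heads share one key/value head,
# so the KV cache holds n_kv_heads projections instead of n_q_heads.
# All sizes below are illustrative, not Llama 3's actual configuration.
batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # 8 query heads grouped onto 2 KV heads
group_size = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head so every query head in its group attends to the same keys/values
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1)
out = attn @ v                          # (batch, n_q_heads, seq_len, head_dim)
```

Because only the smaller set of key/value heads has to be cached during generation, GQA reduces KV-cache memory and bandwidth compared with standard multi-head attention, which is where the inference-scalability benefit comes from.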
Training
The Llama 3 models were pretrained on a diverse set of publicly available data, comprising over 15 trillion tokens. Fine-tuning involved more than 10 million human-annotated examples, without involving any Meta user data. The 8B and 70B models have a knowledge cutoff of March and December 2023, respectively. Pretraining involved 7.7 million GPU hours, with efforts made to offset the carbon footprint involved.
Guide: Running Locally
To run the Meta-Llama-3-8B-Instruct model locally, you can use the Transformers library. Below are the basic steps to get started; a pipeline-based alternative is sketched after the list.
- Install Dependencies: Ensure you have the `transformers`, `torch`, and `accelerate` libraries installed (`accelerate` is required for `device_map="auto"`).

  ```bash
  pip install transformers torch accelerate
  ```
- Load the Model and Tokenizer:

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )
  ```
- Generate Text:

  ```python
  messages = [
      {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
      {"role": "user", "content": "Who are you?"},
  ]

  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  outputs = model.generate(
      input_ids,
      max_new_tokens=256,
      do_sample=True,
      temperature=0.6,
      top_p=0.9,
  )

  # Decode only the newly generated tokens, skipping the prompt
  response = outputs[0][input_ids.shape[-1]:]
  print(tokenizer.decode(response, skip_special_tokens=True))
  ```
- Cloud GPUs: For enhanced performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure.
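As an alternative to the step-by-step code above, the same model can be driven through the Transformers text-generation pipeline. The sketch below assumes a recent transformers release whose pipeline accepts chat-style message lists directly, plus the same model access and hardware as before:

```python
import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
# With chat input, "generated_text" holds the conversation with the assistant's
# reply appended as the final message.
print(outputs[0]["generated_text"][-1]["content"])
```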
License
The Meta Llama 3 models are released under the Meta Llama 3 Community License Agreement. This license grants a non-exclusive, worldwide, non-transferable license to use, reproduce, distribute, and modify the Llama Materials. Redistribution of the models requires including the license agreement and an attribution notice. Additional commercial terms apply for users exceeding specific thresholds of monthly active users. The license also includes disclaimers of warranty and limitations of liability. For full terms, visit the license documentation.