Meta-Llama-3-8B-Instruct

meta-llama

Introduction

Meta-Llama-3-8B-Instruct is part of the Llama 3 family of large language models developed by Meta. Designed for text generation and conversational applications, it is optimized for dialogue use cases and outperforms various open-source chat models on industry benchmarks. The Llama 3 family comes in two sizes (8B and 70B parameters) and employs techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) for alignment with human preferences.

Architecture

Llama 3 is an auto-regressive language model utilizing an optimized transformer architecture. It focuses on generating text and code from input text. The models are available in pre-trained and instruction-tuned variants to accommodate diverse applications, with the instruction-tuned models being particularly optimized for conversational tasks.
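To build intuition for what "auto-regressive" means in practice, here is a minimal sketch of a generation loop. The `toy_next_token` scorer is a hypothetical stand-in for a real transformer forward pass; only the loop structure reflects how such models generate text.

```python
# Sketch of auto-regressive generation: the model repeatedly predicts the
# next token conditioned on everything generated so far.
# toy_next_token is a hypothetical stub, not a real model.

def toy_next_token(tokens):
    # Deterministic stub: "predict" the next integer, stopping at 5.
    return tokens[-1] + 1 if tokens[-1] < 5 else None  # None = end-of-sequence

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)  # condition on the full prefix
        if nxt is None:               # stop at end-of-sequence
            break
        tokens.append(nxt)            # feed the growing sequence back in
    return tokens

print(generate([1, 2]))
```

Each iteration appends one token and feeds the extended sequence back to the model, which is why generation cost grows with output length.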

Training

The Llama 3 models were pretrained using over 15 trillion tokens from publicly available data sources. The training infrastructure included Meta's Research SuperCluster and third-party cloud compute resources. The 8B and 70B models used Grouped-Query Attention (GQA) for enhanced inference scalability. The models underwent extensive fine-tuning with over 10 million human-annotated examples.
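Grouped-Query Attention lets several query heads share a single key/value head, shrinking the KV cache at inference time. The shapes involved can be sketched in NumPy; the head counts and dimensions below are chosen for illustration and are not Llama 3's actual configuration.

```python
import numpy as np

# Illustrative GQA sketch: 8 query heads share 2 key/value heads,
# so 4 query heads attend against each KV head.
n_q_heads, n_kv_heads, seq, d = 8, 2, 4, 16
group = n_q_heads // n_kv_heads

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, d))
k = rng.standard_normal((n_kv_heads, seq, d))
v = rng.standard_normal((n_kv_heads, seq, d))

# Repeat each KV head so every query head has a matching K/V.
k_exp = np.repeat(k, group, axis=0)  # (8, seq, d)
v_exp = np.repeat(v, group, axis=0)  # (8, seq, d)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax
out = weights @ v_exp
print(out.shape)  # one output per query head
```

Only the K/V tensors are stored per generated token, so the cache is `n_kv_heads / n_q_heads` the size of standard multi-head attention.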

Guide: Running Locally

To run Meta-Llama-3-8B-Instruct locally, you can use the following methods:

Using Transformers

  1. Install Dependencies: Ensure you have transformers and torch installed.
  2. Load Model: Use the Transformers library to load the model.
  3. Generate Text: Utilize the pipeline or AutoModelForCausalLM classes to interact with the model.
import torch
import transformers

pipe = transformers.pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct",
                             model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
output = pipe("Hello, how are you?", max_new_tokens=50)
print(output[0]["generated_text"])
  4. Inference Parameters: Adjust generation parameters like max_new_tokens, temperature, and top_p to control output length and randomness.
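To see what temperature and top_p actually do, here is a small pure-Python sketch of temperature scaling and nucleus (top-p) filtering over toy next-token logits. The logit values are invented for illustration; real samplers in transformers apply the same ideas to the model's vocabulary-sized distribution.

```python
import math

def softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]             # toy next-token logits
probs = softmax(logits, temperature=0.7)   # sharper than temperature=1.0
print(top_p_filter(probs, top_p=0.9))      # token indices that survive filtering
```

Sampling then happens only over the surviving tokens (renormalized), so low top_p plus low temperature yields conservative, repetitive text, while higher values increase diversity.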

Using the Llama 3 Codebase

  1. Download Model: Utilize huggingface-cli to download the original model checkpoints.
  2. Run Inference: Follow the instructions in the Llama 3 GitHub repository for detailed steps.

Cloud GPUs

For optimal performance, consider using cloud GPU services like AWS, Google Cloud, or Azure.

License

The Meta-Llama 3 models are available under a custom community license. The license grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license for usage, reproduction, and modification of the models. Certain conditions apply, such as providing attribution and compliance with the Acceptable Use Policy. For organizations with over 700 million monthly active users, additional commercial terms may apply. The full license details can be found on the official Meta Llama 3 License Page.
