Meta Llama 3 8B Instruct
Introduction
Meta Llama 3 is a series of large language models (LLMs) developed by Meta, designed for text generation and optimized for dialogue use cases. The models come in two sizes, 8B and 70B parameters, and are available in pre-trained and instruction-tuned variants. These models are optimized for helpfulness and safety, outperforming many existing open-source chat models.
Architecture
Llama 3 is built as an auto-regressive language model utilizing an optimized transformer architecture. Supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) are used to align the models with human preferences, enhancing them for dialogue-based applications. The models employ Grouped-Query Attention (GQA) to improve inference scalability.
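To make the GQA mechanism concrete, here is a minimal, self-contained sketch of grouped-query attention in PyTorch. It is an illustration rather than Meta's implementation, and the head counts and dimensions are made up for readability:

```python
import torch

# Grouped-Query Attention (GQA): several query heads share one key/value head,
# so the KV cache holds n_kv_heads projections instead of n_q_heads.
# All sizes below are illustrative, not Llama 3's actual configuration.
batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # 8 query heads grouped onto 2 KV heads
group_size = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head so every query head in its group attends to the same keys/values
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1)
out = attn @ v                          # (batch, n_q_heads, seq_len, head_dim)
```

Because only the smaller set of key/value heads has to be cached during generation, GQA reduces KV-cache memory and bandwidth compared with standard multi-head attention, which is where the inference-scalability benefit comes from.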
Training
The Llama 3 models were pretrained on a diverse set of publicly available data, comprising over 15 trillion tokens. Fine-tuning involved more than 10 million human-annotated examples, without involving any Meta user data. The 8B and 70B models have a knowledge cutoff of March and December 2023, respectively. Pretraining involved 7.7 million GPU hours, with efforts made to offset the carbon footprint involved.
Guide: Running Locally
To run the Meta-Llama-3-8B-Instruct model locally, you can use the Transformers library. Below are the basic steps to get started; a pipeline-based alternative is sketched after the list.
- Install Dependencies: Ensure you have the `transformers`, `torch`, and `accelerate` libraries installed (`accelerate` is required for `device_map="auto"`).

  ```bash
  pip install transformers torch accelerate
  ```
- Load the Model and Tokenizer:

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )
  ```
- Generate Text:

  ```python
  messages = [
      {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
      {"role": "user", "content": "Who are you?"},
  ]

  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  outputs = model.generate(
      input_ids,
      max_new_tokens=256,
      do_sample=True,
      temperature=0.6,
      top_p=0.9,
  )

  # Decode only the newly generated tokens, skipping the prompt
  response = outputs[0][input_ids.shape[-1]:]
  print(tokenizer.decode(response, skip_special_tokens=True))
  ```
- Cloud GPUs: For enhanced performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure.
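As an alternative to the step-by-step code above, the same model can be driven through the Transformers text-generation pipeline. The sketch below assumes a recent transformers release whose pipeline accepts chat-style message lists directly, plus the same model access and hardware as before:

```python
import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
# With chat input, "generated_text" holds the conversation with the assistant's
# reply appended as the final message.
print(outputs[0]["generated_text"][-1]["content"])
```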
License
The Meta Llama 3 models are released under the Meta Llama 3 Community License Agreement. This license grants a non-exclusive, worldwide, non-transferable license to use, reproduce, distribute, and modify the Llama Materials. Redistribution of the models requires including the license agreement and an attribution notice. Additional commercial terms apply for users exceeding specific thresholds of monthly active users. The license also includes disclaimers of warranty and limitations of liability. For full terms, visit the license documentation.