Llama 3.1 405B

meta-llama

Llama 3.1 Model Documentation

Introduction

Llama 3.1 is Meta's collection of multilingual large language models, optimized for dialogue in eight languages. It includes models with 8B, 70B, and 405B parameters, designed for both commercial and research applications. The models are built on an auto-regressive transformer architecture and tuned with supervised fine-tuning and reinforcement learning from human feedback (RLHF).

Architecture

Llama 3.1 models use an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability. They support multilingual text input and output and feature an extended 128K-token context window.
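As a toy illustration of the GQA idea (a NumPy sketch, not Meta's implementation): each group of query heads shares a single key/value head, which shrinks the KV cache relative to full multi-head attention.

```python
import numpy as np

def gqa(q, k, v, n_q_heads, n_kv_heads):
    """Toy grouped-query attention.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each contiguous group of n_q_heads // n_kv_heads query heads
    attends against the same shared key/value head.
    """
    group = n_q_heads // n_kv_heads
    # Broadcast each KV head to its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Example: 8 query heads sharing 2 KV heads over a length-4 sequence.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = gqa(q, k, v, n_q_heads=8, n_kv_heads=2)  # shape (8, 4, 16)
```

The memory saving comes from storing only `n_kv_heads` K/V tensors in the cache while keeping the full set of query heads.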

Training

The models were pretrained on approximately 15 trillion tokens from publicly available sources, with additional fine-tuning using a mix of human and synthetically generated data. Training involved 39.3 million GPU hours on Meta’s custom infrastructure, with a focus on minimizing greenhouse gas emissions.

Guide: Running Locally

  1. Setup Environment: Ensure you have Python and PyTorch installed.
  2. Install Dependencies: Install the transformers library from Hugging Face (e.g. pip install transformers torch).
  3. Download Model: Obtain the model files from Hugging Face's repository.
  4. Load Model in Code: Use the transformers library to load and generate text with the model.
  5. Cloud GPUs: For large models like the 405B version, consider using cloud GPU services such as AWS, Google Cloud, or Azure for efficient execution.
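The steps above can be sketched with the transformers library. The snippet below assumes you have accepted the Llama 3.1 license on Hugging Face and uses the 8B Instruct repo id as an example; swap in the 405B id only if you have a multi-GPU node with enough memory.

```python
# Minimal sketch of steps 1-4 using Hugging Face transformers.
# Assumptions: license accepted on Hugging Face, and
# `pip install transformers torch accelerate` already run.

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example id; use the 405B repo for the full model

def build_messages(user_prompt: str) -> list:
    """Wrap a prompt in the chat format the instruct models expect."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # Heavy imports and the weight download happen only when run directly.
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # shard weights across available GPUs
    )
    result = generator(
        build_messages("Summarize Llama 3.1 in one sentence."),
        max_new_tokens=64,
    )
    print(result[0]["generated_text"][-1]["content"])
```

For the 405B model, a single machine is rarely enough; the cloud GPU services in step 5 (or a managed inference endpoint) are the practical route.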

License

The Llama 3.1 models are released under the Llama 3.1 Community License, which permits non-exclusive, worldwide, non-transferable, and royalty-free usage. Redistribution requires providing a copy of the license and appropriate attribution. Use of the models must comply with the Acceptable Use Policy and applicable laws. Full license details are available in the provided documentation link.
