Llama 3.1 70B Instruct

meta-llama

Introduction

Llama 3.1 is a collection of multilingual large language models (LLMs) developed by Meta. The models are optimized for multilingual dialogue use cases and outperform many available open-source and closed chat models on common industry benchmarks. They are intended for commercial and research use in multiple languages and are available in 8B, 70B, and 405B parameter sizes.

Architecture

Llama 3.1 is an auto-regressive language model built on an optimized transformer architecture. The instruction-tuned variants use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The models support eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
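
Because the Instruct variants are tuned for dialogue, prompts should be wrapped in the model's chat template rather than passed as raw text. The following is a minimal sketch of that formatting step, assuming access to the model's tokenizer on Hugging Face; the example messages (including the German user turn, which illustrates multilingual use) are purely illustrative.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")

    # An illustrative multilingual exchange (German user turn).
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Erkläre kurz, was ein Transformer ist."},
    ]

    # apply_chat_template wraps each turn in the Llama 3.1 special tokens
    # and appends the header that cues the assistant's reply.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)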

Training

The Llama 3.1 models were pretrained on approximately 15 trillion tokens from publicly available sources, with fine-tuning using publicly available instruction datasets and over 25 million synthetically generated examples. Training employed Meta's custom-built GPU cluster using H100-80GB hardware, resulting in a cumulative 39.3 million GPU hours. The models were trained with net-zero greenhouse gas emissions due to Meta's renewable energy practices.

Guide: Running Locally

To use the Llama 3.1 model locally, follow these basic steps:

  1. Install Dependencies: Ensure your environment has transformers version 4.43.0 or higher and torch installed.

    pip install --upgrade "transformers>=4.43.0" torch
    
  2. Load the Model: Use the transformers library to load the model.

    import torch
    import transformers

    model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"

    # Build a text-generation pipeline; bfloat16 halves memory relative to
    # float32, and device_map="auto" spreads the weights across available GPUs.
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    
  3. Run Inference: Use the pipeline for text generation tasks.

    # Pass the conversation as a list of chat messages; the pipeline applies
    # the model's chat template automatically.
    outputs = pipeline(
        [{"role": "user", "content": "What is the weather like today?"}],
        max_new_tokens=256,
    )
    # The last entry in generated_text is the assistant's reply.
    print(outputs[0]["generated_text"][-1])
    
  4. Consider Cloud GPUs: In bfloat16 the 70B weights alone require roughly 140 GB of accelerator memory, so for full-precision inference consider cloud GPUs from providers like AWS, Google Cloud, or Azure; if you only have smaller hardware, see the quantized-loading sketch after this list.
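
If a single large GPU is not available, 4-bit quantization can bring the 70B weights down to roughly 35 GB. The sketch below uses the bitsandbytes integration in transformers; it is a minimal example, assuming bitsandbytes is installed (pip install bitsandbytes), not the only way to run the model locally.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"

    # Quantize weights to 4-bit NF4 at load time; compute still runs in bfloat16.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )

    messages = [{"role": "user", "content": "What is the weather like today?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))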

License

The Llama 3.1 models are released under a custom commercial license, the Llama 3.1 Community License, which permits use, reproduction, distribution, and modification under specific terms. Users must comply with the Acceptable Use Policy and acknowledge Meta's intellectual property rights. The full license text is distributed with the model on Hugging Face.
