Llama 3.3 70B Instruct AWQ

casperhansen

Introduction

The Llama 3.3 70B Instruct model is a multilingual large language model (LLM) from Meta, designed for text generation with optimized performance in multilingual dialogue. It features an enhanced transformer architecture and is fine-tuned with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to improve helpfulness and safety. This repository provides an AWQ-quantized version of the model.

Architecture

Llama 3.3 employs an auto-regressive language model architecture with a transformer-based design. It uses Grouped-Query Attention (GQA) for improved inference scalability. The model supports 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The Llama 3.3 model is built upon the meta-llama/Llama-3.1-70B base model, with a focus on multilingual text and code generation.
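To illustrate the Grouped-Query Attention mentioned above, here is a minimal NumPy sketch of the core idea: several query heads share a single key/value head, shrinking the KV cache at inference time. This is an illustrative toy, not Meta's implementation; the function name and shapes are chosen for this example.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy GQA: q has shape (n_q_heads, seq, d); k and v have shape
    (n_kv_heads, seq, d). Each group of n_q_heads // n_kv_heads query
    heads attends using the same shared K/V head."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group_size = n_q_heads // n_kv_heads

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group_size  # index of the KV head shared by this group
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # Numerically stable softmax over the key dimension.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

With, say, 8 query heads and 2 KV heads, the KV cache is a quarter of the size required by standard multi-head attention, which is the scalability benefit GQA targets.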

Training

The 70-billion-parameter model was trained on a diverse mix of publicly available online data comprising more than 15 trillion tokens. The training process includes supervised fine-tuning and reinforcement learning with human feedback. The model is static, trained on an offline dataset, with plans for future improvements based on community feedback.

Guide: Running Locally

  1. Setup:

    • Clone the repository: git clone https://github.com/casper-hansen/AutoAWQ
    • Navigate to the directory: cd AutoAWQ
    • Install necessary dependencies: pip install -r requirements.txt
  2. Running the Model:

    • Load the model using the Transformers library in Python.
    • Initiate the text generation process with desired input prompts.
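The two steps above can be sketched as follows. Note that the repository id below is an assumption (substitute the actual AWQ checkpoint you are using), and loading a 70B model requires substantial GPU memory even when quantized.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for illustration; replace with the real AWQ checkpoint.
MODEL_ID = "casperhansen/llama-3.3-70b-instruct-awq"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the quantized model and generate a completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Format the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Transformers detects the AWQ quantization configuration from the checkpoint, so no extra flags are needed at load time provided the `autoawq` package is installed.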
  3. Hardware Requirements:

    • Due to the large size of Llama 3.3, it is recommended to use cloud GPUs for efficient computation. Services like AWS, Google Cloud, or Azure offer suitable GPU instances.

License

The model is released under the Llama 3.3 Community License Agreement, a custom commercial license. The full license text is available in the Llama 3.3 License document published by Meta.
