Llama 3.3 70B Instruct AWQ
casperhansen

Introduction
The Llama 3.3 70B Instruct model is a multilingual large language model (LLM) from Meta, designed for text generation and optimized for multilingual dialogue. It uses an enhanced transformer architecture and is post-trained with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to improve helpfulness and safety.
Architecture
Llama 3.3 is an auto-regressive, transformer-based language model. It uses Grouped-Query Attention (GQA), in which several query heads share each key/value head, shrinking the key/value cache and improving inference scalability. The model supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It is built on the meta-llama/Llama-3.1-70B base model, with a focus on multilingual text and code generation.
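To make the GQA idea concrete, here is a minimal PyTorch sketch of grouped-query attention. The head counts and tensor shapes are illustrative assumptions, not Llama 3.3's actual configuration, and real implementations add masking, rotary embeddings, and KV caching.

```python
# Minimal sketch of Grouped-Query Attention (GQA).
# Shapes and head counts are toy values, not Llama 3.3's real config.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (batch, seq, n_q_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)."""
    n_q_heads, n_kv_heads = q.shape[2], k.shape[2]
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    # Repeat each KV head so every group of query heads shares it.
    k = k.repeat_interleave(group_size, dim=2)
    v = v.repeat_interleave(group_size, dim=2)
    # Move heads before the sequence dim: (batch, heads, seq, head_dim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2)  # back to (batch, seq, heads, head_dim)

# Toy example: 8 query heads sharing 2 KV heads (group size 4).
b, s, d = 1, 16, 64
q = torch.randn(b, s, 8, d)
k = torch.randn(b, s, 2, d)
v = torch.randn(b, s, 2, d)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 64])
```

Because only 2 KV heads are stored instead of 8, the key/value cache in this toy example is a quarter the size of full multi-head attention, which is the memory saving GQA trades for (in practice, negligible) quality loss.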
Training
The model has 70 billion parameters and was pretrained on a diverse mix of publicly available online data totaling more than 15 trillion tokens. Post-training combines supervised fine-tuning and reinforcement learning with human feedback. The model is static, trained on an offline dataset, with future versions planned to incorporate community feedback.
Guide: Running Locally
1. Setup:
   - Clone the repository: git clone https://github.com/casper-hansen/AutoAWQ
   - Navigate into the directory: cd AutoAWQ
   - Install the necessary dependencies: pip install -r requirements.txt
2. Running the Model:
   - Load the model with the Transformers library in Python (see the sketch after this list).
   - Start text generation with your desired input prompts.
3. Hardware Requirements:
   - Due to Llama 3.3's large size, cloud GPUs are recommended for efficient inference. AWS, Google Cloud, and Azure all offer suitable GPU instances.
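As an illustration of step 2, here is a minimal sketch of loading the AWQ checkpoint with Transformers. The model ID below is an assumption based on this card's repository name; verify it, and make sure you have enough GPU memory (the 4-bit 70B weights alone are roughly 35-40 GB), before running.

```python
# Minimal sketch: loading the AWQ-quantized model with Transformers.
# Requires the autoawq package (installed in Setup) plus accelerate for device_map.
# The model ID is an assumption; confirm the exact Hugging Face repo name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "casperhansen/llama-3.3-70b-instruct-awq"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard weights across available GPUs
)

# Llama 3.3 Instruct expects its chat template, not raw strings.
messages = [{"role": "user", "content": "Summarize what AWQ quantization does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```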
License
The model is released under the Llama 3.3 Community License Agreement, a custom commercial license. The full terms are available in Meta's Llama 3.3 License document.