Llama 3.2 3B Instruct 4bit

mlx-community

Introduction

mlx-community's Llama-3.2-3B-Instruct-4bit is a text generation model based on Meta's Llama 3.2 architecture. It supports multiple languages and is distributed in MLX format for use with the MLX-LM library; the original Meta checkpoint it was converted from targets PyTorch and the Transformers library. The 4-bit quantized weights are intended for efficient inference with a reduced memory footprint.

Architecture

Llama-3.2-3B-Instruct-4bit is a compact variant in the Llama series with roughly 3 billion parameters, focused on conversational and text generation tasks. Its weights are stored in a 4-bit quantized format, which reduces memory use and improves resource efficiency at some cost in numerical precision. The model is built on the Llama 3.2 architecture and has been adapted for MLX, Apple's machine learning framework for Apple silicon.
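To make the resource savings concrete, the sketch below estimates weight storage for a model of roughly 3 billion parameters at 16-bit and at 4-bit precision. The parameter count (3.2e9), the quantization group size of 64, and the 32 bits of scale and bias stored per group are assumptions chosen for illustration, not values stated in this card. Real checkpoints may also keep some tensors unquantized, so treat the result as a rough estimate only.

```python
def weight_gib(n_params: float, bits: int,
               group_size: int = 0, overhead_bits_per_group: int = 0) -> float:
    """Approximate weight storage in GiB for a given precision.

    group_size and overhead_bits_per_group model group-wise quantization,
    where each group of weights shares extra metadata (e.g. a scale and bias).
    """
    bits_per_weight = float(bits)
    if group_size:
        bits_per_weight += overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 3.2e9  # assumed parameter count, not taken from the model card

fp16 = weight_gib(N_PARAMS, 16)
# Assumed layout: 4-bit weights, groups of 64, 16-bit scale + 16-bit bias per group.
q4 = weight_gib(N_PARAMS, 4, group_size=64, overhead_bits_per_group=32)

print(f"16-bit: {fp16:.2f} GiB, 4-bit: {q4:.2f} GiB, ratio: {fp16 / q4:.2f}x")
```

Under these assumptions the weights shrink from about 6 GiB to under 2 GiB. Activation and key-value cache memory during generation are not included in this estimate.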

Training

The model was converted from the original meta-llama/Llama-3.2-3B-Instruct checkpoint using MLX-LM version 0.18.2. The card does not provide training details for this conversion; the model's ability to generate and understand text across multiple languages comes from the original Llama 3.2 model.
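A conversion of this kind can be sketched with the `convert` function from MLX-LM. The exact settings used for this upload are not stated in the card, so the 4-bit options and the output directory name below are assumptions. The source repository is gated on Hugging Face, so the call would also need an account that has accepted Meta's license.

```python
def convert_to_4bit(hf_path: str = "meta-llama/Llama-3.2-3B-Instruct",
                    mlx_path: str = "Llama-3.2-3B-Instruct-4bit") -> str:
    """Download a Hugging Face checkpoint and write 4-bit MLX weights.

    mlx_path is a hypothetical local output directory chosen for this example.
    """
    # Imported lazily so that defining this function does not require MLX.
    from mlx_lm import convert

    # quantize=True with q_bits=4 requests 4-bit quantized weights;
    # these settings are assumed, not confirmed by the model card.
    convert(hf_path, mlx_path=mlx_path, quantize=True, q_bits=4)
    return mlx_path
```

Calling `convert_to_4bit()` would download the full-precision weights first, so it needs enough disk space for both the original and the converted copy.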

Guide: Running Locally

  1. Install MLX-LM: Ensure that you have the MLX-LM library installed by running:

    pip install mlx-lm
    
  2. Load the Model: Use the following Python code to load the model and tokenizer:

    from mlx_lm import load, generate
    
    model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
    
  3. Generate Text: Prepare a prompt and generate a response:

    prompt = "hello"
    # Wrap the prompt in the model's chat template when the tokenizer provides one.
    if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
        messages = [{"role": "user", "content": prompt}]
        prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    
    # verbose=True prints the generated text as it is produced.
    response = generate(model, tokenizer, prompt=prompt, verbose=True)
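The steps above handle a single user prompt. For a conversation with a system prompt or earlier turns, one possible approach is to build the message list separately and pass it through the same chat template. The helper names `build_messages` and `reply` and the `max_tokens` value are illustrative choices, not part of MLX-LM or this model card.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(user_prompt: str,
                   system_prompt: Optional[str] = None,
                   history: Optional[List[Message]] = None) -> List[Message]:
    """Assemble chat messages in order: system, earlier turns, new user turn."""
    messages: List[Message] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages


def reply(model, tokenizer, messages: List[Message], max_tokens: int = 256) -> str:
    """Generate one assistant reply for an already loaded model and tokenizer."""
    # Imported lazily so build_messages can be used without MLX installed.
    from mlx_lm import generate

    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

With the `model` and `tokenizer` from step 2, a call might look like `reply(model, tokenizer, build_messages("hello", system_prompt="Answer briefly."))`. Loading the model once and reusing it across calls avoids repeating the load time for each turn.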
    

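Cloud GPUs: MLX is designed primarily for Apple silicon, so a Mac with an M-series chip is the most direct way to run this model. If you prefer a cloud provider such as AWS EC2, Google Cloud Platform, or Azure, check that the instance type you choose is supported by MLX before relying on it.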

License

The Llama-3.2-3B-Instruct-4bit model is provided under the Llama 3.2 Community License Agreement. This license grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, distribute, and modify the Llama Materials. Redistribution must include a copy of the license agreement and proper attribution. The model comes with no warranty, and liability is limited as per the terms of the agreement. Use of the model must comply with applicable laws and regulations and with Meta's Acceptable Use Policy.
