Llama 3.2 1B Instruct GPTQModel 4bit Vortex v2.5

ModelCloud

Introduction

The Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5 is a text generation model developed by ModelCloud. Its weights are quantized to 4 bits for efficient inference, making it suitable for conversational AI and a range of other language tasks.

Architecture

This model is a quantized version of the Llama 3.2 architecture: model weights are reduced to 4-bit precision using GPTQ, a post-training quantization method. Key features include (sketched in code after the list):

  • Quantization Method: GPTQ
  • Precision: 4-bit
  • Group Size: 32
  • Symmetric Quantization: Enabled
  • True Sequential Operation: Enabled
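
For reference, these settings map onto a GPTQModel quantization configuration roughly as sketched below; the QuantizeConfig field names are assumed from the gptqmodel API and may differ between releases.

    from gptqmodel import QuantizeConfig
    
    # Illustrative only: field names are assumed; values mirror the list above.
    quantize_config = QuantizeConfig(
        bits=4,                # 4-bit precision
        group_size=32,         # weights quantized in groups of 32
        sym=True,              # symmetric quantization
        true_sequential=True,  # quantize layers one after another
    )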

The model supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Training

The model is based on meta-llama/Llama-3.2-1B-Instruct and was quantized (not further fine-tuned) with GPTQModel version 1.1.0. The checkpoint is stored in the GPTQ format with the quantization settings listed above, chosen for text generation workloads.
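
As a rough illustration of how such a checkpoint is produced, the snippet below follows gptqmodel's typical quantize-and-save flow; the calibration texts, method names, and output path are assumptions for this sketch and may not match what version 1.1.0 exposes.

    from gptqmodel import GPTQModel, QuantizeConfig
    
    # Hypothetical reproduction sketch; real calibration uses a few hundred
    # representative text samples, not the two placeholders below.
    calibration_data = [
        "GPTQ calibrates each layer against sample text like this sentence.",
        "Representative prompts keep the quantized weights accurate.",
    ]
    
    config = QuantizeConfig(bits=4, group_size=32, sym=True, true_sequential=True)
    model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
    model.quantize(calibration_data)          # run GPTQ layer by layer
    model.save("Llama-3.2-1B-Instruct-4bit")  # write the quantized checkpoint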

Guide: Running Locally

To run this model locally, follow these steps:

  1. Install Packages: Ensure the transformers and gptqmodel libraries are installed, e.g. via pip:
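    pip install transformers gptqmodel
    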
  2. Load Tokenizer and Model:
    from transformers import AutoTokenizer
    from gptqmodel import GPTQModel
    
    model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Load the 4-bit GPTQ checkpoint with its quantized weights
    model = GPTQModel.from_quantized(model_name)
    
  3. Prepare Input and Generate Output:
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    # Apply the model's chat template and tokenize the conversation
    input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt
    result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
    print(result)
    
  4. Cloud GPUs: For better performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
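
Note that GPTQ checkpoints like this one can usually also be loaded directly through transformers, whose GPTQ integration reads the quantization config embedded in the checkpoint. A minimal sketch, assuming optimum with a GPTQ kernel backend and accelerate are installed:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # transformers detects the GPTQ quantization config and loads the 4-bit weights
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")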

License

The model is distributed under the Llama 3.2 Community License. Please refer to the official license document for terms of use and distribution.
