Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5
Introduction
Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5 is a text generation model published by ModelCloud. Its weights are quantized to 4 bits for efficient inference, making it suitable for conversational AI and general language tasks.
Architecture
This model is a quantized version of the Llama-3.2 architecture: model weights are reduced to 4-bit precision using GPTQ, a post-training quantization method. Key quantization settings include:
- Quantization Method: GPTQ
- Precision: 4-bit
- Group Size: 32
- Symmetric Quantization: Enabled
- True Sequential Operation: Enabled
The model supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
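To make the settings above concrete, here is a minimal, self-contained sketch (plain NumPy, not GPTQModel's actual implementation) of what 4-bit symmetric quantization with group size 32 means: each group of 32 consecutive weights shares one scale, and every weight in the group is rounded to one of 16 signed integer levels. Note that GPTQ proper also applies second-order error compensation during rounding, which this sketch omits.

```python
import numpy as np

def quantize_group_sym(weights, bits=4):
    # Symmetric quantization: one scale maps max|w| onto the integer range.
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q, scale

def quantize_grouped(w, group_size=32, bits=4):
    # Quantize each run of `group_size` consecutive weights with its own scale,
    # then dequantize, so the result shows the quantization error directly.
    out = np.empty_like(w)
    for start in range(0, w.size, group_size):
        g = w[start:start + group_size]
        q, scale = quantize_group_sym(g, bits)
        out[start:start + group_size] = q * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)   # toy "weight row"
w_dq = quantize_grouped(w)
err = float(np.abs(w - w_dq).max())           # worst-case per-weight error
```

Because each group gets its own scale, a few large weights in one group cannot inflate the quantization error of the whole row, which is the main benefit of a small group size like 32.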
Training
The model is based on meta-llama/Llama-3.2-1B-Instruct and was quantized using GPTQModel version 1.1.0. The checkpoint uses a format and quantization settings optimized for text generation tasks.
Guide: Running Locally
To run this model locally, follow these steps:
- Install Packages: Ensure the transformers and gptqmodel libraries are installed.
- Load Tokenizer and Model:
```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)
```
- Prepare Input and Generate Output:
```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```
- Cloud GPUs: For better performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
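For reference, the prompt structure that apply_chat_template produces for Llama-3.x models can be approximated by hand as below. This is an illustrative sketch only, not the authoritative template: the template bundled with the tokenizer (which may also inject a dated system preamble) is what the model was trained against, so prefer apply_chat_template in real use.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Rough approximation of the Llama 3.x chat format (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is wrapped in role headers and terminated with <|eot_id|>
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Leave the prompt open at the assistant turn so the model completes it
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
])
```

Passing add_generation_prompt=True mirrors the argument used in the generation example above: it appends an open assistant header, so the model's output begins directly with the assistant's reply.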
License
The model is distributed under the llama3.2 license. Refer to the official license document for terms of use and distribution.