QwQ-32B-Preview-gptqmodel-4bit-vortex-v2
Introduction
QwQ-32B-Preview-gptqmodel-4bit-vortex-v2 is a quantized language model for text generation that runs at 4-bit precision for high efficiency and a small memory footprint. It is part of the ModelCloud suite and is optimized for chat and instruction-following tasks.
Architecture
This model uses the GPTQ quantization method to reduce model size and improve computational efficiency. Key quantization settings include (see the configuration sketch after this list):
- 4-bit precision
- Group size of 32
- True sequential processing
- Symmetric quantization
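These settings correspond to fields in GPTQModel's quantization configuration. Below is a minimal sketch, assuming the `QuantizeConfig` API from the `gptqmodel` library; the field names (`bits`, `group_size`, `true_sequential`, `sym`) follow that library's documented options and may vary between versions:

```python
from gptqmodel import QuantizeConfig

# Sketch only: field names assume gptqmodel's QuantizeConfig API.
quant_config = QuantizeConfig(
    bits=4,                # 4-bit precision
    group_size=32,         # group size of 32
    true_sequential=True,  # quantize layers strictly one after another
    sym=True,              # symmetric quantization
)
```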
Training
The model was quantized with the GPTQModel library, version 1.4.4, using a dampening percentage of 0.1 and a dampening auto-increment of 0.0015. The quantization process applies symmetric quantization and static group adjustment to preserve model quality at 4-bit precision.
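Combining the configuration above with these dampening parameters, a quantization run might look like the following sketch. It assumes `gptqmodel`'s `load`/`quantize`/`save` workflow, uses `Qwen/QwQ-32B-Preview` as the presumed base model, and `calibration_texts` is a hypothetical placeholder for a calibration dataset you would supply:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Sketch only: settings taken from this card; API names assume gptqmodel v1.x.
quant_config = QuantizeConfig(
    bits=4,
    group_size=32,
    sym=True,
    true_sequential=True,
    damp_percent=0.1,            # dampening percentage
    damp_auto_increment=0.0015,  # dampening auto-increment
    static_groups=True,          # static group adjustment
)

# calibration_texts is a placeholder: a list of representative text samples.
calibration_texts = ["Example calibration sentence ..."]

model = GPTQModel.load("Qwen/QwQ-32B-Preview", quant_config)
model.quantize(calibration_texts)
model.save("./QwQ-32B-Preview-gptqmodel-4bit-vortex-v2")
```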
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed, then use pip to install `transformers` and `gptqmodel`:

  ```shell
  pip install transformers
  pip install gptqmodel
  ```
- Load the Model and Tokenizer:

  ```python
  from transformers import AutoTokenizer
  from gptqmodel import GPTQModel

  tokenizer = AutoTokenizer.from_pretrained("ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2")
  model = GPTQModel.load("ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2")
  ```
- Create Input Messages:

  ```python
  messages = [
      {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
      {"role": "user", "content": "How can I design a data structure in C++ to store the top 5 largest integer numbers?"},
  ]
  ```
- Generate Responses:

  ```python
  # Apply the chat template to build the prompt tensor.
  input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
  # Generate up to 512 new tokens on the model's device.
  outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
  # Decode only the newly generated tokens, skipping the prompt.
  result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
  print(result)
  ```
- Consider Using Cloud GPUs: For better performance, especially with a model of this size, consider using cloud-based GPUs from providers such as AWS, Google Cloud, or Azure.
License
This model is released under the Apache 2.0 license. For more details, refer to the license file in the model repository.