QwQ-32B-Preview-abliterated-GGUF
by bartowski
Introduction
QwQ-32B-Preview-abliterated-GGUF is a text generation model offered at various quantization levels for optimized performance on different hardware configurations. It is designed to be uncensored and conversational in nature, with quantizations produced using the llama.cpp framework.
Architecture
The model is based on the original QwQ-32B architecture and has been quantized using llama.cpp release b4222 with the imatrix option. This yields a range of quantization formats, including BF16, Q8_0, and others, each offering a different trade-off between quality and performance.
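For illustration, the command below is a minimal sketch of how an imatrix-assisted quantization is typically produced with llama.cpp's llama-quantize tool. The input and output file names and imatrix.dat are placeholder assumptions, not the maintainer's exact pipeline:

  # Quantize a BF16 GGUF to Q4_K_M, guided by a precomputed importance matrix
  # (file names are placeholders)
  llama-quantize --imatrix imatrix.dat QwQ-32B-Preview-abliterated-BF16.gguf QwQ-32B-Preview-abliterated-Q4_K_M.gguf Q4_K_M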
Training
The quantizations are created using an imatrix calibration dataset, and the repository provides multiple file options tailored to different performance and quality requirements, allowing users to select the best fit for their hardware capabilities and application needs.
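For context, an importance matrix is generated by running a calibration dataset through the model with llama.cpp's llama-imatrix tool. The sketch below assumes a BF16 source GGUF and a plain-text calibration file named calibration.txt; both names are hypothetical:

  # Collect activation statistics over the calibration text and write imatrix.dat
  # (model and dataset file names are assumed)
  llama-imatrix -m QwQ-32B-Preview-abliterated-BF16.gguf -f calibration.txt -o imatrix.dat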
Guide: Running Locally
- Installation: Ensure huggingface-cli is installed:
  pip install -U "huggingface_hub[cli]"
- Download Model: Use huggingface-cli to download the desired quantized model file:
  huggingface-cli download bartowski/QwQ-32B-Preview-abliterated-GGUF --include "QwQ-32B-Preview-abliterated-Q4_K_M.gguf" --local-dir ./
- Execution: Use LM Studio or another compatible runtime to run the model locally (see the sketch after this list).
- Hardware Recommendation: For optimal performance, cloud GPUs such as those from AWS or Google Cloud are recommended, especially for the larger model files.
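As an alternative to LM Studio, the sketch below shows one way to chat with the downloaded file using llama.cpp's llama-cli. The model file name matches the download step above; the flags are example values, with -ngl 99 offloading all layers to the GPU (lower or omit it on CPU-only machines) and -c 4096 setting the context window:

  # Interactive chat with the quantized model (example flags, not a prescribed config)
  llama-cli -m ./QwQ-32B-Preview-abliterated-Q4_K_M.gguf -cnv -ngl 99 -c 4096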
License
The model and its quantizations are released under the Apache 2.0 license. For more detailed licensing information, refer to the license document.