QwQ-32B-Preview-abliterated-GGUF

bartowski

Introduction

QwQ-32B-Preview-abliterated-GGUF is a text generation model offered at multiple quantization levels, allowing performance to be tuned to different hardware configurations. It is an uncensored ("abliterated"), conversational model, quantized with the llama.cpp framework.

Architecture

The model is based on the original QwQ-32B-Preview architecture and was quantized with llama.cpp release b4222 using the imatrix option. Quantizations are provided in multiple formats, including BF16, Q8_0, and lower-bit K-quants such as Q4_K_M, each offering a different trade-off between output quality and resource usage.
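
As a rough illustration of how such formats are produced (not necessarily the exact commands used for this repository), llama.cpp's llama-quantize tool converts a full-precision GGUF into lower-precision variants; the input file name below is a placeholder:

    # Assumes a local llama.cpp build; the binary is named llama-quantize in recent releases
    ./llama-quantize QwQ-32B-Preview-abliterated-BF16.gguf QwQ-32B-Preview-abliterated-Q8_0.gguf Q8_0
    ./llama-quantize QwQ-32B-Preview-abliterated-BF16.gguf QwQ-32B-Preview-abliterated-Q4_K_M.gguf Q4_K_M

Lower-bit formats such as Q4_K_M sharply reduce file size and memory use at some cost in quality, while Q8_0 stays close to the full-precision model.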

Training

The quantizations were generated with an importance matrix (imatrix) computed from a calibration dataset, yielding multiple file options tuned for different quality and performance targets. Users can select the file that best fits their hardware capabilities and application needs.
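
A minimal sketch of this imatrix workflow, assuming a local llama.cpp build; calibration.txt stands in for the calibration dataset, whose exact contents are not specified here:

    # Compute an importance matrix over the calibration data
    ./llama-imatrix -m QwQ-32B-Preview-abliterated-BF16.gguf -f calibration.txt -o imatrix.dat

    # Pass the imatrix to quantization so the most important weights retain more precision
    ./llama-quantize --imatrix imatrix.dat QwQ-32B-Preview-abliterated-BF16.gguf QwQ-32B-Preview-abliterated-Q4_K_M.gguf Q4_K_M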

Guide: Running Locally

  1. Installation: Ensure huggingface-cli is installed.

    pip install -U "huggingface_hub[cli]"
    
  2. Download Model: Use huggingface-cli to download the desired quantized model file.

    huggingface-cli download bartowski/QwQ-32B-Preview-abliterated-GGUF --include "QwQ-32B-Preview-abliterated-Q4_K_M.gguf" --local-dir ./
    
  3. Execution: Run the model with LM Studio or another llama.cpp-compatible framework; a command-line sketch is shown after this list.

  4. Hardware Recommendation: For optimal performance, especially with the larger quantization files, cloud GPUs such as those from AWS or Google Cloud are recommended.
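
For step 3, a minimal sketch using llama.cpp's llama-cli binary (assuming a local llama.cpp build; the -ngl value is an example and should match available GPU memory):

    # Interactive chat with the downloaded quant, offloading 40 layers to the GPU
    ./llama-cli -m ./QwQ-32B-Preview-abliterated-Q4_K_M.gguf -cnv -ngl 40

LM Studio provides the same capability through a graphical interface and can load the downloaded GGUF file directly.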

License

The model and its quantizations are released under the Apache 2.0 license. For more detailed licensing information, refer to the license document.
