Qwen2.5-Coder-32B-Instruct-abliterated-GGUF
by bartowski

Introduction
The Qwen2.5-Coder-32B-Instruct-abliterated-GGUF model is a collection of quantized GGUF versions of Qwen2.5-Coder-32B-Instruct-abliterated, produced with llama.cpp. It offers a range of quantization levels that trade output quality against size and speed for English text generation, with a focus on code and conversational use.
Architecture
The model is based on the Qwen2.5-Coder-32B-Instruct architecture and has been quantized with the llama.cpp framework. Multiple quantization levels, such as Q8_0, Q6_K_L, and IQ2_S, are available, letting users pick one that matches their hardware capabilities and quality requirements.
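If you are unsure which quantization a downloaded file uses, the GGUF header records it. A minimal sketch, assuming the `gguf` Python package maintained in the llama.cpp repository is installed; the filename is only an example:

```
# Install the GGUF utilities that ship with the llama.cpp project.
pip install gguf

# Dump the header metadata; the quantization level is reported in the
# "general.file_type" field. Substitute the file you actually downloaded.
gguf-dump --no-tensors Qwen2.5-Coder-32B-Instruct-abliterated-Q6_K_L.gguf
```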
Training
Quantization was performed with llama.cpp's imatrix option, using a calibration dataset curated for this purpose. The goal was to preserve output quality while reducing the model's size and computational requirements.
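For reference, here is a minimal sketch of a typical imatrix workflow using llama.cpp's bundled tools; `calibration.txt` is a placeholder, and the exact dataset and settings used for this release are not reproduced here:

```
# 1. Compute an importance matrix from a calibration text file
#    ("calibration.txt" stands in for a curated dataset).
./llama-imatrix -m Qwen2.5-Coder-32B-Instruct-abliterated-f16.gguf \
  -f calibration.txt -o imatrix.dat

# 2. Quantize the full-precision GGUF, guided by the importance matrix.
./llama-quantize --imatrix imatrix.dat \
  Qwen2.5-Coder-32B-Instruct-abliterated-f16.gguf \
  Qwen2.5-Coder-32B-Instruct-abliterated-Q4_K_M.gguf Q4_K_M
```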
Guide: Running Locally
- Installation: Ensure the `huggingface-cli` tool is installed by running `pip install -U "huggingface_hub[cli]"`.
- Downloading Files: Use `huggingface-cli download bartowski/Qwen2.5-Coder-32B-Instruct-abliterated-GGUF --include "<filename>" --local-dir ./` to download specific quantized model files.
- Choosing Quantization: Pick a quant file based on your hardware's RAM and VRAM capacities. For optimal performance, choose a file size 1-2 GB smaller than your available VRAM.
- Execution: Run the model in your preferred environment, ensuring compatibility with your hardware (e.g., ARM, NVIDIA); see the end-to-end example after this list.
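Putting the steps together, a minimal end-to-end sketch, assuming llama.cpp is built and `llama-cli` is on your PATH; the Q4_K_M file is chosen only as an example, so check the repository listing for the exact filename:

```
# Download a single quantized file (Q4_K_M is an example mid-size option).
huggingface-cli download bartowski/Qwen2.5-Coder-32B-Instruct-abliterated-GGUF \
  --include "Qwen2.5-Coder-32B-Instruct-abliterated-Q4_K_M.gguf" --local-dir ./

# Start an interactive chat with llama.cpp's llama-cli.
# -ngl offloads layers to the GPU; raise or lower it so the model fits in VRAM.
llama-cli -m ./Qwen2.5-Coder-32B-Instruct-abliterated-Q4_K_M.gguf \
  -ngl 99 -c 8192 -cnv -p "You are a helpful coding assistant."
```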
For cloud deployment, consider renting cloud GPUs (e.g., NVIDIA or AMD hardware) for faster inference.
License
The model is licensed under the Apache 2.0 License, which permits broad use, modification, and distribution subject to its terms.