QwQ-14B-Math-v0.2-GGUF
Introduction
QwQ-14B-Math-v0.2-GGUF is a quantized version of the QwQ-14B-Math-v0.2 model developed by qingy2024. The underlying model is fine-tuned from the base model unsloth/qwen2.5-14b-bnb-4bit and is aimed at answering mathematical queries, using a verified subset of the NuminaMathCoT dataset (details below).
Architecture
The model architecture is based on Qwen 2.5-14B, with quantization applied for more efficient inference. It uses the ChatML template for structured conversation formatting.
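For reference, a ChatML-formatted prompt looks like the following; the system message shown here is illustrative, not taken from the model card:

```
<|im_start|>system
You are a helpful assistant for mathematics.<|im_end|>
<|im_start|>user
Solve 2x + 3 = 11 for x.<|im_end|>
<|im_start|>assistant
```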
Training
QwQ-14B-Math-v0.2 was trained with the following configuration (sketched in code after the list):
- Base Model: Qwen 2.5-14B
- Fine-Tuning Dataset: A verified subset of NuminaMathCoT, using Qwen 2.5 3B Instruct as a judge.
- QLoRA Configuration:
  - Rank: 32
  - Rank Stabilization: Enabled
- Optimization Settings:
  - Batch Size: 8
  - Gradient Accumulation Steps: 2 (effective batch size: 16)
  - Warm-Up Steps: 5
  - Weight Decay: 0.01
  - Training Steps: 500, halted at the loss plateau to avoid overfitting
- Hardware: A100 80 GB GPU
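Expressed as a minimal sketch with the Hugging Face `peft` and `transformers` APIs (the actual run used Unsloth; the target modules and LoRA alpha are assumptions not stated in the card):

```python
# Hypothetical reconstruction of the training configuration listed above.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                     # QLoRA rank from the model card
    lora_alpha=32,            # assumption: alpha is not stated in the card
    use_rslora=True,          # "Rank Stabilization" maps to rank-stabilized LoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed defaults
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwq-14b-math-qlora",  # hypothetical output path
    per_device_train_batch_size=8,    # batch size 8
    gradient_accumulation_steps=2,    # effective batch size 16
    warmup_steps=5,
    weight_decay=0.01,
    max_steps=500,                    # halted near the loss plateau
)
```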
Guide: Running Locally
To run QwQ-14B-Math-v0.2-GGUF locally, follow these steps (a code sketch follows the list):
- Setup Environment: Ensure you have Python and the necessary libraries installed, such as `transformers` and `torch`.
- Download Model: Acquire the model files from the Hugging Face repository.
- Load Model: Use the `transformers` library to load the model for inference.
- Run Inference: Use the model to generate responses to mathematical questions.
Cloud GPUs: For optimal performance with a model of this size, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud Platform, or Azure.
License
The QwQ-14B-Math-v0.2-GGUF model is released under the Apache 2.0 license, which permits both personal and commercial use with attribution.