QwQ-14B-Math-v0.2-GGUF

QuantFactory

Introduction

QwQ-14B-Math-v0.2-GGUF is a quantized version of the QwQ-14B-Math-v0.2 model developed by qingy2024. The model is fine-tuned from the base model unsloth/qwen2.5-14b-bnb-4bit and is aimed at answering mathematical queries, having been fine-tuned on a verified subset of the NuminaMathCoT dataset (detailed under Training below).

Architecture

The model architecture is based on Qwen 2.5-14B, quantized to the GGUF format for more efficient inference. It utilizes the ChatML template for structured conversation modeling.
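
As a quick illustration of what that template looks like in practice, the snippet below builds a ChatML-formatted prompt by hand. Only the <|im_start|>/<|im_end|> role structure is prescribed by ChatML; the system message and the question are placeholder text.

```python
# Minimal ChatML prompt for a math question. The role markers follow the
# ChatML convention used by Qwen-family models; the message contents here
# are placeholders.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful math assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Find all real solutions of x^2 - 5x + 6 = 0.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```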

Training

QwQ-14B-Math-v0.2-GGUF was fine-tuned using the following configuration (a configuration sketch follows the list):

  • Base Model: Qwen 2.5-14B
  • Fine-Tuning Dataset: A verified subset of NuminaMathCoT, leveraging Qwen 2.5 3B Instruct as a judge.
  • QLoRA Configuration:
    • Rank: 32
    • Rank Stabilization: Enabled
  • Optimization Settings:
    • Batch Size: 8
    • Gradient Accumulation Steps: 2 (Effective Batch Size: 16)
    • Warm-Up Steps: 5
    • Weight Decay: 0.01
  • Training Steps: 500, halted upon loss plateau to avoid overfitting.
  • Hardware Used: A100-80GB GPU.
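
The listed hyperparameters map onto a standard peft/transformers setup. The sketch below is a hedged reconstruction, not the author's actual training script: the values come from the list above, while the output path, task type, and everything omitted (dataset preparation, the trainer itself) are assumptions.

```python
# Hedged reconstruction of the reported QLoRA configuration using peft
# and transformers. Hyperparameter values come from the model card;
# anything marked "hypothetical" is an illustrative assumption.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                 # QLoRA rank: 32
    use_rslora=True,      # rank stabilization: enabled
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwq-14b-math-v0.2",  # hypothetical output path
    per_device_train_batch_size=8,   # batch size: 8
    gradient_accumulation_steps=2,   # effective batch size: 16
    warmup_steps=5,
    weight_decay=0.01,
    max_steps=500,                   # halted at the loss plateau
)
```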

Guide: Running Locally

To run the QwQ-14B-Math-v0.2-GGUF model locally, follow these steps:

  1. Setup Environment: Ensure Python and the necessary libraries, such as transformers and torch, are installed.
  2. Download Model: Acquire the GGUF model files from the Hugging Face repository.
  3. Load Model: Use the transformers library to load the checkpoint for inference.
  4. Run Inference: Use the model to generate responses to mathematical questions, as in the sketch below.
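
A minimal sketch of steps 3 and 4, assuming transformers >= 4.41 (which can load GGUF checkpoints via the gguf_file argument). The repository id is inferred from this card, and the exact .gguf filename is an assumption; check the repository's file list for the real quantization variants. Note that transformers dequantizes GGUF weights on load, so a GGUF-native runtime such as llama.cpp remains the lighter-weight option.

```python
# Minimal local-inference sketch. Repo id inferred from this card and
# the .gguf filename is hypothetical; pick a real file from the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QuantFactory/QwQ-14B-Math-v0.2-GGUF"
gguf_file = "QwQ-14B-Math-v0.2.Q4_K_M.gguf"  # hypothetical filename

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, gguf_file=gguf_file, torch_dtype=torch.float16
)

# Prompt formatted with the ChatML template shown in the Architecture
# section; the question is placeholder text.
prompt = (
    "<|im_start|>user\n"
    "Solve x^2 - 5x + 6 = 0.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```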

Cloud GPUs: For optimal performance, especially with large models, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud Platform, or Azure.

License

The QwQ-14B-Math-v0.2-GGUF model is licensed under the Apache 2.0 License, permitting both personal and commercial use provided the license and copyright notices are retained.
