SmallThinker-3B-Preview

PowerInfer

Introduction

SmallThinker-3B-preview is a fine-tuned version of the Qwen2.5-3B-Instruct model, optimized for use cases such as edge deployment and serving as a draft model for speculative decoding with larger models. It improves on the base model across several benchmarks while remaining compact enough for resource-constrained environments.

Architecture

SmallThinker-3B-preview builds on the architecture of Qwen2.5-3B-Instruct. It is optimized for text generation tasks and is compatible with the Hugging Face Transformers library. The model supports English language processing and is designed for efficient deployment.

Training

The model was trained using 8 H100 GPUs with a global batch size of 16 over two phases of Supervised Fine-Tuning (SFT). Key training configurations include a learning rate of 1.0e-5, a cosine learning rate scheduler, and bf16 precision. The training process involved the following datasets:

  • Phase 1: PowerInfer/QWQ-LONGCOT-500K for 1.5 epochs
  • Phase 2: Combined datasets from PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine for an additional 2 epochs
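The hyperparameters above can be collected into a short sketch in the style of Hugging Face TrainingArguments. The card does not state the training framework or the exact batch split, so the per-device batch size and accumulation steps below are illustrative assumptions that merely reproduce the stated global batch size of 16 on 8 GPUs:

```python
# Sketch of the SFT configuration described above. Key names follow the
# Hugging Face TrainingArguments convention; the authors' actual framework
# and argument names are not stated in the card.
phase1_config = {
    "learning_rate": 1.0e-5,           # stated in the card
    "lr_scheduler_type": "cosine",     # cosine learning rate scheduler
    "bf16": True,                      # bf16 precision
    "per_device_train_batch_size": 1,  # assumption: 8 GPUs x 1 x 2 accum = 16
    "gradient_accumulation_steps": 2,  # assumption (see above)
    "num_train_epochs": 1.5,           # phase 1: PowerInfer/QWQ-LONGCOT-500K
}

# Phase 2 reuses the same hyperparameters on the combined datasets
# (QWQ-LONGCOT-500K + LONGCOT-Refine) for 2 more epochs.
phase2_config = dict(phase1_config, num_train_epochs=2)
```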

Benchmark Performance

  • AIME24: 16.667
  • AMC23: 57.5
  • GAOKAO2024_I: 64.2
  • GAOKAO2024_II: 57.1
  • MMLU_STEM: 68.2
  • AMPS_Hard: 70
  • Math_comp: 46.8

Guide: Running Locally

  1. Install Dependencies: Ensure Python and the Hugging Face Transformers library are installed.
  2. Download the Model: Use Hugging Face's model hub to download SmallThinker-3B-preview.
  3. Set Up Environment: Configure your Python environment with necessary libraries and dependencies.
  4. Run Inference: Load the model and perform inference using your input data.
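The steps above can be sketched with the Transformers library. The model id below is an assumption based on the card (verify the exact repository name on the Hugging Face Hub); running it downloads several GB of weights, and a GPU is recommended:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository id; check the actual name on the Hugging Face Hub.
model_id = "PowerInfer/SmallThinker-3B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # loads bf16 weights when available
    device_map="auto",   # places the model on a GPU if one is present
)

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```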

For optimal performance, consider using cloud GPUs such as AWS EC2 instances with NVIDIA GPUs.

License

SmallThinker-3B-preview is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.
