SmallThinker-3B-Preview-GGUF
by bartowski
Introduction
SmallThinker-3B-Preview-GGUF is a quantized version of the SmallThinker-3B model, designed for efficient text generation. The weights were quantized with llama.cpp and are offered in a range of quantization formats to suit different hardware and performance requirements.
Architecture
The model is built on the PowerInfer/SmallThinker-3B architecture and is compatible with a variety of backends, thanks to the quantization options provided by the llama.cpp library. This library converts model weights into different quantization formats, optimizing the model for specific hardware such as CPUs with ARM or AVX instruction sets.
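As a rough sketch of how a GGUF quant like the ones in this repo is typically produced with llama.cpp (the local paths and output filenames here are illustrative assumptions, not the exact commands used for this repo):

```
# Convert the original Hugging Face weights to a GGUF file (F16),
# then quantize it to Q4_K_M with llama.cpp's llama-quantize tool.
python convert_hf_to_gguf.py ./SmallThinker-3B-Preview --outfile SmallThinker-3B-Preview-f16.gguf
./llama-quantize SmallThinker-3B-Preview-f16.gguf SmallThinker-3B-Preview-Q4_K_M.gguf Q4_K_M
```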
Training
The model was quantized using llama.cpp's imatrix option, with calibration data drawn from the PowerInfer/QWQ-LONGCOT-500K dataset. This approach aims to balance model size against output quality, allowing efficient deployment on a range of devices.
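For reference, an imatrix-based quantization with llama.cpp generally looks like the sketch below; the calibration file and quant type are assumptions for illustration:

```
# Build an importance matrix from a calibration text file, then pass it
# to llama-quantize so low-bit quants preserve the most important weights.
./llama-imatrix -m SmallThinker-3B-Preview-f16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat SmallThinker-3B-Preview-f16.gguf SmallThinker-3B-Preview-IQ4_XS.gguf IQ4_XS
```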
Guide: Running Locally
- Install Dependencies: Ensure you have huggingface_hub installed for downloading the model files:
  ```
  pip install -U "huggingface_hub[cli]"
  ```
- Download the Model: Use the huggingface-cli to download the desired quantized model file. For example:
  ```
  huggingface-cli download bartowski/SmallThinker-3B-Preview-GGUF --include "SmallThinker-3B-Preview-Q4_K_M.gguf" --local-dir ./
  ```
- Choose the Right File: Select a quant file based on your hardware's RAM/VRAM availability. Aim for a model file size that is 1-2GB smaller than your available memory.
- Run the Model: Use an inference engine such as LM Studio, or llama.cpp directly, to load and run the model (see the example command below).
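If you prefer the command line over LM Studio, a minimal generation with llama.cpp's llama-cli might look like this (the prompt and token count are placeholders):

```
# -m: path to the downloaded GGUF file
# -p: prompt text
# -n: number of tokens to generate
./llama-cli -m ./SmallThinker-3B-Preview-Q4_K_M.gguf -p "Explain GGUF quantization in one paragraph." -n 256
```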
Cloud GPUs
To maximize performance, consider using cloud-based GPUs with enough VRAM for the quant file you choose. Services like AWS, Google Cloud, or Azure provide access to high-performance GPUs.
License
The model and its quantized versions are distributed under the licensing terms specified on the Hugging Face model page. Users should review these terms to ensure compliance with any usage restrictions or requirements.