YuLan-Mini-GGUF
Introduction
YuLan-Mini-GGUF is a collection of quantized versions of the YuLan-Mini text generation model, provided by bartowski in GGUF format for use with llama.cpp. The model supports both English and Chinese and was trained on diverse datasets.
Architecture
YuLan-Mini-GGUF uses llama.cpp's imatrix (importance matrix) quantization. The original model is YuLan-Mini from the yulan-team, and several quantization types are available, ranging from higher-precision formats (F32, F16) to more compressed formats such as Q8_0 and Q6_K.
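For readers curious how imatrix quants are produced, the general llama.cpp workflow is sketched below. This is an illustrative outline rather than bartowski's exact recipe: the file names and the calibration text are placeholders you would supply yourself.
# 1. Measure which weights matter most by running a calibration text through the F16 model
llama-imatrix -m YuLan-Mini-F16.gguf -f calibration.txt -o imatrix.dat
# 2. Quantize the F16 model down to Q4_K_M, guided by the importance matrix
llama-quantize --imatrix imatrix.dat YuLan-Mini-F16.gguf YuLan-Mini-Q4_K_M.gguf Q4_K_M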
Training
The model was trained on a variety of datasets focused on educational, mathematical, and programming content. Notable datasets include HuggingFaceFW/fineweb-edu, bigcode/the-stack-v2, and AI-MO/NuminaMath-CoT, among others.
Guide: Running Locally
- Install Prerequisites: Ensure you have the latest version of the huggingface_hub CLI:
pip install -U "huggingface_hub[cli]"
- Download Model Files: Use the CLI to download the specific quantization type suited to your hardware.
huggingface-cli download bartowski/YuLan-Mini-GGUF --include "YuLan-Mini-Q4_K_M.gguf" --local-dir ./
- Select the Appropriate Quant File: Choose the quantization file based on your GPU/CPU capabilities. For instance, you might opt for Q5_K_M for higher quality or Q3_K_S when RAM is limited. A sketch of running the downloaded file with llama.cpp follows this list.
- Cloud GPU Recommendation: For the larger, higher-precision files, consider a cloud GPU with enough VRAM to hold the chosen quant.
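Once a quant file is downloaded, it can be run with llama.cpp's llama-cli tool. The example below is a minimal sketch: the prompt and token count are placeholders, and -ngl 99 offloads all layers to the GPU when one is available (omit it for CPU-only inference).
# Run the downloaded quant with a llama.cpp build
llama-cli -m ./YuLan-Mini-Q4_K_M.gguf -p "Hello, how are you?" -n 128 -ngl 99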
License
YuLan-Mini-GGUF is distributed under the MIT license, which permits broad use and modification.