DeepSeek-Coder-V2-Lite-Instruct-GGUF
Quantizations by bartowski

Introduction
DeepSeek-Coder-V2-Lite-Instruct-GGUF is a set of llama.cpp GGUF quantizations of the DeepSeek-Coder-V2-Lite-Instruct text generation model. Quantization reduces model size and speeds up inference, and the range of available configurations lets you trade quality against memory footprint to fit your hardware.
Architecture
The model is based on the DeepSeek-Coder-V2-Lite-Instruct architecture. Quantizations were performed with llama.cpp release b3166. Multiple quantization formats are offered, ranging from Q8_0 down to IQ2_XS, each striking a different balance of quality, size, and performance.
Training
Quantizations use llama.cpp's imatrix (importance matrix) option together with a calibration dataset. The importance matrix adjusts the precision assigned to individual weights during quantization, preserving quality at smaller file sizes across various hardware configurations.
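For reference, here is a minimal sketch of how imatrix quantizations are typically produced with llama.cpp; the `llama-imatrix` and `llama-quantize` binaries match the tool names shipped around release b3166, but the file names are placeholders rather than the exact commands used for this repository.

```bash
# Build an importance matrix from a calibration text file
# (file names below are placeholders; adjust to your setup).
./llama-imatrix -m DeepSeek-Coder-V2-Lite-Instruct-f16.gguf \
  -f calibration.txt -o imatrix.dat

# Quantize the f16 model to Q4_K_M, guided by the importance matrix.
./llama-quantize --imatrix imatrix.dat \
  DeepSeek-Coder-V2-Lite-Instruct-f16.gguf \
  DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf Q4_K_M
```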
Guide: Running Locally
- Installation: Ensure you have `huggingface-cli` installed:

  ```bash
  pip install -U "huggingface_hub[cli]"
  ```
- Download a Model File: Use the following command to download a specific quantized file:

  ```bash
  huggingface-cli download bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF --include "DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf" --local-dir ./
  ```

  Models larger than 50GB are split into multiple files; download all of the parts into a local directory:

  ```bash
  huggingface-cli download bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF --include "DeepSeek-Coder-V2-Lite-Instruct-Q8_0.gguf/*" --local-dir DeepSeek-Coder-V2-Lite-Instruct-Q8_0
  ```
- Choose the Right File: Pick a quantization based on your hardware's RAM and VRAM. For maximum speed, choose a file slightly smaller than your GPU's VRAM so the entire model fits on the GPU. For maximum quality, add system RAM and VRAM together and choose a file slightly smaller than that total. (A sketch for checking the published file sizes follows this list.)
- Quantization Types:
  - K-quants (e.g., Q4_K_M): Recommended for general use.
  - I-quants (e.g., IQ2_XS): Offer better quality for their size in sub-Q4 configurations, and perform best on builds using cuBLAS (NVIDIA) or rocBLAS (AMD).
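To help with the "Choose the Right File" step above, the sketch below lists every quantized file in this repository along with its size via the public Hugging Face Hub API. It assumes `curl` and `jq` are available, and that the API's `blobs=true` parameter (which adds per-file sizes to the response) behaves as documented for the Hub.

```bash
# List every .gguf file in the repo with its size in bytes, smallest first.
# blobs=true asks the Hub API to include per-file sizes in "siblings".
curl -s "https://huggingface.co/api/models/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF?blobs=true" \
  | jq -r '.siblings[] | select(.rfilename | endswith(".gguf")) | "\(.size)\t\(.rfilename)"' \
  | sort -n
```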
For cloud GPU options, consider platforms like AWS, Google Cloud, or Azure with appropriate GPU instances.
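Once a file is downloaded, it can be run with a llama.cpp build of the same vintage. A minimal sketch, assuming the standard `llama-cli` binary and a single-file quant; the flag values and prompt are illustrative:

```bash
# Run the downloaded quant with llama.cpp's CLI.
# -ngl 99 offloads as many layers as possible to the GPU,
# -c sets the context size, -p supplies a one-shot prompt.
./llama-cli -m ./DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf \
  -ngl 99 -c 4096 \
  -p "Write a function that checks whether a number is prime."
```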
License
The model is released under the deepseek-license. For more details, refer to the LICENSE file associated with the model.