Q2.5-Veltha-14B-0.5-GGUF
bartowski
Introduction
Q2.5-Veltha-14B-0.5-GGUF is a collection of quantized GGUF builds of the djuna/Q2.5-Veltha-14B-0.5 model, prepared by bartowski for text generation. Multiple quantization levels are provided so the model can run efficiently on a range of hardware configurations.
Architecture
The model retains the architecture of djuna/Q2.5-Veltha-14B-0.5 and is quantized with llama.cpp. Several quantization levels (e.g., Q8_0, Q6_K_L) are available, each trading off output quality against file size and resource requirements. The quantizations were produced with llama.cpp's imatrix (importance matrix) option, which helps preserve quality at smaller file sizes.
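For context, the sketch below shows how imatrix-based quantization is typically performed with llama.cpp. The file names, calibration data, and binary names here are illustrative assumptions, not the maintainer's exact recipe (older llama.cpp builds name these tools `imatrix` and `quantize`):

```bash
# Sketch only: illustrative file names, not the actual recipe used for this repo.
# 1. Compute an importance matrix from a calibration text file.
./llama-imatrix -m Q2.5-Veltha-14B-0.5-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the full-precision GGUF to a smaller type using that imatrix.
./llama-quantize --imatrix imatrix.dat \
    Q2.5-Veltha-14B-0.5-f16.gguf Q2.5-Veltha-14B-0.5-Q4_K_M.gguf Q4_K_M
```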
Training
The model has been evaluated on several benchmark datasets, with the following results:
- IFEval (0-shot): 77.96% strict accuracy
- BBH (3-shot): 50.32% normalized accuracy
- MATH Lvl 5 (4-shot): 33.84% exact match
- GPQA (0-shot): 15.77% normalized accuracy
- MuSR (0-shot): 14.17% normalized accuracy
- MMLU-PRO (5-shot): 47.72% accuracy
These evaluations are sourced from the Open LLM Leaderboard.
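As a rough illustration of how one of these scores could be reproduced with the lm-evaluation-harness used by the leaderboard, a command along the following lines could be run. The task name, flags, and dtype are assumptions and may not match the leaderboard's exact configuration:

```bash
# Hypothetical reproduction sketch; the leaderboard's exact harness settings may differ.
pip install lm_eval   # lm-evaluation-harness
lm_eval --model hf \
    --model_args pretrained=djuna/Q2.5-Veltha-14B-0.5,dtype=bfloat16 \
    --tasks leaderboard_ifeval \
    --batch_size auto
```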
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure you have `huggingface_hub` installed:
  `pip install -U "huggingface_hub[cli]"`
- Download the Model: Use the `huggingface-cli` to download the desired quantized model file:
  `huggingface-cli download bartowski/Q2.5-Veltha-14B-0.5-GGUF --include "Q2.5-Veltha-14B-0.5-Q4_K_M.gguf" --local-dir ./`
- Select a Quantization Level: Choose a model file appropriate for your hardware, considering available RAM and VRAM; a minimal sketch of running a downloaded file follows this list.
- Consider Cloud GPUs: If local resources are insufficient, consider using cloud GPU services such as AWS, Google Cloud, or Azure to run the model.
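Once a file is downloaded, it can be run with any GGUF-compatible runtime. The example below is a minimal sketch using llama.cpp's `llama-cli` binary; the binary name, flags, prompt, and the Q4_K_M file name are assumptions based on a standard llama.cpp build, not instructions from the model card:

```bash
# Minimal sketch (assumes a standard llama.cpp build and the Q4_K_M file downloaded above).
./llama-cli \
    -m ./Q2.5-Veltha-14B-0.5-Q4_K_M.gguf \
    -p "Explain GGUF quantization in one paragraph." \
    -n 256 \
    -ngl 99   # layers to offload to the GPU; omit for CPU-only runs
```

Other GGUF-compatible runtimes, such as llama-cpp-python or LM Studio, can load the same file.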
License
The model's license information is not explicitly provided in the documentation. Users are advised to check the Hugging Face model card for any licensing details before use.