Q2.5-Veltha-14B-0.5-GGUF

bartowski

Introduction

Q2.5-Veltha-14B-0.5-GGUF is a collection of quantized GGUF files published by bartowski for text generation. It is based on the djuna/Q2.5-Veltha-14B-0.5 model and offers several quantization levels so the model can be run on a range of hardware configurations.

Architecture

The model retains the architecture of djuna/Q2.5-Veltha-14B-0.5 and is quantized with llama.cpp. Different quantization levels (e.g., Q8_0, Q6_K_L, Q4_K_M) are available, each trading file size against output quality to suit specific performance and resource requirements. The quantizations are produced with the imatrix (importance matrix) option, which uses a calibration dataset to better preserve quality at smaller sizes.
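
As an illustration, an imatrix quantization is typically produced with llama.cpp along the following lines; the calibration file and model filenames below are placeholders, not the exact files used for this release:

    # Build an importance matrix from a calibration text file
    llama-imatrix -m Q2.5-Veltha-14B-0.5-F16.gguf -f calibration.txt -o imatrix.dat

    # Quantize the FP16 GGUF to Q4_K_M using the importance matrix
    llama-quantize --imatrix imatrix.dat Q2.5-Veltha-14B-0.5-F16.gguf Q2.5-Veltha-14B-0.5-Q4_K_M.gguf Q4_K_M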

Evaluation

The base model has been evaluated on multiple benchmark datasets, achieving the following results:

  • IFEval (0-shot): 77.96% strict accuracy
  • BBH (3-shot): 50.32% normalized accuracy
  • MATH Lvl 5 (4-shot): 33.84% exact match
  • GPQA (0-shot): 15.77% normalized accuracy
  • MuSR (0-shot): 14.17% normalized accuracy
  • MMLU-PRO (5-shot): 47.72% accuracy

These evaluations are sourced from the Open LLM Leaderboard.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have huggingface_hub installed.

    pip install -U "huggingface_hub[cli]"
    
  2. Download the Model: Use the huggingface-cli to download the desired quantized model file.

    huggingface-cli download bartowski/Q2.5-Veltha-14B-0.5-GGUF --include "Q2.5-Veltha-14B-0.5-Q4_K_M.gguf" --local-dir ./
    
  3. Select a Quantization Level: Choose a model file appropriate for your hardware, considering RAM and VRAM availability. A sketch of loading the downloaded file with llama.cpp follows this list.

  4. Consider Cloud GPUs: If local resources are insufficient, consider using cloud GPU services such as AWS, Google Cloud, or Azure to run the model.
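
Once a quantized file has been downloaded, it can be loaded with any GGUF-compatible runtime. A minimal sketch using llama.cpp's llama-cli follows; the binary name and the number of GPU-offloaded layers will vary with your build and hardware:

    # Run the downloaded GGUF file with llama.cpp, offloading layers to the GPU
    llama-cli -m ./Q2.5-Veltha-14B-0.5-Q4_K_M.gguf \
      -p "Explain GGUF quantization in one paragraph." \
      -n 256 -ngl 35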

License

The model's license information is not explicitly provided in the documentation. Users are advised to check the Hugging Face model card for any licensing details before use.
