Q2.5-Veltha-14B-0.5
Introduction
Q2.5-Veltha-14B-0.5 is a text generation model created by merging several pre-trained language models with the mergekit tool. The merge is designed to improve performance across a range of text generation tasks.
Architecture
Q2.5-Veltha-14B-0.5 is produced by merging multiple models with mergekit's della_linear method. The base model for the merge is arcee-ai/SuperNova-Medius. The merged models are:
- huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
- allura-org/TQ2.5-14B-Aletheia-v1
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- v000000/Qwen2.5-Lumen-14B
The configuration uses float32 for dtype and bfloat16 for out_dtype, along with additional merge parameters.
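The merge described above corresponds to a mergekit YAML configuration roughly like the following sketch. The method, base model, model list, and dtypes come from this card; the weight and density values are illustrative placeholders, not the published settings:

```yaml
# Hypothetical mergekit config sketch for a della_linear merge.
# weight/density values are placeholders, NOT the actual parameters.
merge_method: della_linear
base_model: arcee-ai/SuperNova-Medius
models:
  - model: huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    parameters: {weight: 0.25, density: 0.5}   # placeholder
  - model: allura-org/TQ2.5-14B-Aletheia-v1
    parameters: {weight: 0.25, density: 0.5}   # placeholder
  - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    parameters: {weight: 0.25, density: 0.5}   # placeholder
  - model: v000000/Qwen2.5-Lumen-14B
    parameters: {weight: 0.25, density: 0.5}   # placeholder
dtype: float32
out_dtype: bfloat16
```

The actual weights and densities are defined in the model repository's mergekit config; consult it before attempting to reproduce the merge.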
Evaluation
The model was evaluated on several benchmark datasets with varying few-shot settings. Its scores include:
- IFEval (0-shot): 77.96 strict accuracy
- BBH (3-shot): 50.32 normalized accuracy
- MATH Lvl 5 (4-shot): 33.84 exact match
- GPQA (0-shot): 15.77 normalized accuracy
- MuSR (0-shot): 14.17 normalized accuracy
- MMLU-PRO (5-shot): 47.72 accuracy
The evaluation results are detailed on the Open LLM Leaderboard.
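As a quick sanity check, the Open LLM Leaderboard's headline average is, to my understanding, the plain mean of the six scores above, which is easy to reproduce:

```python
# Mean of the six leaderboard scores listed above.
scores = {
    "IFEval": 77.96,
    "BBH": 50.32,
    "MATH Lvl 5": 33.84,
    "GPQA": 15.77,
    "MuSR": 14.17,
    "MMLU-PRO": 47.72,
}
average = sum(scores.values()) / len(scores)
print(f"Leaderboard average: {average:.2f}")  # -> 39.96
```
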
Guide: Running Locally
To run Q2.5-Veltha-14B-0.5 locally, follow these steps:
- Install dependencies: make sure Python, PyTorch, and the Hugging Face Transformers library are installed.

  pip install transformers torch
- Download the model: load the tokenizer and weights from the Hugging Face Hub.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("djuna/Q2.5-Veltha-14B-0.5")
  model = AutoModelForCausalLM.from_pretrained("djuna/Q2.5-Veltha-14B-0.5")
- Run inference:

  inputs = tokenizer("Your input text here", return_tensors="pt")
  outputs = model.generate(**inputs)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
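Since the merged components are Qwen2.5-based instruct models, the model very likely expects the ChatML prompt format; this is an assumption, so verify it against the tokenizer config in the repository. A minimal sketch of building such a prompt by hand (in practice, tokenizer.apply_chat_template reads the real template for you):

```python
# Sketch of the ChatML format used by Qwen2.5-family models (assumed here;
# prefer tokenizer.apply_chat_template, which applies the model's own template).
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "Your input text here")
```

With the tokenizer loaded as above, the equivalent call is tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True), which also handles any deviations from plain ChatML.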
- Utilize cloud GPUs: consider cloud GPU services such as AWS, GCP, or Azure to handle the model's computational demands efficiently.
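To gauge which GPU tier is needed, a rough back-of-envelope estimate of the weight memory for a 14B-parameter model (activations and KV cache add more on top of this):

```python
# Approximate memory footprint of the weights alone for ~14B parameters.
PARAMS = 14e9
fp32_gb = PARAMS * 4 / 1e9   # 4 bytes/param -> 56.0 GB
bf16_gb = PARAMS * 2 / 1e9   # 2 bytes/param -> 28.0 GB
print(f"float32: ~{fp32_gb:.0f} GB, bfloat16: ~{bf16_gb:.0f} GB")
```

In practice this means a single 24 GB consumer card is not enough even in bfloat16; typical options are an 80 GB accelerator, sharding across multiple GPUs, or a quantized variant of the model.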
License
The model is available under the terms specified by the individual model licenses of the merged components. Ensure compliance with these terms when using the model.