Q2.5-Veltha-14B-0.5

djuna

Introduction

Q2.5-Veltha-14B-0.5 is a text-generation model created by merging several pre-trained Qwen2.5-based language models with the mergekit tool, with the goal of improving performance across a range of text-generation tasks.

Architecture

The architecture of Q2.5-Veltha-14B-0.5 involves merging multiple models using the della_linear method. The base model for the merge is arcee-ai/SuperNova-Medius. The merged models include:

  • huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
  • allura-org/TQ2.5-14B-Aletheia-v1
  • EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
  • v000000/Qwen2.5-Lumen-14B

The merge configuration uses float32 as the working dtype and bfloat16 as the output dtype (out_dtype), along with additional per-model merge parameters.
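A mergekit della_linear merge is defined in a YAML recipe. The sketch below illustrates the general shape of such a file using the models listed above; the weight and density values are placeholders, not the actual parameters used for Q2.5-Veltha-14B-0.5.

```yaml
# Illustrative mergekit recipe sketch — weights and densities are placeholders,
# not the values used for Q2.5-Veltha-14B-0.5.
merge_method: della_linear
base_model: arcee-ai/SuperNova-Medius
models:
  - model: huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    parameters: {weight: 0.25, density: 0.5}
  - model: allura-org/TQ2.5-14B-Aletheia-v1
    parameters: {weight: 0.25, density: 0.5}
  - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    parameters: {weight: 0.25, density: 0.5}
  - model: v000000/Qwen2.5-Lumen-14B
    parameters: {weight: 0.25, density: 0.5}
dtype: float32
out_dtype: bfloat16
```

The actual recipe ships with the model repository; a file like this is run with `mergekit-yaml config.yaml ./output-dir`.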

Evaluation

The model is evaluated on various datasets with different few-shot settings. Its performance metrics include:

  • IFEval (0-shot): 77.96 strict accuracy
  • BBH (3-shot): 50.32 normalized accuracy
  • MATH Lvl 5 (4-shot): 33.84 exact match
  • GPQA (0-shot): 15.77 normalized accuracy
  • MuSR (0-shot): 14.17 normalized accuracy
  • MMLU-PRO (5-shot): 47.72 accuracy

The evaluation results are detailed on the Open LLM Leaderboard.
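The leaderboard's headline number is an aggregate of the individual benchmarks; assuming a simple unweighted mean over the six scores above, it can be reproduced as follows:

```python
# Unweighted mean of the six reported benchmark scores.
scores = {
    "IFEval (0-shot)": 77.96,
    "BBH (3-shot)": 50.32,
    "MATH Lvl 5 (4-shot)": 33.84,
    "GPQA (0-shot)": 15.77,
    "MuSR (0-shot)": 14.17,
    "MMLU-PRO (5-shot)": 47.72,
}
average = sum(scores.values()) / len(scores)
print(f"Average score: {average:.2f}")  # → 39.96
```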

Guide: Running Locally

To run Q2.5-Veltha-14B-0.5 locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed, then install the Hugging Face Transformers library and PyTorch.

    pip install transformers torch
    
  2. Download the Model: Use the Hugging Face model hub to download the model.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("djuna/Q2.5-Veltha-14B-0.5")
    # torch_dtype="auto" loads the checkpoint in its stored precision (bfloat16)
    # instead of upcasting to float32, roughly halving memory use.
    model = AutoModelForCausalLM.from_pretrained(
        "djuna/Q2.5-Veltha-14B-0.5",
        torch_dtype="auto",
    )
    
  3. Run Inference:

    inputs = tokenizer("Your input text here", return_tensors="pt")
    # Without max_new_tokens, generate() stops after a short default length;
    # set it explicitly to control output size.
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Utilize Cloud GPUs: Consider using cloud GPU services like AWS, GCP, or Azure to handle the computational demands efficiently.
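Since the merged components are Qwen2.5 instruct variants, the model most likely expects the ChatML prompt format. In real code, prefer `tokenizer.apply_chat_template`, which handles this automatically; the sketch below only illustrates the underlying structure (the system prompt text is an illustrative assumption, not the model's documented default).

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen2.5-family models.

    Illustration only — prefer tokenizer.apply_chat_template in practice.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```

The trailing `<|im_start|>assistant\n` cues the model to generate the assistant turn; generation then stops at the `<|im_end|>` token.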

License

The model is available under the terms specified by the individual model licenses of the merged components. Ensure compliance with these terms when using the model.
