Qwen2.5-14B-Instruct-GGUF
Introduction
Qwen2.5 is a series of advanced large language models, offering significant strides in capabilities over its predecessor, Qwen2. The models in this series range from 0.5 to 72 billion parameters and are designed to deliver substantial improvements in knowledge, coding, mathematics, instruction following, long text generation, and multilingual support. The Qwen2.5-14B model, specifically in GGUF format, is an instruction-tuned variant featuring advanced architecture and training techniques.
Architecture
The Qwen2.5-14B-Instruct-GGUF model is a causal language model with the following architectural features:
- Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
- Parameters: Total 14.7 billion; Non-Embedding 13.1 billion.
- Layers: 48.
- Attention Heads (GQA): 40 for Q and 8 for KV.
- Context Length: Full 32,768 tokens, with generation of up to 8,192 tokens.
- Quantization Options: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0.
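One practical consequence of the GQA layout above is a smaller KV cache: only the 8 KV heads are cached, not all 40 query heads. A rough sketch of the per-token cache cost, assuming a head dimension of 128 (a typical value for this model family, not stated in the card) and fp16 cache entries:

```python
# Rough KV-cache estimate for Qwen2.5-14B's GQA layout.
# head_dim = 128 is an assumption (common for this family), not from the card.
layers = 48
kv_heads = 8         # only KV heads are cached, not the 40 query heads
head_dim = 128
bytes_per_value = 2  # fp16 cache entries

# K and V each store kv_heads * head_dim values per layer, per token.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)                  # 196608 bytes, i.e. 192 KiB per token
print(kv_bytes_per_token * 32768 / 2**30)  # 6.0 GiB for the full 32,768-token context
```

Under these assumptions a full 32K context costs about 6 GiB of cache on top of the model weights; with 40 KV heads instead of 8, it would be five times that.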
Training
The model underwent both pretraining and post-training to strengthen its capabilities across domains. The Qwen2.5 series supports long-context processing of up to 128K tokens and generation of up to 8K tokens, and provides multilingual support covering more than 29 languages.
Guide: Running Locally
To run the Qwen2.5-14B-Instruct-GGUF model locally, follow these steps:
- Install the Hugging Face CLI:
  pip install -U huggingface_hub
- Download the model: use huggingface-cli to download the required GGUF files:
  huggingface-cli download Qwen/Qwen2.5-14B-Instruct-GGUF --include "qwen2.5-14b-instruct-q5_k_m*.gguf" --local-dir . --local-dir-use-symlinks False
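The download can also be scripted from Python with huggingface_hub's snapshot_download, which accepts the same glob via allow_patterns (an alternative to the CLI, not part of the official instructions). The sketch below first shows how the pattern selects both split shards and a merged file; the filenames other than the split-shard name are illustrative:

```python
from fnmatch import fnmatch

# The same glob passed to --include above; it matches both the merged
# file and the numbered split shards.
pattern = "qwen2.5-14b-instruct-q5_k_m*.gguf"
candidates = [
    "qwen2.5-14b-instruct-q5_k_m-00001-of-00003.gguf",  # split shard
    "qwen2.5-14b-instruct-q5_k_m.gguf",                 # merged file
    "qwen2.5-14b-instruct-q2_k.gguf",                   # other quantization: skipped
]
selected = [name for name in candidates if fnmatch(name, pattern)]
print(selected)  # the two q5_k_m files

def download(local_dir: str = ".") -> None:
    """Fetch the matching files; equivalent to the huggingface-cli command above."""
    from huggingface_hub import snapshot_download  # requires network access
    snapshot_download(
        repo_id="Qwen/Qwen2.5-14B-Instruct-GGUF",
        allow_patterns=[pattern],
        local_dir=local_dir,
    )
```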
- Merge files (if needed): for split downloads, merge the shards using llama.cpp's gguf-split tool:
  ./llama-gguf-split --merge qwen2.5-14b-instruct-q5_k_m-00001-of-00003.gguf qwen2.5-14b-instruct-q5_k_m.gguf
- Run the model: launch llama-cli in conversation mode for a chatbot-like experience:
  ./llama-cli -m <gguf-file-path> -co -cnv -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." -fa -ngl 80 -n 512
Cloud GPU instances from providers such as AWS, Google Cloud, or Azure are recommended for efficient inference at this model size.
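As an alternative to llama-cli, the merged GGUF file can also be loaded from Python through the llama-cpp-python bindings (an assumption; this is not part of the official instructions). The helper below assembles a ChatML-style prompt as used by Qwen2.5 instruct models, and the run_local function sketches how it might be fed to a local model; the model path is a placeholder:

```python
def build_chat_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen2.5 instruct models."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chat_prompt(
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "Briefly introduce yourself.",
)

def run_local(model_path: str = "qwen2.5-14b-instruct-q5_k_m.gguf") -> str:
    """Generate a reply with llama-cpp-python (requires `pip install llama-cpp-python`)."""
    from llama_cpp import Llama
    # n_gpu_layers mirrors the -ngl 80 flag used with llama-cli above.
    llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=80)
    out = llm(prompt, max_tokens=512, stop=["<|im_end|>"])
    return out["choices"][0]["text"]
```

Stopping on the <|im_end|> token keeps the model from continuing past its own turn, mirroring the conversation-mode behavior of llama-cli.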
License
The Qwen2.5-14B-Instruct-GGUF model is released under the Apache 2.0 license. For more details, refer to the license file.