Qwen2.5-7B-Instruct-GGUF
Introduction
Qwen2.5 is the latest series of the Qwen large language models, featuring both base and instruction-tuned models with parameters ranging from 0.5 to 72 billion. The Qwen2.5 series offers significant improvements in knowledge, coding, mathematics, instruction-following, long-text generation, understanding structured data, and multilingual support for over 29 languages.
Architecture
The Qwen2.5-7B-Instruct-GGUF model is a causal language model with the following architecture specifics:
- Training Stage: Pretraining & Post-training
- Transformers Features: RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Parameters: 7.61 billion (6.53 billion non-embedding)
- Layers: 28
- Attention Heads (GQA): 28 for Q and 4 for KV
- Context Length: Full 32,768 tokens; generation 8,192 tokens
- Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
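The quantization level largely determines the file size and the memory needed to load the model. As a rough sketch, on-disk size can be estimated as parameters × bits-per-weight / 8; the bits-per-weight figures below are approximate assumptions (real GGUF files mix quant types across tensors and add metadata), not official numbers:

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8.
# The bits-per-weight values are approximations for llama.cpp quant
# types, assumed here for illustration only.
APPROX_BITS_PER_WEIGHT = {
    "q2_K": 2.6,
    "q3_K_M": 3.9,
    "q4_0": 4.5,
    "q4_K_M": 4.8,
    "q5_0": 5.5,
    "q5_K_M": 5.7,
    "q6_K": 6.6,
    "q8_0": 8.5,
}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Estimate GGUF file size in gigabytes for a given quant type."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9

# 7.61 billion parameters, as listed above.
for q in ("q4_K_M", "q5_K_M", "q8_0"):
    print(f"{q}: ~{estimate_size_gb(7.61e9, q):.1f} GB")
```

This is why q5_K_M (used in the download example below) is a common middle ground: noticeably smaller than q8_0 with little quality loss.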
Training
This model is instruction-tuned and designed to improve upon its predecessors by enhancing instruction following, long-text generation, and structured-data handling. The Qwen2.5 series supports long contexts of up to 128K tokens, though this GGUF build ships with the 32,768-token context listed above, and it can generate up to 8K tokens.
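One practical consequence of the GQA layout above (28 query heads, 4 KV heads) is a much smaller KV cache at long contexts. A back-of-the-envelope sketch, assuming a head dimension of 128 and fp16 cache entries (both assumptions, not stated above):

```python
# KV-cache size for grouped-query attention (GQA):
# 2 tensors (K and V) * layers * kv_heads * head_dim * bytes per value.
def kv_cache_bytes_per_token(layers: int, kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * bytes_per_value

# Layer and KV-head counts from the architecture list above;
# head_dim=128 and fp16 (2 bytes) are assumed for illustration.
per_token = kv_cache_bytes_per_token(layers=28, kv_heads=4, head_dim=128)
full_context = per_token * 32_768  # full 32K context
print(f"{per_token} bytes/token, ~{full_context / 2**30:.2f} GiB at 32K tokens")
```

With 28 KV heads instead of 4 (i.e., no GQA), the cache would be seven times larger, which is the main reason GQA matters for long-context inference on limited VRAM.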
Guide: Running Locally
To run the model locally with llama.cpp, follow these steps:
1. Install the Hugging Face CLI:
   pip install -U huggingface_hub
2. Download the model. Use the following command to fetch the necessary GGUF files:
   huggingface-cli download Qwen/Qwen2.5-7B-Instruct-GGUF --include "qwen2.5-7b-instruct-q5_k_m*.gguf" --local-dir . --local-dir-use-symlinks False
3. Merge split files. If the download is split into parts, merge them with:
   ./llama-gguf-split --merge qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf qwen2.5-7b-instruct-q5_k_m.gguf
4. Run the model. To start a chatbot-like experience, execute:
   ./llama-cli -m <gguf-file-path> -co -cnv -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." -fa -ngl 80 -n 512
Cloud GPUs are recommended for running the larger models efficiently.
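Under the hood, llama-cli's -cnv mode formats the conversation with the chat template embedded in the GGUF file; Qwen2.5 follows the ChatML convention (<|im_start|>role ... <|im_end|>). A minimal sketch of that formatting, useful if you drive the model through a raw-completion API instead of -cnv:

```python
# Build a Qwen2.5-style ChatML prompt from a list of chat messages.
# Sketch only: llama-cli -cnv applies the template stored in the GGUF,
# so this is needed only when sending raw completion prompts yourself.
def build_chatml_prompt(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to reply
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system",
     "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The trailing "<|im_start|>assistant\n" is what prompts the model to produce the assistant turn; generation is then stopped at the next <|im_end|> token.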
License
The Qwen2.5-7B-Instruct-GGUF model is released under the Apache 2.0 License; full details are available at the license link in the model repository.