Qwen2.5-Coder-14B-Instruct-GGUF
Introduction
Qwen2.5-Coder is part of the code-specific Qwen family of large language models, designed for code generation, code reasoning, and code fixing. It improves upon previous versions with 5.5 trillion training tokens and supports real-world applications such as code agents. Notable features include long-context support of up to 128K tokens and availability in a range of model sizes.
Architecture
The Qwen2.5-Coder 14B model is a causal language model built with transformers, RoPE, SwiGLU, RMSNorm, and attention QKV bias. It has 14.7 billion parameters, of which 13.1 billion are non-embedding. The model consists of 48 layers and uses grouped-query attention with 40 query heads and 8 key/value heads. It supports a full context length of 32,768 tokens and is offered in multiple quantizations, from q2_K up to q8_0.
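The grouped-query attention figures above determine KV-cache memory use at long context. A rough per-token estimate can be sketched as follows; the head dimension of 128 is an assumption (the published 5120 hidden size divided by 40 query heads), not stated in this card:

```python
# Rough per-token KV-cache size for Qwen2.5-Coder-14B with an fp16 cache.
# head_dim = 128 is an assumption (hidden size 5120 / 40 query heads).
layers = 48
kv_heads = 8          # grouped-query attention: only 8 KV heads are cached
head_dim = 128        # assumed
bytes_per_elem = 2    # fp16

# K and V each store (kv_heads * head_dim) values per layer per token.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
print(kv_bytes_per_token)                   # bytes per token
print(kv_bytes_per_token * 32768 / 2**30)   # GiB for the full 32,768-token context
```

Under these assumptions the cache costs about 192 KiB per token, or roughly 6 GiB at the full 32K context, which is why the 8-head KV design matters for local inference.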
Training
Qwen2.5-Coder was trained using a massive dataset that includes source code, text-code grounding, and synthetic data, among others. The training process involves both pretraining and post-training stages to enhance its code-related capabilities and general competencies.
Guide: Running Locally
- Installation
  Install the required package:
  pip install -U huggingface_hub
- Download Files
  Download the necessary GGUF files:
  huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct-GGUF --include "qwen2.5-coder-14b-instruct-q5_k_m*.gguf" --local-dir . --local-dir-use-symlinks False
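The `--include` flag filters repository files with a shell-style glob, so only the q5_k_m quantization is downloaded. A quick sanity check of what the pattern matches (the split-shard filename is the one merged in the next step; the q8_0 filename is assumed for illustration):

```python
from fnmatch import fnmatch

# The glob passed to --include above.
pattern = "qwen2.5-coder-14b-instruct-q5_k_m*.gguf"

# Split shards of the q5_k_m quantization match...
print(fnmatch("qwen2.5-coder-14b-instruct-q5_k_m-00001-of-00002.gguf", pattern))  # True
print(fnmatch("qwen2.5-coder-14b-instruct-q5_k_m-00002-of-00002.gguf", pattern))  # True
# ...while other quantizations do not.
print(fnmatch("qwen2.5-coder-14b-instruct-q8_0.gguf", pattern))                   # False
```

Swapping the quantization name in the glob is all that is needed to fetch a different variant.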
- Merge Files
  If the download is split into shards, merge them with:
  ./llama-gguf-split --merge qwen2.5-coder-14b-instruct-q5_k_m-00001-of-00002.gguf qwen2.5-coder-14b-instruct-q5_k_m.gguf
- Run Model
  Start the model in conversation mode:
  ./llama-cli -m <gguf-file-path> \
    -co -cnv -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
    -fa -ngl 80 -n 512
For better performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
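With `-cnv`, llama.cpp applies the chat template embedded in the GGUF; Qwen2.5 models follow the ChatML convention. A minimal sketch of the prompt string that conversation mode constructs from the system prompt above (the exact formatting is assumed from the ChatML convention, not taken from this card):

```python
# Sketch of the ChatML prompt format used by Qwen2.5 models (assumed layout;
# llama.cpp builds this automatically when run with -cnv).
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "Write a hello-world program in C.",
)
print(prompt)
```

The model then generates until it emits the `<|im_end|>` stop token, which is why supplying the system prompt via `-p` is enough to steer the whole conversation.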
License
The Qwen2.5-Coder model is released under the Apache 2.0 license; see the LICENSE file in the model repository for details.