Qwen2.5-Coder-3B-Instruct-GGUF
Introduction
Qwen2.5-Coder is the code-specialized series within the Qwen family of large language models, designed for code generation and related tasks. It is available in model sizes from 0.5 to 32 billion parameters, catering to diverse developer needs. Compared with its predecessor, it brings significant improvements in code generation, code reasoning, and code fixing, backed by training on 5.5 trillion tokens of source code, text-code grounding data, and synthetic data. The Qwen2.5-Coder-32B model is considered state-of-the-art among open-source code language models.
Architecture
The Qwen2.5-Coder-3B-Instruct-GGUF model is built using transformer architecture with the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Components: RoPE, SwiGLU, RMSNorm, Attention QKV bias, tied word embeddings
- Parameters: 3.09 billion total, 2.77 billion non-embedding
- Layers: 36
- Attention Heads (GQA): 16 for Q, 2 for KV
- Context Length: 32,768 tokens (support for longer sequences up to 131,072 tokens exists in non-GGUF models)
- Quantization: Various levels including q2_K, q3_K_M, and q8_0
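As a rough illustration of what the GQA configuration above implies for memory use, the sketch below estimates the fp16 KV-cache size at the full 32,768-token context. The head dimension of 128 is an assumption (hidden size divided by Q heads, typical for this model family) and is not stated in this card:

```python
# Estimate the fp16 KV-cache size implied by the architecture above.
# head_dim = 128 is an assumption, not a value from the model card.
layers = 36
kv_heads = 2          # GQA: 2 heads each for K and V
head_dim = 128        # assumed
context = 32_768      # tokens
bytes_per_value = 2   # fp16

# K and V each store layers * kv_heads * head_dim values per token.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at full context: {kv_cache_bytes / 2**30:.2f} GiB")
```

The small KV-head count (2 vs. 16 Q heads) is what keeps this figure near 1 GiB; with full multi-head attention it would be eight times larger.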
Training
Qwen2.5-Coder is trained on a diverse dataset comprising 5.5 trillion tokens, which includes source code, text-code grounding, and synthetic data. This extensive training underpins its benchmark improvements in code-related tasks and general competencies.
Guide: Running Locally
To run Qwen2.5-Coder locally, follow these steps:
- Clone the repository:
  Clone the llama.cpp repository as per the official instructions.

  ```shell
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  ```
- Install the Hugging Face CLI:

  ```shell
  pip install -U huggingface_hub
  ```
- Download the model:
  Use the Hugging Face CLI to download the GGUF file.

  ```shell
  huggingface-cli download Qwen/Qwen2.5-Coder-3B-Instruct-GGUF \
    qwen2.5-coder-3b-instruct-q5_k_m.gguf \
    --local-dir . --local-dir-use-symlinks False
  ```
- Run the model:
  Execute the model in conversation mode for a chatbot-like experience.

  ```shell
  ./llama-cli -m <gguf-file-path> -co -cnv \
    -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
    -fa -ngl 80 -n 512
  ```
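If you drive llama-cli with a raw prompt instead of conversation mode (`-cnv`), the Qwen2.5 instruct models expect ChatML-formatted input. A minimal sketch of assembling such a prompt (the `assemble_prompt` helper is illustrative, not part of any Qwen tooling):

```python
# Build a ChatML-formatted prompt as expected by Qwen2.5 instruct models.
# assemble_prompt is a hypothetical helper, shown for illustration only.
def assemble_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = assemble_prompt(
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "Write a Python function that reverses a string.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the turn open so the model generates the assistant's reply.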
Suggested Cloud GPUs
For optimal performance, consider using cloud GPUs such as NVIDIA A100 or V100, especially when handling larger models or longer context lengths.
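To gauge which GPU you need, a rough file-size estimate per quantization level can be computed from the parameter count. The bits-per-weight figures below are approximate community rules of thumb for GGUF quant types, not values from this card:

```python
# Rough GGUF file-size estimates for a 3.09B-parameter model.
# Bits-per-weight values are approximate rules of thumb, not from the card.
params = 3.09e9  # total parameters, from the architecture section

approx_bits_per_weight = {
    "q2_K": 2.6,
    "q3_K_M": 3.9,
    "q5_K_M": 5.7,
    "q8_0": 8.5,
}

for quant, bpw in approx_bits_per_weight.items():
    size_gb = params * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```

Even at q8_0 the weights fit comfortably in the memory of the suggested GPUs; the KV cache and activations add overhead on top of this at long context lengths.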
License
The Qwen2.5-Coder-3B model is released under the qwen-research license. For detailed licensing information, please refer to the license document.