Qwen2.5-Coder-32B-Instruct-GGUF
Introduction
Qwen2.5-Coder is a series of code-specific Qwen large language models, formerly known as CodeQwen. It comes in six model sizes, ranging from 0.5 to 32 billion parameters, to suit various developer needs. Qwen2.5-Coder-32B offers significant advancements in code generation, reasoning, and fixing. It was trained on 5.5 trillion tokens and achieves coding capabilities on par with GPT-4o. The model supports long contexts of up to 128K tokens and is suited to real-world applications such as code agents, with strong coding and mathematical competencies.
Architecture
The Qwen2.5-Coder-32B model is built using transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias. It contains 32.5 billion parameters, 31 billion of which are non-embedding. The architecture comprises 64 layers and 40 attention heads for Q and 8 for KV. It supports a context length of up to 32,768 tokens. Quantization options include q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, and q8_0.
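The grouped-query attention layout above (40 query heads sharing 8 KV heads) directly bounds the KV-cache footprint. A back-of-envelope sketch, assuming an fp16 cache and a head dimension of 128 (inferred, not stated in the model card):

```shell
# Rough KV-cache estimate; head_dim=128 and fp16 (2 bytes) are assumptions.
layers=64; kv_heads=8; head_dim=128; bytes=2; ctx=32768
per_token=$(( layers * 2 * kv_heads * head_dim * bytes ))  # x2 for K and V
total_mib=$(( per_token * ctx / 1024 / 1024 ))
echo "KV cache: ${per_token} bytes/token, ${total_mib} MiB at ${ctx}-token context"
```

Under these assumptions the cache costs 256 KiB per token, or about 8 GiB at the full 32,768-token context, on top of the quantized weights themselves.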
Training
The model undergoes both pretraining and post-training stages. It is instruction-tuned and optimized for causal language modeling tasks. The training process involves extensive datasets, including source code, text-code grounding, and synthetic data.
Guide: Running Locally
To run Qwen2.5-Coder-32B locally, follow these steps:

1. Install dependencies:
   pip install -U huggingface_hub

2. Download the model: use the huggingface-cli tool to download the necessary GGUF files:
   huggingface-cli download Qwen/Qwen2.5-Coder-32B-Instruct-GGUF --include "qwen2.5-coder-32b-instruct-q5_k_m*.gguf" --local-dir . --local-dir-use-symlinks False

3. Merge split files (if necessary): if the downloaded files are split, merge them using:
   ./llama-gguf-split --merge qwen2.5-coder-32b-instruct-q5_k_m-00001-of-00003.gguf qwen2.5-coder-32b-instruct-q5_k_m.gguf

4. Run the model: start a conversation with the following command:
   ./llama-cli -m <gguf-file-path> -co -cnv -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." -fa -ngl 80 -n 512
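As an alternative to the interactive CLI session, llama.cpp also ships a llama-server binary that serves the same GGUF file over an OpenAI-compatible HTTP API. A sketch, assuming llama-server was built alongside llama-cli and reusing the flags from the command above:

```shell
# Hypothetical sketch: serve the merged GGUF over HTTP.
# -ngl 80 offloads layers to the GPU; adjust to your available VRAM.
./llama-server -m qwen2.5-coder-32b-instruct-q5_k_m.gguf -fa -ngl 80 --port 8080 &

# Once the server is up, query the OpenAI-compatible chat endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a quicksort in Python."}]}'
```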
Given the model's high computational requirements, cloud GPUs from providers such as AWS, Google Cloud, or Azure are worth considering.
License
Qwen2.5-Coder-32B is licensed under the Apache-2.0 License. For more details, refer to the LICENSE file.