Qwen2.5-Coder-3B-Instruct-GGUF
Introduction
Qwen2.5-Coder is the code-specialized series within the Qwen family of large language models, designed for code generation and related tasks. It is available in model sizes from 0.5 to 32 billion parameters, catering to diverse developer needs. Compared with its predecessor, it brings significant improvements in code generation, code reasoning, and code fixing, backed by training on 5.5 trillion tokens of source code, text-code grounding data, and synthetic data. The Qwen2.5-Coder-32B model is considered state-of-the-art among open-source code language models.
Architecture
The Qwen2.5-Coder-3B-Instruct-GGUF model is built using transformer architecture with the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Components: RoPE, SwiGLU, RMSNorm, Attention QKV bias, tied word embeddings
- Parameters: 3.09 billion total, 2.77 billion non-embedding
- Layers: 36
- Attention Heads (GQA): 16 for Q, 2 for KV
- Context Length: 32,768 tokens (support for longer sequences up to 131,072 tokens exists in non-GGUF models)
- Quantization: Various levels including q2_K, q3_K_M, and q8_0
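As a rough illustration of what the GQA configuration above implies for memory use, the sketch below estimates the fp16 KV-cache size at the full 32,768-token context. The head dimension of 128 is an assumption (hidden size divided by Q heads, typical for this model family) and is not stated in this card:

```python
# Estimate the fp16 KV-cache size implied by the architecture above.
# head_dim = 128 is an assumption, not a value from the model card.
layers = 36
kv_heads = 2          # GQA: 2 heads each for K and V
head_dim = 128        # assumed
context = 32_768      # tokens
bytes_per_value = 2   # fp16

# K and V each store layers * kv_heads * head_dim values per token.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at full context: {kv_cache_bytes / 2**30:.2f} GiB")
```

The small KV-head count (2 vs. 16 Q heads) is what keeps this figure near 1 GiB; with full multi-head attention it would be eight times larger.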
Training
Qwen2.5-Coder is trained on a diverse dataset comprising 5.5 trillion tokens, which includes source code, text-code grounding, and synthetic data. This extensive training underpins its benchmark improvements in code-related tasks and general competencies.
Guide: Running Locally
To run Qwen2.5-Coder locally, follow these steps:
- Clone the repository:
  Clone the llama.cpp repository as per the official instructions.

  ```shell
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  ```
- Install the Hugging Face CLI:

  ```shell
  pip install -U huggingface_hub
  ```
- Download the model:
  Use the Hugging Face CLI to download the GGUF file.

  ```shell
  huggingface-cli download Qwen/Qwen2.5-Coder-3B-Instruct-GGUF \
    qwen2.5-coder-3b-instruct-q5_k_m.gguf \
    --local-dir . --local-dir-use-symlinks False
  ```
- Run the model:
  Execute the model in conversation mode for a chatbot-like experience.

  ```shell
  ./llama-cli -m <gguf-file-path> -co -cnv \
    -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
    -fa -ngl 80 -n 512
  ```
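If you drive llama-cli with a raw prompt instead of conversation mode (`-cnv`), the Qwen2.5 instruct models expect ChatML-formatted input. A minimal sketch of assembling such a prompt (the `assemble_prompt` helper is illustrative, not part of any Qwen tooling):

```python
# Build a ChatML-formatted prompt as expected by Qwen2.5 instruct models.
# assemble_prompt is a hypothetical helper, shown for illustration only.
def assemble_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = assemble_prompt(
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "Write a Python function that reverses a string.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the turn open so the model generates the assistant's reply.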
Suggested Cloud GPUs
For optimal performance, consider using cloud GPUs such as NVIDIA A100 or V100, especially when handling larger models or longer context lengths.
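To gauge which GPU you need, a rough file-size estimate per quantization level can be computed from the parameter count. The bits-per-weight figures below are approximate community rules of thumb for GGUF quant types, not values from this card:

```python
# Rough GGUF file-size estimates for a 3.09B-parameter model.
# Bits-per-weight values are approximate rules of thumb, not from the card.
params = 3.09e9  # total parameters, from the architecture section

approx_bits_per_weight = {
    "q2_K": 2.6,
    "q3_K_M": 3.9,
    "q5_K_M": 5.7,
    "q8_0": 8.5,
}

for quant, bpw in approx_bits_per_weight.items():
    size_gb = params * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```

Even at q8_0 the weights fit comfortably in the memory of the suggested GPUs; the KV cache and activations add overhead on top of this at long context lengths.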
License
The Qwen2.5-Coder-3B model is released under the qwen-research license. For detailed licensing information, please refer to the license document.