Qwen2.5-Coder-32B-Instruct
Introduction
Qwen2.5-Coder is the latest series of code-specific Qwen large language models, designed to improve code generation, code reasoning, and code fixing. The series covers six mainstream model sizes: 0.5, 1.5, 3, 7, 14, and 32 billion parameters. Compared with its predecessor, CodeQwen1.5, it performs better on code-related tasks and supports long contexts of up to 128K tokens. The 32B instruction-tuned model described here is intended as a comprehensive foundation for real-world applications such as code agents.
Architecture
- Type: Causal language model
- Training Stage: Pretraining & Post-training
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B (31.0B excluding embeddings)
- Number of Layers: 64
- Attention Heads: 40 for Q and 8 for KV
- Context Length: Up to 131,072 tokens
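The head counts above mean the model uses grouped-query attention: each of the 8 key/value heads is shared by 5 of the 40 query heads, which shrinks the key/value cache fivefold. The sketch below works through that arithmetic. It assumes a hidden size of 5,120 (giving a head dimension of 5,120 / 40 = 128), which is not stated in the list above, and 2-byte (bf16/fp16) cache entries; treat the results as estimates, not measured figures.

```python
# Rough KV-cache estimate for Qwen2.5-Coder-32B-Instruct (batch size 1).
# Layer and head counts come from the architecture list; HIDDEN_SIZE is an assumption.
NUM_LAYERS = 64
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
HIDDEN_SIZE = 5120      # assumed, not stated in the architecture list
BYTES_PER_VALUE = 2     # assumed bf16/fp16 cache entries

HEAD_DIM = HIDDEN_SIZE // NUM_Q_HEADS  # 128 under the assumption above


def queries_per_kv_head() -> int:
    """Number of query heads that share each key/value head under GQA."""
    return NUM_Q_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


if __name__ == "__main__":
    ctx = 131_072
    gib = 1024 ** 3
    print(f"query heads per KV head: {queries_per_kv_head()}")
    print(f"KV cache per token: {kv_cache_bytes(1) // 1024} KiB")
    print(f"KV cache at {ctx:,} tokens (8 KV heads): {kv_cache_bytes(ctx) / gib:.0f} GiB")
    print(f"same context with 40 KV heads (no sharing): {kv_cache_bytes(ctx, NUM_Q_HEADS) / gib:.0f} GiB")
```

Under these assumptions, a single sequence at the full 131,072-token context needs about 32 GiB of cache on top of the weights, versus about 160 GiB if every query head had its own key/value head.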
Training
Qwen2.5-Coder was trained on 5.5 trillion tokens, including source code, text-code grounding data, and synthetic data. The Qwen team reports that the 32B model's coding abilities match those of GPT-4o.
Guide: Running Locally
To run Qwen2.5-Coder locally, you need a recent version of the Hugging Face transformers library. The basic steps are:
- Install Transformers: use transformers 4.37.0 or later (for example, pip install --upgrade "transformers>=4.37.0"). Older versions do not recognize the Qwen2 architecture and fail with KeyError: 'qwen2'.
- Load the Model and Tokenizer:
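```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# torch_dtype="auto" keeps the precision stored in the checkpoint;
# device_map="auto" spreads layers across available devices (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```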
- Generate Text:
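```python
prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template and open an assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)

# generate() returns the prompt followed by the completion; keep only the new tokens.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```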
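In the generation step, apply_chat_template with add_generation_prompt=True turns the message list into a single prompt string that ends with an open assistant turn for the model to complete. Qwen2.5 models use a ChatML-style layout. The function below is a hypothetical, simplified approximation of that layout for plain system/user/assistant messages; the real template shipped with the tokenizer also handles tool calls and inserts a default system prompt when none is given, so call the tokenizer's own method in practice.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical sketch of a ChatML-style prompt, similar in shape to Qwen2.5's template."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "write a quick sort algorithm."},
]
print(render_chatml(messages))
```

Comparing this output with tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) is a quick way to check how closely the sketch matches the shipped template.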
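The 32B model needs more GPU memory than a single consumer GPU typically provides. With device_map="auto", transformers can split it across several GPUs or offload part of it to CPU memory at a large speed cost; if local hardware falls short, consider cloud GPUs.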
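A back-of-the-envelope estimate of weight memory helps judge whether local hardware is enough. The sketch multiplies the 32.5B parameter count from the architecture list by the bytes per parameter for a few common precisions. It counts weights only (no KV cache, activations, or framework overhead), and the 8-bit and 4-bit rows assume a quantized variant of the model, which the loading steps above do not use.

```python
# Back-of-the-envelope weight memory for a 32.5B-parameter model.
# Counts weights only: KV cache, activations, and runtime overhead come on top.
NUM_PARAMS = 32.5e9

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16/fp16": 2.0,
    "int8 (assumed quantized variant)": 1.0,
    "int4 (assumed quantized variant)": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight memory in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>34}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.1f} GB")
```

Assuming the checkpoint is stored in bf16, which torch_dtype="auto" would preserve, the weights alone come to roughly 65 GB.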
License
Qwen2.5-Coder is released under the Apache-2.0 license.