Qwen2.5-Coder-7B-Instruct
Introduction
Qwen2.5-Coder is the code-specific series of the Qwen large language models, formerly known as CodeQwen. The series spans model sizes from 0.5 to 32 billion parameters to suit different developer needs. Compared with its predecessors, Qwen2.5-Coder significantly improves code generation, code reasoning, and code fixing, building on a training corpus of 5.5 trillion tokens. The 32B version achieves state-of-the-art results among open code models, with coding capabilities comparable to GPT-4o. The models provide a foundation for real-world applications, retain strong performance in mathematics and general tasks, and support context lengths of up to 128K tokens.
Architecture
Qwen2.5-Coder-7B-Instruct is a causal language model with 7.61 billion parameters, 6.53 billion of which are non-embedding. It uses a 28-layer transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias, and supports a full context length of 131,072 tokens, making it well suited to long inputs.
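These hyperparameters are stored in the checkpoint's configuration, so they can be verified without downloading the weights. Below is a minimal sketch (not from the original card) using Transformers' AutoConfig; the attribute names follow the Qwen2 configuration class:

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
print(config.num_hidden_layers)    # 28 transformer layers
print(config.num_attention_heads)  # attention heads per layer
print(config.hidden_act)           # activation used inside the SwiGLU MLP
print(config.rms_norm_eps)         # RMSNorm epsilon
```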
Training
The Qwen2.5-Coder models undergo both pretraining and post-training. The pretraining corpus combines source code, text-code grounding data, and synthetic data to strengthen the models' capabilities on code-related tasks.
Guide: Running Locally
To run Qwen2.5-Coder-7B-Instruct locally, follow these steps:
- Install Requirements: Ensure you have the latest version of Hugging Face Transformers.

```bash
pip install transformers
```
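Support for the Qwen2 architecture was added in Transformers 4.37.0, so an older installation will fail to load the model. A quick version check, added here for convenience:

```python
import transformers

# Qwen2-based checkpoints require a reasonably recent Transformers release.
print(transformers.__version__)
```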
- Load Model and Tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
- Generate Text:

```python
prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
- For Long Texts: To handle texts longer than 32,768 tokens, enable YaRN by adding the following to config.json:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
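As an alternative to editing config.json on disk, the same override can be applied programmatically. This is a sketch rather than part of the original guide; it assumes the rope_scaling values shown above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Apply the YaRN rope-scaling override in code instead of editing config.json.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
config.rope_scaling = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```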
- Cloud GPUs: For optimal performance, especially with the larger models, consider cloud accelerators such as AWS EC2 GPU instances, Google Cloud, or Azure ML.
License
The Qwen2.5-Coder-7B-Instruct model is licensed under the Apache 2.0 License. For more details, see the license file.