Qwen2.5-7B
Introduction
Qwen2.5 is a series of large language models with improved knowledge, coding, mathematics, and multilingual capabilities. The models offer stronger instruction following and handle long texts and structured data, can generate structured outputs such as JSON, and are more resilient to diverse system prompts, which makes them well suited to role-play and chatbot applications. The series supports over 29 languages and handles contexts of up to 128K tokens with generation of up to 8K tokens.
Architecture
The base 7B Qwen2.5 model is a causal language model built on a transformer architecture that incorporates RoPE, SwiGLU, RMSNorm, and attention QKV bias. It has 7.61 billion parameters (6.53 billion non-embedding) spread across 28 layers, with 28 attention heads for queries and 4 for key-value pairs (grouped-query attention). The model supports a context length of 131,072 tokens.
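As a rough illustration, the sketch below reads these hyperparameters from the published model configuration; it assumes the Hugging Face model id Qwen/Qwen2.5-7B and uses the `transformers` AutoConfig API.

```python
# Minimal sketch: inspect architecture hyperparameters from the published config.
# Assumes the Hub model id "Qwen/Qwen2.5-7B".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")

print(config.num_hidden_layers)        # transformer layers (28, per the text above)
print(config.num_attention_heads)      # query heads (28)
print(config.num_key_value_heads)      # key-value heads (4)
print(config.max_position_embeddings)  # supported context length (131,072)
```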
Training
The released Qwen2.5 base models undergo pretraining only; post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining are recommended to build out conversational capabilities.
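To make the SFT recommendation concrete, here is a minimal sketch of supervised fine-tuning with the Hugging Face Trainer API. It assumes the model id Qwen/Qwen2.5-7B and a tiny illustrative dataset; the hyperparameters are placeholders, not the official post-training recipe.

```python
# Minimal SFT sketch, not the official post-training pipeline.
# Assumes the Hub model id "Qwen/Qwen2.5-7B" and a GPU with enough memory.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Toy instruction-response pairs; replace with a real SFT corpus.
examples = [
    {"text": "Instruction: Say hello.\nResponse: Hello! How can I help you today?"},
]

def tokenize(example):
    tokens = tokenizer(example["text"], truncation=True, max_length=512)
    tokens["labels"] = tokens["input_ids"].copy()  # standard causal-LM labels
    return tokens

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2.5-7b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,  # assumes an Ampere-or-newer GPU
        logging_steps=10,
    ),
    train_dataset=dataset,
)
trainer.train()
```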
Guide: Running Locally
To run the Qwen2.5 model locally, follow these steps:
- Install Dependencies: Make sure you have a recent version of the Hugging Face `transformers` library (4.37.0 or later); older versions may raise errors. Upgrading with `pip install -U transformers` is sufficient.
- Download Model: Fetch the Qwen2.5-7B model from the Hugging Face repository.
- Set Up Environment: Use a Python environment with the libraries required for model execution.
- Run Inference: Load the model in your script and run text generation (a minimal sketch follows this list).
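The following is a minimal text-generation sketch under these assumptions: the Hub model id Qwen/Qwen2.5-7B, `transformers` 4.37.0 or later, and `accelerate` installed for automatic device placement.

```python
# Minimal inference sketch for the base (non-instruct) checkpoint.
# Assumes the Hub model id "Qwen/Qwen2.5-7B", transformers >= 4.37.0,
# and accelerate installed so device_map="auto" can place the weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # spread weights across available devices
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is the base checkpoint rather than an instruction-tuned variant, plain completion prompts like the one above are the expected input; conversational use normally goes through post-training, as noted in the Training section.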
Cloud GPU recommendation: For optimal performance, especially when exploiting the model's large context length, consider cloud GPU services such as AWS, Google Cloud, or Azure.
License
The Qwen2.5 model is released under the Apache 2.0 License. More details can be found in the license file.