Qwen2.5-1.5B
Introduction
Qwen2.5 is a series of advanced large language models, available as base and instruction-tuned variants with parameter counts from 0.5 to 72 billion. Compared with its predecessor, this version brings broader knowledge and stronger coding and mathematical capabilities. It also improves instruction following, long-text generation, and the understanding and generation of structured data, and supports more than 29 languages. This repository hosts the base 1.5B Qwen2.5 model.
Architecture
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- Parameters:
  - Total: 1.54B
  - Non-Embedding: 1.31B
- Layers: 28
- Attention Heads (GQA): 12 for Q and 2 for KV
- Context Length: Up to 32,768 tokens
The model is not recommended for conversational purposes without further training such as SFT or RLHF.
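These hyperparameters can be verified directly from the model's configuration. A minimal sketch, assuming network access to the Hugging Face Hub and the model ID Qwen/Qwen2.5-1.5B:

```python
from transformers import AutoConfig

# Fetch the configuration for the base 1.5B checkpoint from the Hub.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-1.5B")

print(config.num_hidden_layers)    # hidden layers (28 per the table above)
print(config.num_attention_heads)  # query heads (12)
print(config.num_key_value_heads)  # key/value heads (2, i.e. GQA)
print(config.tie_word_embeddings)  # True: input and output embeddings are shared
```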
Requirements
The model requires Hugging Face Transformers version 4.37.0 or later. Earlier versions do not recognize the qwen2 architecture and fail with `KeyError: 'qwen2'`.
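One way to fail fast on an incompatible environment is a version guard before loading the model; a small sketch:

```python
import transformers
from packaging import version  # packaging ships as a transformers dependency

# Qwen2 support was added in transformers 4.37.0; older versions cannot
# resolve the "qwen2" model type and raise KeyError: 'qwen2'.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers >= 4.37.0 is required, found {transformers.__version__}; "
        "run: pip install -U 'transformers>=4.37.0'"
    )
```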
Guide: Running Locally
- Setup Environment: Install Hugging Face Transformers 4.37.0 or later (see Requirements above).
- Data Preparation: Tokenize prompts with the model's own tokenizer so inputs match the pretraining vocabulary.
- Model Download: Fetch the weights and tokenizer from the Hugging Face Model Hub; from_pretrained downloads and caches them automatically.
- Run Inference: Use the model for plain text completion, as sketched after this list.
- Optimization: For better performance, run on a GPU; cloud services such as AWS, Google Cloud, or Azure offer suitable instances.
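Putting these steps together, here is a minimal text-completion sketch. The prompt is an arbitrary example; because this is a base model, it continues the given text rather than answering chat-style:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # fall back to float16/float32 if bf16 is unsupported
    device_map="auto",           # place the model on GPU if one is available
)

prompt = "The key advantages of grouped-query attention are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```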
License
This model is licensed under Apache 2.0. The full license text is available in the LICENSE file of the model repository.