Qwen2.5-0.5B
Introduction
Qwen2.5-0.5B is part of the Qwen2.5 series of large language models, which brings improvements in areas such as coding, mathematics, instruction following, and multilingual capabilities. It supports long contexts of up to 32K tokens and can generate up to 8K tokens, with multilingual support for over 29 languages. This release is the base model with 0.5 billion parameters.
Architecture
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- Number of Parameters: 0.49B
- Number of Parameters (Non-Embedding): 0.36B
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: Full 32,768 tokens
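These figures can be read directly from the published model configuration. A minimal sketch, assuming the Hugging Face Hub ID Qwen/Qwen2.5-0.5B and a recent Transformers install with Hub access:

```python
# Inspect the model configuration to confirm the architecture details listed above.
# Assumes transformers >= 4.37.0 and network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")

print(config.num_hidden_layers)        # expected: 24 layers
print(config.num_attention_heads)      # expected: 14 query heads
print(config.num_key_value_heads)      # expected: 2 key/value heads (GQA)
print(config.max_position_embeddings)  # expected: 32768-token context
print(config.tie_word_embeddings)      # expected: True (tied word embeddings)
```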
Training
The model is a pretrained base checkpoint and is not recommended for conversational use without further post-training, such as SFT or RLHF. Support for Qwen2.5 is included in recent versions of Hugging Face's Transformers library.
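Because this is a base (non-instruct) checkpoint, it is typically used for plain text completion rather than chat. A minimal completion sketch, assuming the Hub ID Qwen/Qwen2.5-0.5B and that torch and accelerate are installed alongside Transformers (the prompt is only an illustration):

```python
# Plain text-completion sketch for the pretrained base model.
# Assumes transformers >= 4.37.0, torch, and accelerate are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place the model on GPU if available, else CPU
)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```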
Guide: Running Locally
- Requirements: Ensure you have the latest version of the Hugging Face Transformers library; versions below 4.37.0 will result in a KeyError: 'qwen2'.
- Environment Setup: Install the necessary dependencies using Python and pip (see the sketch after this list).
- Model Download: Use the Hugging Face Model Hub to download the Qwen2.5-0.5B model.
- Execution: Load the model and tokenizer using the Transformers library and run inference.
- Hardware Recommendation: Utilize cloud GPUs like NVIDIA V100 or A100 for efficient performance.
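A sketch of the requirements check and model download steps above, assuming a standard pip environment (for example, pip install "transformers>=4.37.0" torch accelerate) and the Hub ID Qwen/Qwen2.5-0.5B; the completion example in the Training section covers the execution step.

```python
# Setup check and model download sketch.
# Assumes transformers and huggingface_hub are already installed.
import transformers
from huggingface_hub import snapshot_download
from packaging import version

# Transformers versions below 4.37.0 do not recognise the "qwen2" model type
# and fail with KeyError: 'qwen2' when loading the configuration.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for Qwen2.5; "
        "please upgrade to 4.37.0 or later."
    )

# Download the model weights and tokenizer files from the Hugging Face Model Hub
# (reuses the local cache when the files are already present).
local_dir = snapshot_download("Qwen/Qwen2.5-0.5B")
print(f"Model files available at: {local_dir}")
```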
License
The Qwen2.5-0.5B model is licensed under the Apache 2.0 License. For more details, refer to the license file.