Qwen1.5-0.5B
Introduction
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model that has been pretrained on a large dataset. The main advancements from the previous Qwen model include:
- An expansion to 8 model sizes: dense models ranging from 0.5B to 72B parameters, plus a 14B MoE model with 2.7B activated parameters.
- Enhanced performance in chat models.
- Multilingual support for both base and chat models.
- Consistent support for a 32K context length across all model sizes.
- No requirement for `trust_remote_code`.
For comprehensive details, visit the project blog and GitHub repository.
Architecture
Qwen1.5 is a series of decoder-only language models available in multiple sizes, each with a base language model and an aligned chat model. The series is built on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention (GQA), and a mixture of sliding window attention (SWA) and full attention. The tokenizer has been improved to handle multiple natural languages and code. Note that this beta version temporarily omits GQA and the SWA/full-attention mixture.
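These architectural choices can be inspected directly from the model configuration. The sketch below assumes the public `Qwen/Qwen1.5-0.5B` checkpoint and uses field names from the `Qwen2Config` class in Transformers:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-0.5B")

# A few fields reflecting the design described above; when
# num_key_value_heads == num_attention_heads, attention is plain
# multi-head rather than grouped-query (GQA).
print(config.model_type)               # "qwen2"
print(config.hidden_act)               # gated SwiGLU-style activation ("silu")
print(config.num_attention_heads)
print(config.num_key_value_heads)
print(config.max_position_embeddings)
```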
Training
The model is integrated with the latest version of Hugging Face's Transformers library. Ensure that `transformers>=4.37.0` is installed; older versions fail with `KeyError: 'qwen2'`. Since the base language models are not aligned, it is recommended to apply post-training methods such as SFT, RLHF, or continued pretraining rather than using them directly for text generation.
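As a quick sanity check before loading the model, a minimal sketch like the following can verify the installed version (the `packaging` library ships as a dependency of Transformers):

```python
import transformers
from packaging import version

# Qwen2 model code was added to Transformers in v4.37.0; older
# versions raise KeyError: 'qwen2' when loading the config.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; "
        "install transformers>=4.37.0"
    )
```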
Guide: Running Locally
- Installation: Ensure you have Python and pip installed, then install the Transformers library (quote the requirement so the shell does not treat `>` as a redirect):
pip install "transformers>=4.37.0"
- Clone the Repository: Download the model code from the GitHub repository.
- Load the Model: Use the Transformers library to load and interact with the model, as sketched in the example after this list.
- GPU Requirements: For optimal performance, particularly with larger models, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
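The snippet below is a rough sketch of the loading step, using the public `Qwen/Qwen1.5-0.5B` checkpoint; the prompt and generation settings are illustrative defaults, not prescribed values:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # base model; no trust_remote_code needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # fp16/bf16 on GPU, fp32 on CPU
    device_map="auto",   # requires `accelerate`; places weights on GPU if available
)

# This is a base model, not an instruction-tuned chat model, so plain
# text completion is the appropriate usage (see the note under Training).
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```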
License
The Qwen1.5 model is released under the `tongyi-qianwen-research` license. For more information, view the license file.