Qwen2.5-1.5B-Instruct
Introduction
Qwen2.5 is the latest series of Qwen large language models, offering base and instruction-tuned models from 0.5 to 72 billion parameters. Improvements over Qwen2 include richer knowledge, stronger coding and mathematical abilities, better instruction following, long-text generation, structured-data understanding, and multilingual support for more than 29 languages; the larger variants of the series handle contexts of up to 128K tokens. The instruction-tuned 1.5B model described here is a causal language model with 1.54 billion parameters and a 32,768-token context window.
Architecture
The model architecture includes:
- Transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
- 28 layers, with 12 attention heads for queries and 2 for keys/values (grouped-query attention).
- Context length of 32,768 tokens with generation up to 8,192 tokens.
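These figures can be verified directly from the published checkpoint configuration. Below is a minimal sketch using the Transformers `AutoConfig` API; the attribute names are the standard `Qwen2Config` fields:

```python
from transformers import AutoConfig

# Download and parse the config.json shipped with the checkpoint.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

print(config.num_hidden_layers)        # 28 transformer layers
print(config.num_attention_heads)      # 12 query heads
print(config.num_key_value_heads)      # 2 key/value heads (grouped-query attention)
print(config.max_position_embeddings)  # maximum supported context length
print(config.tie_word_embeddings)      # True: input/output embeddings are tied
```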
Training
Qwen2.5 models go through a pretraining stage followed by post-training. The instruction-tuned model is built to be resilient to diverse system prompts, improving role-play and condition-setting for chatbots, and it inherits the series' multilingual capabilities across a wide array of languages.
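As a concrete illustration of system-prompt conditioning, the sketch below renders a custom persona through the tokenizer's chat template. The persona text is a hypothetical example chosen for this sketch, not part of the official card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Hypothetical role-play persona, purely for illustration.
messages = [
    {"role": "system", "content": "You are a ship's navigator in the year 1700. Answer in character."},
    {"role": "user", "content": "How do you find your position at sea?"},
]

# tokenize=False returns the rendered prompt string so the template can be inspected;
# add_generation_prompt=True appends the assistant turn marker the model completes.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
```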
Guide: Running Locally
- Requirements: Ensure you have a recent version of Hugging Face's Transformers library (4.37.0 or later); with earlier versions, loading the model fails with `KeyError: 'qwen2'`. Upgrading via `pip install --upgrade transformers` is sufficient.
- Quickstart Code (for tuning the sampling behavior, see the sketch after this list):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"

# Load the model with automatic dtype selection and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
- Cloud GPUs: For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
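Decoding behavior in the quickstart can be adjusted through the standard `generate` sampling arguments. The values below are a sketch that mirrors common Qwen generation defaults rather than a tuned recommendation:

```python
# Assumes `model` and `model_inputs` from the quickstart above.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # lower values make outputs more deterministic
    top_p=0.8,               # nucleus sampling threshold
    repetition_penalty=1.05, # mildly discourage repeated tokens
)
```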
License
Qwen2.5-1.5B-Instruct is licensed under the Apache License 2.0. The full license text is available in the model repository.