Qwen2.5 14B Instruct

Qwen

Introduction

Qwen2.5 is a series of advanced large language models available in sizes from 0.5 to 72 billion parameters. Key enhancements over Qwen2 include stronger domain expertise in coding and mathematics, improved instruction following, and support for generating structured outputs such as JSON. The series supports context lengths of up to 128K tokens and is multilingual across 29 languages. The 14B instruction-tuned variant described here is a causal transformer with 14.7 billion parameters, 48 layers, and a 131,072-token context window.

Architecture

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Components: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias
  • Parameters:
    • Total: 14.7 billion
    • Non-Embedding: 13.1 billion
  • Layers: 48
  • Attention Heads: 40 for Q, 8 for KV
  • Context Length: Full 131,072 tokens, generation up to 8,192 tokens
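
The 40/8 split between query and key-value heads corresponds to grouped-query attention. As a quick sanity check, these numbers can be read back from the published configuration; this is a minimal sketch using the standard Qwen2 config field names:

    from transformers import AutoConfig

    # Load the published configuration for the 14B instruct checkpoint.
    config = AutoConfig.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

    print(config.num_hidden_layers)    # 48 layers
    print(config.num_attention_heads)  # 40 query heads
    print(config.num_key_value_heads)  # 8 key/value heads (grouped-query attention)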

Training

The Qwen2.5 models undergo extensive pretraining followed by instruction tuning. They are optimized for long text generation, instruction adherence, and structured data understanding, and are resilient to diverse system prompts, which supports role-play and condition-setting for chatbots.

Guide: Running Locally

  1. Install Requirements:

    • Ensure the latest version of Hugging Face Transformers is installed (pip install --upgrade transformers); support for the Qwen2 architecture requires transformers 4.37.0 or newer, and older versions fail with KeyError: 'qwen2'. A quick version check is sketched below.
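
    • A minimal version check (packaging is already a Transformers dependency):

    from packaging import version
    import transformers

    # Support for the Qwen2 architecture landed in transformers 4.37.0;
    # older versions raise KeyError: 'qwen2' when loading the checkpoint.
    assert version.parse(transformers.__version__) >= version.parse("4.37.0")
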
  2. Load Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-14B-Instruct"
    # "auto" dtype loads the native bfloat16 weights; device_map="auto" requires accelerate.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Generate Text:

    • Apply the model's chat template to build a prompt, then call generate, as in the sketch below.
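
    • A minimal sketch of the standard Transformers chat workflow, reusing the model and tokenizer from step 2 (the system and user messages are illustrative):

    prompt = "Give me a short introduction to large language models."
    messages = [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]

    # Render the chat template into a prompt string and move the tokens to the model's device.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate, then strip the prompt tokens so only the new reply is decoded.
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
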
  4. GPU Recommendations:

    • For optimal performance, use a GPU with enough memory for the weights plus activations and KV cache: in bfloat16 the 14.7 billion parameters alone occupy about 27 GiB, so a single 40 GB card (or several smaller GPUs with device_map="auto") is a reasonable floor. Cloud options include AWS EC2, Google Cloud, and Azure; a back-of-the-envelope estimate is sketched below.
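
    • A back-of-the-envelope weight-memory estimate (activations and KV cache add to this):

    # 14.7B parameters at 2 bytes each (bfloat16), weights only.
    params = 14.7e9
    print(f"{params * 2 / 1024**3:.1f} GiB")  # ~27.4 GiB
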
  5. Processing Long Texts:

    • For inputs exceeding 32,768 tokens, enable YaRN rope scaling by adding a rope_scaling entry (type "yarn", factor 4.0, original_max_position_embeddings 32768) to config.json; an equivalent load-time override is sketched below.
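
    • A sketch of the same settings applied at load time instead of editing config.json (note that this static YaRN scaling is applied uniformly, which may slightly affect performance on short texts):

    from transformers import AutoConfig, AutoModelForCausalLM

    model_name = "Qwen/Qwen2.5-14B-Instruct"

    # The same rope_scaling values prescribed for config.json.
    config = AutoConfig.from_pretrained(model_name)
    config.rope_scaling = {
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn",
    }

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        config=config,
        torch_dtype="auto",
        device_map="auto",
    )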

License

This model is licensed under Apache 2.0; see the LICENSE file in the model repository for the full text.
