Qwen2.5-32B-Instruct

Introduction

Qwen2.5 is the latest series of Qwen large language models, available as base and instruction-tuned variants with parameter counts from 0.5 to 72 billion. It offers significant improvements over previous versions: broader knowledge, stronger capabilities in coding and mathematics, better instruction following, and more reliable generation of structured outputs such as JSON. It supports more than 29 languages and handles contexts of up to 128K tokens while generating up to 8K tokens.

Architecture

The Qwen2.5-32B-Instruct model is designed as a causal language model with the following architectural details:

  • Training Stage: Pretraining & Post-training
  • Transformers Architecture: Includes features like RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Parameters: 32.5 billion total, 31.0 billion non-embedding
  • Layers: 64
  • Attention Heads: 40 for Q and 8 for KV (grouped-query attention)
  • Context Length: Supports up to 131,072 tokens for context and 8,192 tokens for generation
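
These details can be checked without downloading the weights by loading just the model configuration. A minimal sketch, assuming the standard transformers attribute names for the Qwen2 config:

    from transformers import AutoConfig

    # Fetches only config.json from the Hub; no model weights are downloaded
    cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
    print(cfg.num_hidden_layers)        # 64 layers
    print(cfg.num_attention_heads)      # 40 query heads
    print(cfg.num_key_value_heads)      # 8 key/value heads (grouped-query attention)
    print(cfg.max_position_embeddings)  # native context length in the shipped config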

Training

Beyond large-scale pretraining, the model's capabilities, particularly in coding and mathematics, are strengthened with the help of specialized expert models during post-training. Extensive instruction tuning further improves its ability to follow diverse prompts and to generate coherent text from structured data.

Guide: Running Locally

Basic Steps

  1. Install Requirements: Ensure you have the latest version of the transformers library installed.

    pip install -U transformers
    

    Note: transformers versions older than 4.37.0 do not include Qwen2 support and will fail with KeyError: 'qwen2'.

  2. Load Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-32B-Instruct"
    # torch_dtype="auto" loads the checkpoint in its native precision (bfloat16);
    # device_map="auto" spreads the layers across all available GPUs
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Generate Text: Use the model to generate a response to a given prompt; the complete chat-template flow is sketched after this list.

    prompt = "Give me a short introduction to large language models."
    # Build the chat messages and generate a response (see the full sketch below)

  4. Processing Long Texts: For inputs exceeding 32,768 tokens, enable YaRN rope scaling by editing the model's config.json (a configuration sketch follows the generation example below).
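
Continuing from the objects created in steps 2 and 3, the complete generation flow follows the standard Qwen2.5 chat-template pattern; max_new_tokens=512 is an illustrative choice:

    messages = [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the chat template, leaving the assistant turn open
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate, then strip the prompt tokens so only the new response is decoded
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)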

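For step 4, YaRN is enabled by adding a rope_scaling entry to config.json, following the snippet from the official Qwen2.5 model card; a factor of 4.0 scales the native 32,768-token window to roughly 131,072 tokens (the "...," stands for the file's existing keys):

    {
      ...,
      "rope_scaling": {
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn"
      }
    }

This static scaling is applied to all inputs regardless of length and can hurt performance on short texts, so add it only when long inputs are actually needed.
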
Cloud GPUs

For optimal performance, especially with large models, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
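
A back-of-the-envelope estimate shows why: assuming bfloat16 weights at 2 bytes per parameter, the weights alone occupy roughly 65 GB of GPU memory, before counting the KV cache and activations.

    # Rough weight-memory estimate (assumption: bfloat16, 2 bytes per parameter;
    # the KV cache and activations add more on top of this)
    num_params = 32.5e9
    bytes_per_param = 2
    print(f"~{num_params * bytes_per_param / 1e9:.0f} GB for weights alone")  # ~65 GB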

License

Qwen2.5-32B-Instruct is released under the Apache 2.0 license; the full license text is available in the model repository.
