Qwen2.5-Coder-32B

Qwen

Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models, formerly known as CodeQwen. It is designed to improve code generation, code reasoning, and code fixing. The series spans six model sizes, from 0.5 to 32 billion parameters. The 32B variant brings significant improvements, including a more comprehensive foundation for real-world applications, long-context support of up to 128K tokens, and pretraining on 5.5 trillion tokens of diverse data.

Architecture

Qwen2.5-Coder-32B is a causal language model built on the Transformer architecture. It incorporates RoPE (rotary position embeddings), the SwiGLU activation, RMSNorm, and attention QKV bias. The model has 32.5 billion parameters across 64 layers and uses grouped-query attention with 40 query heads and 8 key/value heads, with a full context length of 131,072 tokens.
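
These details can be checked against the model's published configuration without downloading any weights. A minimal sketch, assuming the transformers library and the standard Qwen2 config attribute names:

    from transformers import AutoConfig

    # Fetches only config.json from the Hub; no weights are downloaded.
    config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-32B")

    print(config.num_hidden_layers)        # expected: 64
    print(config.num_attention_heads)      # expected: 40 (query heads)
    print(config.num_key_value_heads)      # expected: 8 (grouped-query attention)
    print(config.max_position_embeddings)  # maximum positions in the config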

Training

Qwen2.5-Coder-32B was pretrained on a diverse mixture of data sources, including source code, text-code grounding data, and synthetic data, totaling 5.5 trillion tokens. This training is intended to strengthen the model's coding abilities while maintaining its mathematical and general capabilities.

Guide: Running Locally

To run Qwen2.5-Coder-32B locally, make sure you have a recent version of the Hugging Face transformers library (the Qwen2 architecture requires transformers >= 4.37.0). A model of this size needs substantial GPU memory, so consider cloud GPUs for practical performance; if memory is tight, see the quantized-loading sketch after the steps below.
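
As a rough back-of-envelope check of the footprint (weights only; the KV cache and activations add more), assuming 2 bytes per parameter in bfloat16:

    # Weights-only memory estimate for the 32.5B-parameter model.
    num_params = 32.5e9     # parameter count from the model card
    bytes_per_param = 2     # bfloat16 / float16
    print(f"~{num_params * bytes_per_param / 1e9:.0f} GB")  # prints ~65 GB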

Basic Steps:

  1. Install Dependencies:

    pip install -U transformers accelerate
    
  2. Download the Model: The weights are fetched automatically from the Hugging Face Model Hub on the first call to from_pretrained, or can be pre-downloaded with huggingface-cli download Qwen/Qwen2.5-Coder-32B.

  3. Load and Run:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-Coder-32B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # torch_dtype="auto" loads the bf16 weights; device_map="auto" places them
    # on available GPUs (requires the accelerate package).
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    prompt = "def quicksort(arr):"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Cloud GPU Recommendation: Consider using cloud services such as AWS, GCP, or Azure to access powerful GPUs like NVIDIA A100s for efficient model inference.
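
If a single high-memory GPU is not available, quantized loading can shrink the footprint considerably. Below is a minimal sketch using 4-bit quantization via transformers' BitsAndBytesConfig; it assumes the optional bitsandbytes package and a CUDA GPU, and the memory/quality trade-off will vary:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # 4-bit NF4 quantization keeps weights at roughly 0.5 bytes per parameter.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-Coder-32B",
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B")

At 4 bits the weights alone occupy roughly 16-20 GB, so the model can fit on a single 24 GB card, though output quality may degrade slightly relative to bf16.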

License

Qwen2.5-Coder-32B is distributed under the Apache 2.0 License. For more details, refer to the license file.
