Qwen2.5-Coder-7B-Instruct


Introduction

Qwen2.5-Coder, formerly known as CodeQwen, is the code-specific branch of the Qwen series of large language models. The series spans model sizes from 0.5 to 32 billion parameters to suit different developer needs. Qwen2.5-Coder improves markedly on its predecessors in code generation, code reasoning, and code fixing, trained on a corpus of 5.5 trillion tokens. The 32B variant achieves state-of-the-art results among open code models, rivaling the coding capabilities of GPT-4o. The models are built for real-world applications, retain strong performance in mathematics and general tasks, and support context lengths of up to 128K tokens.

Architecture

Qwen2.5-Coder-7B-Instruct is a causal language model with 7.61 billion parameters, of which 6.53 billion are non-embedding parameters. It has 28 transformer layers and uses RoPE, SwiGLU, RMSNorm, and attention QKV bias, with grouped-query attention (28 query heads and 4 key-value heads). It supports a full context length of 131,072 tokens and is optimized for handling long texts.
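
To verify these details locally, you can inspect the published configuration with the standard Transformers API. This is a minimal sketch; the values noted in the comments are what the 7B checkpoint is expected to report.

    from transformers import AutoConfig

    # Downloads only config.json, not the model weights.
    config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
    print(config.num_hidden_layers)    # transformer layers (expected: 28)
    print(config.num_attention_heads)  # query heads
    print(config.num_key_value_heads)  # KV heads; fewer than query heads means grouped-query attention
    print(config.rope_theta)           # RoPE base frequency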

Training

The Qwen2.5-Coder models undergo large-scale pretraining followed by post-training. The pretraining corpus combines source code, text-code grounding data, and synthetic data to strengthen the models' capabilities on code-related tasks.

Guide: Running Locally

To run Qwen2.5-Coder-7B Instruct locally, follow these steps:

  1. Install Requirements: Ensure you have a recent version of Hugging Face Transformers; releases earlier than 4.37.0 do not recognize the qwen2 architecture and fail with KeyError: 'qwen2'.

    pip install -U transformers
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
    # torch_dtype="auto" loads the checkpoint in its native precision (bfloat16);
    # device_map="auto" places the weights across the available GPUs/CPU.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Generate Text (a streaming variant is sketched after this list):

    prompt = "write a quick sort algorithm."
    messages = [{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
                {"role": "user", "content": prompt}]
    # Apply the chat template, tokenize, and move the inputs to the model's device.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    # Drop the prompt tokens so that only the newly generated reply is decoded.
    generated_ids = [output_ids[len(input_ids):]
                     for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    
  4. For Long Texts: To handle inputs longer than 32,768 tokens, enable YaRN by adding the following rope_scaling entry to config.json (a programmatic alternative is sketched after this list). Static YaRN scaling is applied regardless of input length and may affect performance on shorter texts, so add it only when long contexts are required.

    "rope_scaling": {
      "factor": 4.0,
      "original_max_position_embeddings": 32768,
      "type": "yarn"
    }
    
  5. Cloud GPUs: For optimal performance, especially with the larger models in the series, consider cloud GPU offerings such as AWS EC2, Google Cloud, or Azure Machine Learning.
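
As mentioned in step 3, a streaming variant can print tokens as they are generated instead of decoding everything at the end. This is a minimal sketch using Transformers' TextStreamer; it assumes the model and tokenizer from step 2 are already loaded.

    from transformers import TextStreamer

    messages = [{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
                {"role": "user", "content": "write a quick sort algorithm."}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # skip_prompt suppresses the echoed chat template; tokens are printed as they arrive.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)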
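
For step 4, the same YaRN setting can usually be applied in code rather than by editing config.json on disk. This is a sketch under the assumption that your Transformers version honors rope_scaling set on the config object passed to from_pretrained.

    from transformers import AutoConfig, AutoModelForCausalLM

    model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
    config = AutoConfig.from_pretrained(model_name)
    # Same YaRN parameters as the config.json edit above.
    config.rope_scaling = {"factor": 4.0, "original_max_position_embeddings": 32768, "type": "yarn"}
    model = AutoModelForCausalLM.from_pretrained(model_name, config=config,
                                                 torch_dtype="auto", device_map="auto")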

License

The Qwen2.5-Coder-7B-Instruct model is licensed under the Apache 2.0 License. For more details, see the license file.
