Qwen2.5-7B-Instruct-GGUF

Qwen

Introduction

Qwen2.5 is the latest series of Qwen large language models, featuring base and instruction-tuned variants ranging from 0.5 to 72 billion parameters. The series brings significant improvements in knowledge, coding, mathematics, instruction following, long-text generation, understanding of structured data, and multilingual support covering more than 29 languages.

Architecture

The Qwen2.5-7B-Instruct-GGUF model is a causal language model with the following architecture specifics:

  • Training Stage: Pretraining & Post-training
  • Transformer Features: RoPE, SwiGLU, RMSNorm, and attention QKV bias
  • Parameters: 7.61 billion (6.53 billion non-embedding)
  • Layers: 28
  • Attention Heads (GQA): 28 for Q and 4 for KV
  • Context Length: Full 32,768 tokens; generation 8,192 tokens
  • Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
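
The layer count, head counts, and context length above are recorded in each GGUF file's metadata, so they can be checked locally. The snippet below is an illustrative sketch rather than part of the official card: it assumes you have already downloaded one of the GGUF files (see the guide below) and that the gguf-dump utility from llama.cpp's gguf Python package is available.

    # install llama.cpp's GGUF tooling
    pip install gguf
    # block_count = layers, head_count / head_count_kv = GQA heads,
    # context_length = maximum context size in tokens
    gguf-dump <gguf-file-path> | grep -i -E "block_count|head_count|context_length"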

Training

This model is instruction-tuned and designed to improve on its predecessors in instruction following, long-text generation, and handling of structured data. It supports a context of up to 128K tokens and can generate up to 8K tokens.
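
As a minimal illustration of these limits (assuming the llama-cli binary and a downloaded GGUF file from the guide below), llama.cpp exposes them as run-time flags: -c sets the context window in tokens and -n caps the number of generated tokens. The values shown match the limits listed for this GGUF release; stretching the context toward 128K may require additional rope-scaling configuration not covered here.

    ./llama-cli -m <gguf-file-path> -co -cnv \
      -c 32768 -n 8192 \
      -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."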

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Hugging Face CLI:

    pip install -U huggingface_hub
    
  2. Download the Model:

    • Use the following command to download the necessary GGUF files:
      # the wildcard also picks up split parts such as ...-00001-of-00002.gguf
      huggingface-cli download Qwen/Qwen2.5-7B-Instruct-GGUF --include "qwen2.5-7b-instruct-q5_k_m*.gguf" --local-dir . --local-dir-use-symlinks False
      
  3. Merge Split Files:

    • If the download produced split files, merge them with the llama-gguf-split tool from llama.cpp (see the build sketch after this list):
      # usage: llama-gguf-split --merge <first-split-file> <output-file>
      ./llama-gguf-split --merge qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf qwen2.5-7b-instruct-q5_k_m.gguf
      
  4. Run the Model:

    • To start a chatbot-like experience, execute:
      # -co colorizes output, -cnv starts an interactive chat, -fa enables flash attention,
      # -ngl 80 offloads all layers to the GPU, and -n 512 caps the response length
      ./llama-cli -m <gguf-file-path> -co -cnv -p "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." -fa -ngl 80 -n 512
      
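The llama-gguf-split and llama-cli binaries used in steps 3 and 4 ship with llama.cpp and are not installed by the Hugging Face CLI. A minimal build sketch, assuming a CMake toolchain is available (exact options vary by platform and llama.cpp version):

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON   # drop -DGGML_CUDA=ON for a CPU-only build
    cmake --build build --config Release
    # llama-cli and llama-gguf-split end up under build/bin/

Depending on the llama.cpp version, -m can also point directly at the first shard (the ...-00001-of-00002.gguf file), in which case the merge in step 3 may be unnecessary.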

A GPU (local or cloud-hosted) is recommended for efficient inference with a model of this size; the -ngl flag above controls how many layers are offloaded to it.

License

The Qwen2.5-7B-Instruct-GGUF model is released under the Apache 2.0 License; the full license text is available in the model repository.
