QwQ-32B-Preview

Qwen

Introduction

QwQ-32B-Preview is an experimental research model created by the Qwen Team to advance AI reasoning capabilities. This preview release showcases strong analytical ability, particularly in math and coding, but carries several known limitations, including language mixing, recursive reasoning loops, and safety concerns, and it still needs improvement in common-sense reasoning and nuanced language understanding.

Architecture

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Architecture Components: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Layers: 64
  • Attention Heads (GQA): 40 for Q and 8 for KV
  • Context Length: Full 32,768 tokens (these figures can be verified from the model config, as sketched below)
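
These figures can be read back from the published configuration without downloading any weights. A minimal sketch, assuming access to the Hugging Face Hub (attribute names follow the Transformers Qwen2 config):

    from transformers import AutoConfig
    
    # Fetches only the small config file, not the 32.5B-parameter checkpoint.
    config = AutoConfig.from_pretrained("Qwen/QwQ-32B-Preview")
    print(config.num_hidden_layers)        # expected: 64
    print(config.num_attention_heads)      # expected: 40 query heads
    print(config.num_key_value_heads)      # expected: 8 KV heads (GQA)
    print(config.max_position_embeddings)  # expected: 32768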

Training

The model is trained in two stages, pretraining and post-training, building on recent advances in transformer architectures. Running it requires a recent version of the Hugging Face Transformers library (see step 1 of the guide below).

Model Stats

  • Total Parameters: 32.5 billion
  • Parameters (Non-Embedding): 31.0 billion
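
For rough hardware sizing, a back-of-envelope estimate (an assumption, not a figure from this card: bfloat16 weights at 2 bytes per parameter, excluding activations and the KV cache):

    # 32.5e9 params x 2 bytes is roughly 65 GB for the weights alone,
    # which is why the guide below suggests cloud GPUs.
    total_params = 32.5e9
    print(f"~{total_params * 2 / 1e9:.0f} GB")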

Guide: Running Locally

  1. Install Dependencies: Ensure you have a recent version of the Hugging Face Transformers library; older versions may fail to load the model (the Qwen model cards report a KeyError: 'qwen2' with transformers earlier than 4.37.0).
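    A quick version check, as a minimal sketch:
    # Upgrade first if needed: pip install --upgrade transformers
    import transformers
    print(transformers.__version__)
    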
  2. Load Model and Tokenizer:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "Qwen/QwQ-32B-Preview"
    # torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto"
    # shards the weights across the available GPUs.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
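    To check where the shards landed (hf_device_map is populated when device_map="auto" is used):
    print(model.hf_device_map)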
    
  3. Prepare Input and Generate Text:
    prompt = "Your prompt here."
    messages = [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": prompt}]
    # Render the conversation with the model's chat template.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    # Drop the prompt tokens so only the newly generated text is decoded.
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
    
  4. Suggested Environment: A 32.5B-parameter model needs substantial GPU memory (see the estimate under Model Stats), so cloud GPUs from providers such as AWS, Google Cloud, or Azure are a practical choice. A streaming variant of step 3 is sketched after this list.
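
For interactive use, tokens can be printed as they are generated instead of decoded only at the end. A minimal sketch using the Transformers TextStreamer helper, reusing the model, tokenizer, and model_inputs from the steps above:

    from transformers import TextStreamer
    
    # skip_prompt=True avoids echoing the rendered chat prompt to stdout;
    # extra keyword arguments are forwarded to tokenizer.decode().
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)

generate() still returns the full token sequence, so the streamed text can also be decoded afterwards as in step 3.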

License

The QwQ-32B-Preview model is licensed under the Apache-2.0 License; the full license text is available in the model repository on Hugging Face.
