Qwen2-0.5B-Instruct

Qwen

Introduction

Qwen2 is a new series of large language models, featuring both base and instruction-tuned models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repository contains the instruction-tuned 0.5B Qwen2 model. It surpasses many open-source models and competes with proprietary models across various benchmarks for language understanding, generation, multilingual capability, coding, mathematics, and reasoning.

Architecture

The Qwen2 series includes decoder-only language models of different sizes, comprising base and aligned chat models. It is built on the Transformer architecture with SwiGLU activation, attention QKV bias, and grouped-query attention. The tokenizer is improved for adaptability to multiple natural languages and code.
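Grouped-query attention lets several query heads share a single key/value head, shrinking the KV cache at little quality cost. A minimal NumPy sketch of the shape bookkeeping (the head counts below are illustrative assumptions, not the published Qwen2-0.5B configuration):

```python
import numpy as np

# Illustrative grouped-query attention dimensions (assumed, not the real config)
num_q_heads, num_kv_heads, head_dim, seq_len = 14, 2, 64, 8
group = num_q_heads // num_kv_heads  # query heads sharing each KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((num_q_heads, seq_len, head_dim))
k = rng.standard_normal((num_kv_heads, seq_len, head_dim))
v = rng.standard_normal((num_kv_heads, seq_len, head_dim))

# Each KV head is reused by `group` consecutive query heads
k = np.repeat(k, group, axis=0)
v = np.repeat(v, group, axis=0)

# Scaled dot-product attention per head
scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v
print(out.shape)  # (14, 8, 64)
```

Only the 2 KV heads need to be cached during generation, while all 14 query heads still attend over them.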

Training

The models were pretrained with extensive data and underwent post-training through supervised fine-tuning and direct preference optimization.
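Direct preference optimization trains the model directly on (chosen, rejected) response pairs, without a separate reward model. A minimal sketch of the per-pair DPO loss, taking the sequence log-probabilities as given (the numbers are made up for illustration):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen_lp - policy_rejected_lp) - (ref_chosen_lp - ref_rejected_lp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy sequence log-probabilities (illustrative only)
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(round(loss, 4))  # 0.5981
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss drops below log 2; a zero margin gives exactly log 2.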

Guide: Running Locally

  1. Install the required version of the transformers library (quote the specifier so the shell does not treat `>=` as a redirection):
    pip install "transformers>=4.37.0"
    
  2. Load the model and tokenizer in Python:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    device = "cuda"  # Use a GPU if available
    
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2-0.5B-Instruct",
        torch_dtype="auto",
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
    
    prompt = "Give me a short introduction to large language model."
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # Strip the prompt tokens so only the newly generated reply is decoded
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    
  3. For optimal performance, it is recommended to use cloud GPUs, such as those provided by AWS, Google Cloud, or Azure.
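The apply_chat_template call in step 2 serders the message list into Qwen2's ChatML-style prompt format before tokenization. A rough pure-Python mock of the rendered text (the authoritative template ships with the tokenizer; this reconstruction is an approximation for illustration):

```python
def mock_chat_template(messages, add_generation_prompt=True):
    # Approximation of Qwen2's ChatML-style template (illustrative, not authoritative)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so generation continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
text = mock_chat_template(messages)
print(text)
```

Seeing the rendered markers makes it clear why `add_generation_prompt=True` matters: it leaves the prompt ending in an open assistant turn for the model to complete.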

License

The Qwen2 model is licensed under the Apache License 2.0.
