Qwen2-0.5B
Introduction
Qwen2 is a series of large language models ranging from 0.5 to 72 billion parameters and including a Mixture-of-Experts model. This repository hosts the 0.5B base language model, which surpasses many open-source models and is competitive with proprietary models across benchmarks covering language understanding, generation, multilingual capability, coding, and reasoning.
Architecture
The Qwen2 series consists of decoder-only Transformer language models of varying sizes, available as base and chat models. The architecture uses SwiGLU activation, attention QKV bias, and group query attention (GQA). The tokenizer has been improved to adapt well to multiple natural languages and to code.
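As a quick way to see the group query attention setup in practice, here is a small sketch (assuming the standard Transformers `AutoConfig` API and the field names used by Qwen2-style configs) that loads only the published configuration and prints the relevant fields:

```python
from transformers import AutoConfig

# Load just the configuration for Qwen2-0.5B (no model weights are downloaded).
config = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")

# With group query attention, several query heads share each key/value head,
# so num_key_value_heads is smaller than num_attention_heads.
print("hidden layers:  ", config.num_hidden_layers)
print("query heads:    ", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)
print("hidden size:    ", config.hidden_size)
```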
Training
Qwen2 is integrated into Hugging Face Transformers; version 4.37.0 or later is required, since older releases do not recognize the qwen2 model type and fail to load the checkpoint. The base models are not recommended for direct text generation; they are intended for post-training with techniques such as SFT, RLHF, or continued pretraining.
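To make the post-training note concrete, here is a minimal sketch of supervised fine-tuning using the generic Transformers `Trainer`; the toy in-memory dataset, output directory, and hyperparameters are placeholders for illustration, and this is not the official Qwen2 post-training recipe:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Make sure a pad token is defined for batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy in-memory dataset standing in for a real SFT / continued-pretraining corpus.
raw = Dataset.from_dict({"text": [
    "Question: What is the capital of France?\nAnswer: Paris.",
    "Question: What is 2 + 2?\nAnswer: 4.",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Causal-LM collator: labels are built from the input ids (no masked LM).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-0.5b-sft",          # placeholder output directory
        num_train_epochs=1,
        per_device_train_batch_size=1,
        logging_steps=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```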
Guide: Running Locally
- Install Dependencies: Ensure Python is installed, then install Hugging Face Transformers 4.37.0 or later (older releases will not load Qwen2).

```bash
pip install "transformers>=4.37.0"
```
- Download the Model: Use the Transformers library to load the Qwen2-0.5B model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
```
- Inference: Use the model to generate text.
```python
# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
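Because the base model has not been post-trained for chat, output quality depends heavily on the prompt and decoding settings. The sketch below shows sampling-based generation with a few common `generate` parameters and optional GPU placement; the prompt and parameter values are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Shown again here so the snippet is self-contained.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

# Run on GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

prompt = "The three most spoken languages in the world are"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sampling-based decoding; temperature/top_p values are illustrative, not tuned.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```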
For faster inference, consider running on a GPU, for example through cloud providers such as AWS, Google Cloud, or Azure.
License
The Qwen2-0.5B model is licensed under the Apache-2.0 License.