Qwen-14B-Chat
Introduction
Qwen-14B-Chat is a 14-billion-parameter Transformer-based large language model developed by Alibaba Cloud. It is part of the Qwen series and is pretrained on diverse data, including web texts, books, and code. This repository focuses on Qwen-14B-Chat, an AI assistant trained on top of the Qwen-14B base model with alignment techniques.
Architecture
The model architecture includes 40 layers and 40 attention heads, with a model dimension of 5120 and a vocabulary size of 151,851 tokens. It uses RoPE for position encoding, SwiGLU as the activation function, and RMSNorm for normalization. The tokenizer is optimized for Chinese, English, and multilingual data, using tiktoken for efficient tokenization.
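These hyperparameters can be checked against the published configuration. A minimal sketch, assuming Hugging Face Hub access and that the remote config exposes the usual transformers attribute names:

from transformers import AutoConfig, AutoTokenizer

# Inspect the architecture hyperparameters listed above.
config = AutoConfig.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
# Note: config.vocab_size may be padded beyond the tokenizer's 151,851 entries.

# The tiktoken-based tokenizer handles mixed Chinese/English input.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
print(tokenizer.encode("你好, world"))  # token IDs for a mixed-language string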
Training
Qwen-14B-Chat is trained on diverse data sources, including web texts and professional books. The model supports various quantization techniques, such as Int4, which offers near-lossless performance with reduced memory usage and improved inference speed.
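For illustration, the quantized checkpoint published as Qwen/Qwen-14B-Chat-Int4 loads the same way as the full-precision model; this sketch assumes the GPTQ dependencies (auto-gptq, optimum) are installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Int4 (GPTQ) variant: lower memory footprint, near-lossless quality.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()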
Guide: Running Locally
- Requirements:
  - Python 3.8+
  - PyTorch 1.12+, recommended 2.0+
  - CUDA 11.4+ for GPU users
- Install Dependencies:
  pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
- Optional: Install Flash-Attention for efficiency:
  git clone https://github.com/Dao-AILab/flash-attention
  cd flash-attention && pip install .
- Run an Example (a multi-turn continuation is sketched after this list):
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-14B-Chat", device_map="auto", trust_remote_code=True).eval()
  response, history = model.chat(tokenizer, "给我讲一个故事", history=None)  # "Tell me a story"
  print(response)
- Cloud GPUs: Consider using services like AWS, Google Cloud, or Azure for GPU access if local resources are insufficient.
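The chat API returns the updated conversation state, so follow-up turns simply pass the previous history back in. A minimal continuation of the example above (the prompts are illustrative):

response, history = model.chat(tokenizer, "给我讲一个故事", history=None)  # "Tell me a story"
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)  # "Give the story a title"
print(response)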
License
The code and model weights are open for academic research, and commercial use is also permitted. Details can be found in the LICENSE. Commercial use requires an application.