Qwen-14B-Chat
Introduction
Qwen-14B-Chat is a 14-billion-parameter Transformer-based large language model developed by Alibaba Cloud. It is part of the Qwen series and is pretrained on diverse data, including web texts, books, and code. This repository focuses on Qwen-14B-Chat, an AI assistant trained on top of the Qwen-14B base model with alignment techniques.
Architecture
The model architecture includes 40 layers and 40 attention heads, with a model dimension of 5120 and a vocabulary size of 151,851 tokens. It uses RoPE for position encoding, SwiGLU as the activation function, and RMSNorm for normalization. The tokenizer is optimized for Chinese, English, and multilingual data, using tiktoken for efficient tokenization.
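These hyperparameters can be checked against the published configuration. A minimal sketch, assuming Hugging Face Hub access and that the remote config exposes the usual transformers attribute names:

from transformers import AutoConfig, AutoTokenizer

# Inspect the architecture hyperparameters listed above.
config = AutoConfig.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
# Note: config.vocab_size may be padded beyond the tokenizer's 151,851 entries.

# The tiktoken-based tokenizer handles mixed Chinese/English input.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
print(tokenizer.encode("你好, world"))  # token IDs for a mixed-language string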
Training
Qwen-14B-Chat is trained on diverse data sources, including web texts and professional books. The model supports various quantization techniques, such as Int4, which offers near-lossless performance with reduced memory usage and improved inference speed.
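For illustration, the quantized checkpoint published as Qwen/Qwen-14B-Chat-Int4 loads the same way as the full-precision model; this sketch assumes the GPTQ dependencies (auto-gptq, optimum) are installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Int4 (GPTQ) variant: lower memory footprint, near-lossless quality.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()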
Guide: Running Locally
- Requirements:
  - Python 3.8+
  - PyTorch 1.12+, recommended 2.0+
  - CUDA 11.4+ for GPU users
- Install Dependencies:
  pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
- Optional: Install Flash-Attention for efficiency:
  git clone https://github.com/Dao-AILab/flash-attention
  cd flash-attention && pip install .
- Run an Example (a multi-turn continuation is sketched after this list):
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-14B-Chat", device_map="auto", trust_remote_code=True).eval()
  response, history = model.chat(tokenizer, "给我讲一个故事", history=None)  # "Tell me a story"
  print(response)
- Cloud GPUs: Consider using services like AWS, Google Cloud, or Azure for GPU access if local resources are insufficient.
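The chat API returns the updated conversation state, so follow-up turns simply pass the previous history back in. A minimal continuation of the example above (the prompts are illustrative):

response, history = model.chat(tokenizer, "给我讲一个故事", history=None)  # "Tell me a story"
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)  # "Give the story a title"
print(response)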
License
The code and model weights are open for academic research, and commercial use is also permitted. Details can be found in the LICENSE. Commercial use requires an application.