ChatGLM3-6B-32K

THUDM

Introduction

ChatGLM3-6B-32K is an enhanced version of ChatGLM3-6B, designed to better understand and handle long contexts of up to 32K tokens. It features updated position encoding and a specialized long-text training method. If your context length stays within 8K, ChatGLM3-6B is recommended; beyond 8K, ChatGLM3-6B-32K is preferred. ChatGLM3-6B retains the strengths of previous generations while adding a more powerful base model, more comprehensive function support, and a broader open-source series.

Architecture

ChatGLM3-6B-32K builds on an improved base model trained on a more diverse dataset with more training steps, and it shows strong performance across various datasets. It also supports a newly designed prompt format, function calling, code execution, and agent tasks. The series includes models open-sourced for academic research and, after registration, for free commercial use.

Training

ChatGLM3-6B-32K was trained with an updated position encoding and a long-text training method, specifically designed for handling long dialogue contexts. Evaluations across various datasets demonstrate superior performance among models with fewer than 10B parameters.

Guide: Running Locally

To run ChatGLM3-6B-32K locally, follow these steps:

  1. Install dependencies:
    pip install protobuf transformers==4.30.2 cpm_kernels "torch>=2.0" gradio mdtex2html sentencepiece accelerate
    
  2. Code usage example:
    from transformers import AutoTokenizer, AutoModel
    
    # trust_remote_code is required because the model ships custom modeling code
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b-32k", trust_remote_code=True)
    # Load the weights in FP16 and move them to the GPU
    model = AutoModel.from_pretrained("THUDM/chatglm3-6b-32k", trust_remote_code=True).half().cuda()
    model = model.eval()
    
    # chat() returns the reply and the updated conversation history ("你好" = "Hello")
    response, history = model.chat(tokenizer, "你好", history=[])
    print(response)
    
  3. Suggested hardware:
    • For optimal performance, using cloud GPUs such as NVIDIA Tesla V100 or A100 is recommended.
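In the usage example above, model.chat returns the reply together with an updated history, which is passed back on the next call to continue a multi-turn conversation. The loop below sketches only that bookkeeping; chat_fn is a hypothetical stand-in for model.chat (it merely echoes the query) so the structure can be shown without downloading weights or using a GPU.

```python
# Multi-turn sketch: each call returns (reply, updated_history), and the
# updated history is fed back into the next call. `chat_fn` is a stand-in
# for model.chat that just echoes, so no GPU or model weights are needed.

def chat_fn(query, history):
    reply = f"echo: {query}"  # a real model would generate text here
    history = history + [
        {"role": "user", "content": query},
        {"role": "assistant", "content": reply},
    ]
    return reply, history

history = []
for query in ["Hello", "Summarize a long document"]:
    response, history = chat_fn(query, history)
    print(response)

# After two turns the history holds four messages (two per turn)
print(len(history))
```

With the real model, the same loop applies: pass the returned history back into model.chat on each turn.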

Additional instructions, including running CLI and web demos or using model quantization to save memory, are available in the GitHub repo.
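Quantization reduces the weight footprint roughly in proportion to the bit width. The back-of-envelope estimate below is illustrative arithmetic only, not measured figures; real usage also needs room for activations and the KV cache, which grows with the 32K context.

```python
# Back-of-envelope weight-memory estimate for a 6B-parameter model at
# different precisions. Weights only: activations and KV cache add more.
PARAMS = 6_000_000_000

def weight_gb(bits_per_param):
    # bits -> bytes -> GiB
    return PARAMS * bits_per_param / 8 / 1024**3

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.1f} GB")
```

This is why FP16 inference fits comfortably on a V100/A100-class GPU, while quantized variants can run on much smaller cards.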

License

The code is open-sourced under the Apache-2.0 license. Use of the ChatGLM3-6B-32K model weights must comply with the Model License.
