GLM-4-9B
THUDM
Introduction
GLM-4-9B is an advanced open-source pre-trained model from the GLM-4 series developed by THUDM. It performs strongly in semantics, mathematics, reasoning, and knowledge evaluations across diverse datasets. Its chat variant, GLM-4-9B-Chat, supports multi-round conversations, adds capabilities such as web browsing, code execution, and long-text reasoning, and covers 26 languages.
Architecture
The base GLM-4-9B model supports a maximum context length of 8K tokens, while the GLM-4-9B-Chat-1M variant supports contexts of up to 1M tokens. The GLM-4V-9B variant adds bilingual (Chinese and English) multi-turn dialogue at high image resolution. On multimodal tasks such as perception, reasoning, and text recognition, GLM-4V-9B is reported to outperform comparable models such as GPT-4-turbo.
Training
The model demonstrates strong performance on evaluations such as MMLU, C-Eval, and HumanEval, and outperforms Llama-3-8B on a range of benchmarks, reflecting a robust training process. The chat variant is further aligned with human preferences.
Guide: Running Locally
- Environment Setup: Ensure you have transformers>=4.44.0 installed.
- Clone the Repository: Access the code from the official GitHub repository.
- Install Dependencies: Follow the instructions in the repository to install the necessary dependencies.
- Run the Model: Use the provided scripts to run the model locally for text generation tasks; a minimal example is sketched below.
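As a minimal sketch of the "Run the Model" step, the snippet below performs single-turn chat generation with Hugging Face transformers. It assumes the model id THUDM/glm-4-9b-chat, a CUDA-capable GPU, bfloat16 weights, and trust_remote_code enabled; the prompt and generation settings are illustrative, so adapt them to your setup.

```python
# Minimal sketch: chat-style generation with GLM-4-9B-Chat via transformers.
# Assumptions: model id "THUDM/glm-4-9b-chat", CUDA GPU, transformers>=4.44.0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_id = "THUDM/glm-4-9b-chat"  # swap for a local path if you cloned the weights

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device).eval()

# Build a single-turn prompt with the model's chat template.
messages = [{"role": "user", "content": "What can GLM-4-9B do?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)

# Generate a reply and strip the prompt tokens before decoding.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
outputs = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that a 9B-parameter model in bfloat16 needs roughly 18 GB of GPU memory for the weights alone, which is one reason the cloud GPU suggestion below may be relevant.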
For enhanced performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
The model weights are subject to the GLM-4 license. For full details, refer to the LICENSE file.