GLM-4V-9B
THUDM
Introduction
GLM-4V-9B is the open-source multimodal model in the GLM-4 series, developed by Zhipu AI. It supports dialogue in Chinese and English over high-resolution images and is reported to outperform models such as GPT-4-turbo-2024-04-09 and Claude 3 Opus on various multimodal evaluations.
Architecture
GLM-4V-9B is a multimodal language model with visual understanding capabilities. It is designed for tasks such as perception and reasoning, text recognition, and chart understanding, with a context length of up to 8K tokens.
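As a rough illustration of what the context limit means in practice, the sketch below computes how many tokens remain for the reply once the prompt has been tokenized. It assumes that "8K" means 8192 tokens, which the source does not state exactly; the constant and the helper name are hypothetical and not part of the model's API.

```python
# Assumption: "8K" is taken to mean 8192 tokens; the source only says "up to 8K".
CONTEXT_LIMIT = 8192


def max_new_tokens_allowed(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens can still be generated after the prompt.

    Hypothetical helper: prompt_tokens is the length of the tokenized input,
    for example inputs["input_ids"].shape[1].
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Never report a negative budget when the prompt alone exceeds the limit.
    return max(0, context_limit - prompt_tokens)


if __name__ == "__main__":
    # A 1,700-token prompt leaves 6,492 tokens for the reply under this assumption.
    print(max_new_tokens_allowed(1700))
```

A result of 0 means the prompt alone already fills the assumed window, so it would need to be shortened before generation.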
Training
The model has been evaluated on several multimodal benchmarks, where it is reported to perform strongly on comprehensive English and Chinese tasks, perception and reasoning, and text recognition. It supports dialogue over high-resolution images and is reported to be competitive with other leading models.
Guide: Running Locally
To run the GLM-4V-9B model locally:
- Setup Environment: Install the dependencies listed in requirements.txt.
- Install Transformers: Make sure you have transformers version 4.44 or higher.
- Load Model and Tokenizer:
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda"

    # trust_remote_code=True runs the model code shipped with the checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/glm-4v-9b",
        torch_dtype=torch.bfloat16,  # half-precision weights
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).to(device).eval()
- Run Inference:
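    query = '描述这张图片'  # "Describe this image"
    image = Image.open("your image").convert('RGB')  # replace with the path to your image
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "image": image, "content": query}],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    ).to(device)

    # max_length counts prompt and generated tokens together;
    # top_k=1 makes sampling effectively greedy.
    gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
    with torch.no_grad():
        outputs = model.generate(**inputs, **gen_kwargs)
        # Drop the prompt tokens so only the newly generated reply is decoded.
        outputs = outputs[:, inputs['input_ids'].shape[1]:]
        print(tokenizer.decode(outputs[0]))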
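- Suggested Cloud GPUs: Consider cloud services that offer data-center GPUs such as the NVIDIA A100 or V100. The loading code above uses bfloat16, which the A100 supports natively; the V100 does not, so it would typically need a different dtype such as float16.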
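Before downloading the weights, the requirements in the steps above can be checked with a small pre-flight script. The following is a minimal sketch, not part of the model's tooling: the helper names are hypothetical, the version parsing is deliberately simple, and the float16 fallback is an assumption, since the source only shows bfloat16.

```python
from importlib import metadata

try:
    import torch  # optional, so the version check still runs without PyTorch
except ImportError:
    torch = None


def version_tuple(version: str) -> tuple:
    """Turn '4.44.2' into (4, 44, 2), stopping at suffixes such as 'dev0'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.44") -> bool:
    """Compare two version strings after zero-padding to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def pick_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Hypothetical policy: bfloat16 if supported, else float16 on GPU, else float32."""
    if cuda_available and bf16_supported:
        return "bfloat16"
    if cuda_available:
        return "float16"  # assumption: not confirmed for this model by the source
    return "float32"


def preflight() -> dict:
    """Collect the installed transformers version and a suggested dtype."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    cuda = torch is not None and torch.cuda.is_available()
    # Depending on the PyTorch version, this may also report emulated support.
    bf16 = bool(cuda and torch.cuda.is_bf16_supported())
    return {
        "transformers": installed,
        "transformers_ok": installed is not None and meets_minimum(installed),
        "dtype": pick_dtype_name(cuda, bf16),
    }


if __name__ == "__main__":
    print(preflight())
```

If `transformers_ok` is false, upgrading the package is the first thing to try; the suggested dtype is only a starting point and should be validated against the model's actual output quality.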
License
Use of the GLM-4V-9B model weights is governed by the model's license agreement. Review and comply with its terms before use.