Huatuo G P T Vision 7 B hf
FreedomIntelligenceIntroduction
HuatuoGPT-Vision-7B-HF is a Hugging Face implementation of the HuatuoGPT-Vision-7B model. It is compatible with VLLM and other frameworks. The model is designed for advanced text generation tasks, integrating image and text processing capabilities.
Architecture
The model supports multilingual capabilities in English and Chinese, focusing on vision and image-text-to-text tasks. It utilizes the FreedomIntelligence/PubMedVision dataset for training and deployment, and is tagged under text-generation and vision categories.
Training
HuatuoGPT-Vision-7B-HF is based on a robust dataset, PubMedVision, to enhance its understanding of medical visual knowledge. This training allows it to perform complex multimodal tasks by integrating image and text data.
Guide: Running Locally
Steps
-
Deploy the Model
Use VLLM to deploy the model:python -m vllm.entrypoints.openai.api_server \ --model huatuogpt_vision_model_path \ --tensor_parallel_size 1 \ --gpu_memory_utilization 0.8 \ --served-model-name huatuogpt_vision_7b \ --port 9559 --max-model-len 2048 > vllm_openai_server.log 2>&1 &
-
Model Inference
Use the following Python code for inference:from openai import OpenAI from PIL import Image import base64 import io def get_image(image_path): image = Image.open(image_path).convert('RGB') img_type = image.format if not img_type: img_type = image_path.split('.')[-1] byte_arr = io.BytesIO() image.save(byte_arr, format=img_type) byte_arr.seek(0) image = base64.b64encode(byte_arr.getvalue()).decode() return image, img_type client = OpenAI(base_url="http://localhost:9559/v1", api_key="token-abc123") image_path = 'your_image_path' image, img_type = get_image(image_path) inputcontent = [{ "type": "text", "text": '<image>\nWhat does the picture show?' }] inputcontent.append({ "type": "image_url", "image_url": { "url": f"data:image/{img_type};base64,{image}" } }) response = client.chat.completions.create( model="huatuogpt_vision_7b", messages=[ {"role": "user", "content": inputcontent} ], temperature=0.2 ) print(response.choices[0].message.content)
Cloud GPUs
For optimal performance, consider using cloud GPUs such as AWS EC2, Google Cloud, or Azure for deployment and inference tasks.
License
HuatuoGPT-Vision-7B-HF is licensed under the Apache-2.0 license, allowing for wide usage and distribution with minimal restrictions.