HuatuoGPT-Vision-7B-hf

FreedomIntelligence

Introduction

HuatuoGPT-Vision-7B-HF is the Hugging Face implementation of the HuatuoGPT-Vision-7B model. It is compatible with vLLM and other frameworks. The model is multimodal: it takes an image together with a text prompt and generates text in response.

Architecture

The model supports English and Chinese and targets vision and image-text-to-text tasks. It is trained on the FreedomIntelligence/PubMedVision dataset and is tagged under the text-generation and vision categories.

Training

HuatuoGPT-Vision-7B-HF is trained on the PubMedVision dataset to improve its understanding of medical visual knowledge. This training lets it handle multimodal tasks that combine image and text input.

Guide: Running Locally

Steps

  1. Deploy the Model
    Use vLLM to serve the model behind an OpenAI-compatible API. Replace huatuogpt_vision_model_path with the path to your copy of the model:

    python -m vllm.entrypoints.openai.api_server \
    --model huatuogpt_vision_model_path \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8 \
    --served-model-name huatuogpt_vision_7b \
    --port 9559 --max-model-len 2048 > vllm_openai_server.log 2>&1 &
    
  2. Model Inference
    Use the following Python code for inference:

    from openai import OpenAI
    from PIL import Image
    import base64
    import io
    
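    def get_image(image_path):
        """Return (base64-encoded image bytes, lowercase type for the data URL)."""
        with Image.open(image_path) as src:
            # Read the format before convert(): the converted copy has format=None.
            # Anything other than JPEG or PNG is re-encoded as PNG.
            img_type = src.format if src.format in ('JPEG', 'PNG') else 'PNG'
            image = src.convert('RGB')
        byte_arr = io.BytesIO()
        image.save(byte_arr, format=img_type)
        encoded = base64.b64encode(byte_arr.getvalue()).decode()
        return encoded, img_type.lower()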
    
    client = OpenAI(base_url="http://localhost:9559/v1", api_key="token-abc123")
    image_path = 'your_image_path'
    image, img_type = get_image(image_path)
    
    inputcontent = [{
        "type": "text",
        "text": '<image>\nWhat does the picture show?'
    }]
    
    inputcontent.append({
        "type": "image_url",
        "image_url": {
            "url": f"data:image/{img_type};base64,{image}"
        }
    })
    
    response = client.chat.completions.create(
        model="huatuogpt_vision_7b",
        messages=[
            {"role": "user", "content": inputcontent}
        ],
        temperature=0.2
    )
    print(response.choices[0].message.content)
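
The deploy command in step 1 starts the server in the background and redirects its output to vllm_openai_server.log, so the shell returns before the model has finished loading. A small readiness check can save a failed first request. The sketch below is one possible approach, not part of the original guide: the helper name `wait_for_server` and its default timeout are assumptions, and it relies on the `/v1/models` route that vLLM's OpenAI-compatible server exposes. It also assumes the server was started without an API key, as in the command above.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_server(base_url="http://localhost:9559/v1", timeout=300.0, interval=2.0):
    """Poll GET {base_url}/models until it answers 200 or `timeout` seconds pass.

    Hypothetical helper: the name and defaults are illustrative only.
    Returns True once the server responds, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # connection refused or error response: model still loading
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` before creating the `OpenAI` client and exiting when it returns `False` gives a clear failure instead of a connection error; the log file remains the place to look for the actual cause.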
    

Cloud GPUs

For deployment and inference, consider GPU instances from a cloud provider such as AWS EC2, Google Cloud, or Azure if no suitable local GPU is available.
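
When choosing an instance, a rough memory estimate helps. The sketch below is back-of-the-envelope arithmetic under stated assumptions, not a measured requirement: it takes the nominal 7 billion parameters from the model name, assumes 16-bit weights (2 bytes per parameter), and applies the 0.8 value passed to the GPU memory utilization flag in the deploy command. The real checkpoint size, the KV cache, and activations will shift these numbers.

```python
GIB = 1024 ** 3


def weight_memory_gib(n_params, bytes_per_param=2):
    """Approximate memory for the weights alone (assumes 16-bit weights by default)."""
    return n_params * bytes_per_param / GIB


def vllm_budget_gib(gpu_total_gib, gpu_memory_utilization=0.8):
    """Share of GPU memory vLLM may claim for weights, KV cache, and activations."""
    return gpu_total_gib * gpu_memory_utilization


def weights_fit(gpu_total_gib, n_params=7e9, gpu_memory_utilization=0.8):
    """True if the weights alone fit in the budget; says nothing about KV cache headroom."""
    return weight_memory_gib(n_params) < vllm_budget_gib(gpu_total_gib, gpu_memory_utilization)


weights = weight_memory_gib(7e9)  # nominal parameter count taken from the model name
for gpu_gib in (16, 24, 40):      # illustrative GPU sizes, not recommendations
    budget = vllm_budget_gib(gpu_gib)
    print(f"{gpu_gib} GiB GPU: budget {budget:.1f} GiB, "
          f"weights {weights:.1f} GiB, fits={weights_fit(gpu_gib)}")
```

Under these assumptions the weights alone need about 13 GiB, so a 16 GiB card at 0.8 utilization falls short while a 24 GiB card leaves a few GiB for the KV cache.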

License

HuatuoGPT-Vision-7B-HF is licensed under the Apache-2.0 license, allowing for wide usage and distribution with minimal restrictions.
