BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

Maintained by Microsoft

Introduction

BiomedCLIP is a biomedical vision-language model designed for a broad range of vision-language processing tasks. It is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs from PubMed Central, and uses PubMedBERT for text encoding and a Vision Transformer for image encoding, with adaptations for the biomedical domain.

Architecture

BiomedCLIP employs a contrastive learning approach, utilizing PubMedBERT as its text encoder and a Vision Transformer as its image encoder. This combination allows the model to excel in tasks such as cross-modal retrieval, image classification, and visual question answering, outperforming previous vision-language processing (VLP) models in biomedical applications.
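
At its core, the two encoders project images and text into a shared embedding space, where matching pairs are scored by cosine similarity. The following is a minimal sketch of CLIP-style contrastive scoring with placeholder tensors; the dimensions and temperature value are illustrative, not BiomedCLIP's actual configuration.

    import torch
    import torch.nn.functional as F

    # Placeholder embeddings standing in for encoder outputs (batch of 4, dim 512).
    image_features = F.normalize(torch.randn(4, 512), dim=-1)
    text_features = F.normalize(torch.randn(4, 512), dim=-1)
    logit_scale = torch.tensor(100.0)  # learned temperature; 100.0 is a typical converged value

    # Pairwise similarities: entry (i, j) scores image i against caption j.
    logits = logit_scale * image_features @ text_features.t()

    # Symmetric contrastive loss: image i should match caption i and vice versa.
    labels = torch.arange(4)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2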

Training

The model was pretrained on PMC-15M, a large dataset of figure-caption pairs from PubMed Central that spans a wide array of biomedical imagery, including microscopy, radiography, and histology. The training strategy involved domain-specific adaptations to enhance performance in biomedical contexts.

Guide: Running Locally

Basic Steps

  1. Set Up Environment

    conda create -n biomedclip python=3.10 -y
    conda activate biomedclip
    pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
    
  2. Load Model from Hugging Face Hub

    import torch
    from urllib.request import urlopen
    from PIL import Image
    from open_clip import create_model_from_pretrained, get_tokenizer

    # torch, urlopen, and Image are used in the classification step below.
    model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    
  3. Perform Zero-Shot Image Classification
    Encode an image and a set of candidate label prompts, then rank the labels by similarity, as in the sketch below. A CUDA-capable GPU speeds up inference but is not required.
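
    The sketch below continues from step 2; the image URL is a placeholder, and the candidate labels and prompt template are illustrative choices rather than fixed parts of the API.

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device).eval()

    # Candidate labels are illustrative; any biomedical label set works.
    labels = ['adenocarcinoma histopathology', 'chest X-ray', 'brain MRI']
    template = 'this is a photo of '  # simple prompt prefix prepended to each label

    # Placeholder URL: substitute a real image path or URL.
    image = Image.open(urlopen('https://example.com/sample.png')).convert('RGB')
    image_tensor = preprocess(image).unsqueeze(0).to(device)

    # context_length=256 matches the PubMedBERT_256 text encoder.
    texts = tokenizer([template + l for l in labels], context_length=256).to(device)

    with torch.no_grad():
        image_features, text_features, logit_scale = model(image_tensor, texts)
        probs = (logit_scale * image_features @ text_features.t()).softmax(dim=-1)

    for label, p in zip(labels, probs[0].tolist()):
        print(f'{label}: {p:.4f}')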

  4. Load Model from Local Files
    Download the model files once, then initialize from the local copy for offline use, as sketched below.
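
    A minimal sketch that relies on the huggingface_hub cache; the one-shot snapshot_download call and the HF_HUB_OFFLINE environment variable are assumptions about your setup, not the only way to load local files.

    from huggingface_hub import snapshot_download
    from open_clip import create_model_from_pretrained, get_tokenizer

    # Download the full repository into the local cache (run once, online).
    snapshot_download('microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')

    # The same hf-hub: path now resolves from the cache; for fully offline runs,
    # set HF_HUB_OFFLINE=1 in the environment before launching Python.
    model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')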

Cloud GPUs

For faster inference, consider cloud services such as AWS EC2, Google Cloud, or Azure, which provide access to powerful GPU instances.

License

The BiomedCLIP model is licensed under the MIT License, allowing for broad use and distribution.
