BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

Maintained by Microsoft

Introduction

BiomedCLIP is a biomedical vision-language model designed for a broad range of vision-language processing tasks. It is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs from PubMed Central, and uses PubMedBERT for text encoding and a Vision Transformer for image encoding, with adaptations for the biomedical domain.

Architecture

BiomedCLIP employs a contrastive learning approach, utilizing PubMedBERT as its text encoder and a Vision Transformer as its image encoder. This combination allows the model to excel in tasks such as cross-modal retrieval, image classification, and visual question answering, outperforming previous vision-language processing (VLP) models in biomedical applications.
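
At its core, the two encoders project images and text into a shared embedding space, where matching pairs are scored by cosine similarity. The following is a minimal sketch of CLIP-style contrastive scoring with placeholder tensors; the dimensions and temperature value are illustrative, not BiomedCLIP's actual configuration.

    import torch
    import torch.nn.functional as F

    # Placeholder embeddings standing in for encoder outputs (batch of 4, dim 512).
    image_features = F.normalize(torch.randn(4, 512), dim=-1)
    text_features = F.normalize(torch.randn(4, 512), dim=-1)
    logit_scale = torch.tensor(100.0)  # learned temperature; 100.0 is a typical converged value

    # Pairwise similarities: entry (i, j) scores image i against caption j.
    logits = logit_scale * image_features @ text_features.t()

    # Symmetric contrastive loss: image i should match caption i and vice versa.
    labels = torch.arange(4)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2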

Training

The model was pretrained on PMC-15M, a large dataset of figure-caption pairs from PubMed Central that spans a wide array of biomedical imagery, including microscopy, radiography, and histology. The training strategy involved domain-specific adaptations to enhance performance in biomedical contexts.

Guide: Running Locally

Basic Steps

  1. Set Up Environment

    conda create -n biomedclip python=3.10 -y
    conda activate biomedclip
    pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
    
  2. Load Model from Hugging Face Hub

    import torch
    from urllib.request import urlopen
    from PIL import Image
    from open_clip import create_model_from_pretrained, get_tokenizer

    # torch, urlopen, and Image are used in the classification step below.
    model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    
  3. Perform Zero-Shot Image Classification
    Encode an image and a set of candidate label prompts, then rank the labels by similarity, as in the sketch below. A CUDA-capable GPU speeds up inference but is not required.
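
    The sketch below continues from step 2; the image URL is a placeholder, and the candidate labels and prompt template are illustrative choices rather than fixed parts of the API.

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device).eval()

    # Candidate labels are illustrative; any biomedical label set works.
    labels = ['adenocarcinoma histopathology', 'chest X-ray', 'brain MRI']
    template = 'this is a photo of '  # simple prompt prefix prepended to each label

    # Placeholder URL: substitute a real image path or URL.
    image = Image.open(urlopen('https://example.com/sample.png')).convert('RGB')
    image_tensor = preprocess(image).unsqueeze(0).to(device)

    # context_length=256 matches the PubMedBERT_256 text encoder.
    texts = tokenizer([template + l for l in labels], context_length=256).to(device)

    with torch.no_grad():
        image_features, text_features, logit_scale = model(image_tensor, texts)
        probs = (logit_scale * image_features @ text_features.t()).softmax(dim=-1)

    for label, p in zip(labels, probs[0].tolist()):
        print(f'{label}: {p:.4f}')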

  4. Load Model from Local Files
    Download the model files once, then initialize from the local copy for offline use, as sketched below.
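
    A minimal sketch that relies on the huggingface_hub cache; the one-shot snapshot_download call and the HF_HUB_OFFLINE environment variable are assumptions about your setup, not the only way to load local files.

    from huggingface_hub import snapshot_download
    from open_clip import create_model_from_pretrained, get_tokenizer

    # Download the full repository into the local cache (run once, online).
    snapshot_download('microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')

    # The same hf-hub: path now resolves from the cache; for fully offline runs,
    # set HF_HUB_OFFLINE=1 in the environment before launching Python.
    model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')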

Cloud GPUs

For faster inference, consider cloud services such as AWS EC2, Google Cloud, or Azure, which provide access to powerful GPU instances.

License

The BiomedCLIP model is licensed under the MIT License, allowing for broad use and distribution.
