BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
microsoft
Introduction
BiomedCLIP is a biomedical vision-language model for vision-language processing tasks such as cross-modal retrieval, image classification, and visual question answering. It is pre-trained on PMC-15M, a dataset of 15 million figure-caption pairs from PubMed Central. The model uses PubMedBERT for text encoding and a Vision Transformer for image encoding, with adaptations for the biomedical domain.
Architecture
BiomedCLIP employs a contrastive learning approach, utilizing PubMedBERT as its text encoder and a Vision Transformer as its image encoder. This combination allows the model to excel in tasks such as cross-modal retrieval, image classification, and visual question answering, outperforming previous vision-language processing (VLP) models in biomedical applications.
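To make the contrastive objective concrete, here is a minimal, dependency-free sketch of a CLIP-style symmetric loss over a batch of matched image and caption embeddings. The source does not spell out the exact loss, so the symmetric cross-entropy form, the function names, and the `temperature=0.07` default are assumptions for illustration, not BiomedCLIP's actual training code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the index of the true match."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and captions is the positive.

    `temperature` is an assumed illustrative value, not taken from the source.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j]: cosine similarity of image i and caption j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each image should pick out its own caption (rows).
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each caption should pick out its own image (columns).
    text_to_image = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2

# Toy 2-d embeddings (hypothetical values, not real encoder outputs).
matched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < shuffled)  # correctly paired batches should score a lower loss
```

The loss is low when each figure embedding is closest to its own caption and high when the pairing is scrambled, which is the signal that pulls the two encoders into a shared embedding space.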
Training
The model was pre-trained on PMC-15M, whose figures and captions from PubMed Central cover a wide range of biomedical imagery, including microscopy, radiography, and histology. Training included domain-specific adaptations to improve performance in biomedical contexts.
Guide: Running Locally
Basic Steps
-
Set Up Environment
    conda create -n biomedclip python=3.10 -y
    conda activate biomedclip
    pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
-
Load Model from Hugging Face Hub
    import torch
    from urllib.request import urlopen
    from PIL import Image
    from open_clip import create_model_from_pretrained, get_tokenizer

    model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
    tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
-
Perform Zero-Shot Image Classification
Load images and classify them with the model. A CUDA-capable GPU is recommended for speed, but inference also runs on CPU.
-
Load Model from Local Files
Download model files and use the local path to initialize the model for offline usage.
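The zero-shot classification step above reduces to comparing one image embedding against the embedding of a text prompt for each candidate label. The sketch below shows only that scoring step in plain Python, using toy vectors in place of real encoder outputs. With the loaded model, the embeddings would typically come from `model.encode_image` and `model.encode_text` in open_clip. The function names, labels, vectors, and `logit_scale` values here are hypothetical, chosen for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, label_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label prompt.

    `logit_scale` is learned in a real CLIP-style model; the default here is an
    assumed placeholder.
    """
    image = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image, l2_normalize(text)))
        for text in label_embs
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(probs, labels):
    """Pair labels with probabilities, most likely first."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

# Toy 3-d embeddings standing in for encoder outputs (hypothetical values).
labels = ["chest X-ray", "brain MRI", "histology slide"]
label_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_emb, label_embs, logit_scale=10.0)
ranking = rank_labels(probs, labels)
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```

Because no classifier head is trained, changing the candidate labels only requires encoding new text prompts, which is what makes the approach zero-shot.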
Cloud GPUs
For enhanced performance, consider using cloud services like AWS EC2, Google Cloud, or Azure that provide access to powerful GPU instances.
License
The BiomedCLIP model is licensed under the MIT License, allowing for broad use and distribution.