BiomedVLP-BioViL-T
Introduction
BioViL-T is a domain-specific vision-language model designed by Microsoft for analyzing chest X-rays (CXRs) and radiology reports. It employs a temporal multi-modal pre-training procedure, enhancing performance over its predecessor, BioViL. The model is capable of handling single- and multi-image applications, such as natural language inference and image/text classification.
Architecture
The architecture of BioViL-T pairs a hybrid image encoder with a BERT-based language model. A ResNet-50 backbone extracts features from each CXR, and a Vision Transformer aggregates and compares those features across time, allowing the model to reason over a current image together with a prior study. The text and image encoders are trained jointly within a multi-modal contrastive learning framework.
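As a rough illustration of this design (not the reference implementation: the module names, feature dimension, and use of a plain TransformerEncoder are assumptions made for the sketch), the image pathway can be pictured as a CNN backbone whose per-image features are fused across time by a small transformer before being projected into the joint image-text space:

import torch
import torch.nn as nn
from torchvision.models import resnet50

class TemporalImageEncoderSketch(nn.Module):
    # Illustrative only: a ResNet-50 backbone extracts features per CXR,
    # and a small transformer fuses the current and prior image features.
    def __init__(self, embed_dim=128):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # globally pooled 2048-d features
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=2048, nhead=8, batch_first=True),
            num_layers=1,
        )
        self.project = nn.Linear(2048, embed_dim)  # projection into the joint space

    def forward(self, current_image, prior_image=None):
        images = [current_image] if prior_image is None else [current_image, prior_image]
        feats = torch.stack([self.cnn(x).flatten(1) for x in images], dim=1)  # (B, T, 2048)
        fused = self.temporal(feats)      # aggregate and compare features across time
        return self.project(fused[:, 0])  # embedding for the current study

The prior image is optional in the sketch, mirroring the model's support for both single- and multi-image applications.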
Training
BioViL-T is trained in two stages. First, CXR-BERT-general is pre-trained with Masked Language Modeling (MLM) on PubMed abstracts and on clinical notes from the MIMIC-III and MIMIC-CXR datasets. BioViL-T then continues pre-training from CXR-BERT-general on radiology reports paired with sequences of CXRs. Text and image embeddings are aligned through the latent representation of the [CLS] token.
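Conceptually, this alignment stage optimizes a symmetric contrastive objective over the projected text ([CLS]) and image embeddings. The snippet below is a generic InfoNCE-style sketch of such a loss, not BioViL-T's training code; the temperature value and the assumption that matched image/report pairs share a batch index are illustrative choices:

import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) projected embeddings from matched image/report pairs.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # matched pairs on the diagonal
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))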
Guide: Running Locally
- Install Dependencies: Ensure you have Python and PyTorch installed. Install the necessary packages using pip:
pip install torch transformers
- Load the Model: Use the following Python code to load the model and tokenizer:
import torch
from transformers import AutoModel, AutoTokenizer

url = "microsoft/BiomedVLP-BioViL-T"
tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
model = AutoModel.from_pretrained(url, trust_remote_code=True)
- Prepare Inputs: Tokenize your text inputs and obtain sentence embeddings (the elided arguments are filled in by the consolidated sketch after this list):
text_prompts = ["No pleural effusion or pneumothorax is seen.", ...]
tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts, ...)
embeddings = model.get_projected_text_embeddings(...)
- Compute Cosine Similarity: Calculate cosine similarity between embeddings:
sim = torch.mm(embeddings, embeddings.t())
- Suggested Cloud GPUs: For better performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure.
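Putting the steps above together, the following is a minimal end-to-end sketch. The tokenizer arguments (add_special_tokens, padding, return_tensors) and the keyword arguments passed to get_projected_text_embeddings are reasonable assumptions rather than values confirmed here, and the second prompt is an illustrative addition; the final matrix multiplication yields cosine similarities provided the projected embeddings are L2-normalized, as the guide's similarity step implies.

import torch
from transformers import AutoModel, AutoTokenizer

url = "microsoft/BiomedVLP-BioViL-T"
tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
model = AutoModel.from_pretrained(url, trust_remote_code=True)

# Example sentences (the second is an illustrative addition).
text_prompts = [
    "No pleural effusion or pneumothorax is seen.",
    "There is a small left pleural effusion.",
]

# Tokenize the sentences as a single padded batch (argument choices are assumptions).
tokenizer_output = tokenizer.batch_encode_plus(
    batch_text_or_text_pairs=text_prompts,
    add_special_tokens=True,
    padding="longest",
    return_tensors="pt",
)

# Project the [CLS] representations into the joint image-text space
# (keyword names assumed to mirror the tokenizer output).
with torch.no_grad():
    embeddings = model.get_projected_text_embeddings(
        input_ids=tokenizer_output.input_ids,
        attention_mask=tokenizer_output.attention_mask,
    )

# Pairwise similarity between the sentence embeddings.
sim = torch.mm(embeddings, embeddings.t())
print(sim)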
License
BioViL-T is released under the MIT License, allowing freedom to use, modify, and distribute the software with minimal restrictions.