BiomedVLP-BioViL-T Model

Introduction

BioViL-T is a domain-specific vision-language model developed by Microsoft for analyzing chest X-rays (CXRs) and radiology reports. It is trained with a temporal multi-modal pre-training procedure that exploits the temporal structure between data points, such as a current and a prior CXR of the same patient, and it improves downstream performance over its predecessor, BioViL. The model supports both single- and multi-image applications, such as natural language inference and image/text classification.

Architecture

BioViL-T uses a hybrid image encoder: a ResNet-50 backbone extracts features from the image at each time point, and a Vision Transformer aggregates and compares these features across the temporal dimension. The text encoder is a BERT-based language model (CXR-BERT). The image and text models are trained jointly in a multi-modal contrastive learning framework.
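
To make the temporal aggregation step concrete, here is a toy, pure-Python sketch of single-head self-attention over patch features from a current and a prior image. It is not BioViL-T's implementation: the identity query/key/value projections, the additive time embeddings, the tiny 2-D feature vectors, and all function names are illustrative assumptions. Passing an empty prior list models the single-image case.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Toy assumption: identity Q/K/V projections instead of learned ones."""
    dim = len(tokens[0])
    fused = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        fused.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return fused

def temporal_fuse(current_patches, prior_patches, t_current, t_prior):
    """Tag each patch with a hypothetical time embedding, attend jointly over
    both images, and return the fused tokens of the current image."""
    tokens = [[x + t for x, t in zip(p, t_current)] for p in current_patches]
    tokens += [[x + t for x, t in zip(p, t_prior)] for p in prior_patches]
    return self_attention(tokens)[: len(current_patches)]

# Two 2-D patch features per image, standing in for backbone (e.g. ResNet-50) outputs.
current = [[1.0, 0.0], [0.0, 1.0]]
prior = [[1.0, 0.5], [0.2, 1.0]]
fused = temporal_fuse(current, prior, t_current=[0.1, 0.1], t_prior=[-0.1, -0.1])
```

Because the current-image tokens also attend to the prior-image tokens, each fused vector mixes information from both time points; in a trained model the projections and time embeddings would be learned.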

Training

BioViL-T's language model is trained in two stages. First, CXR-BERT-general is pretrained from a randomly initialized BERT model with masked language modeling (MLM) on PubMed abstracts and on clinical notes from the MIMIC-III and MIMIC-CXR datasets. Second, BioViL-T is continually pretrained from CXR-BERT-general on radiology reports paired with sequences of CXRs. Text and image embeddings are aligned through the latent representation of the [CLS] token.
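
The contrastive alignment can be illustrated with a generic CLIP-style symmetric InfoNCE loss, sketched below in pure Python. This is an assumption for illustration only: the exact objective, temperature, and any auxiliary losses used to train BioViL-T are not specified here. `text_embs` stands in for projected [CLS] embeddings and `image_embs` for image embeddings; matching pairs share an index, so both lists must have the same length.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(image_embs, text_embs, temperature=0.1):
    """Average of image-to-text and text-to-image cross-entropy (hypothetical defaults)."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in txts] for u in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits_t))

images = [[1.0, 0.0], [0.0, 1.0]]
matched = symmetric_info_nce(images, [[0.9, 0.1], [0.1, 0.9]])
swapped = symmetric_info_nce(images, [[0.1, 0.9], [0.9, 0.1]])
# matched < swapped: correctly paired embeddings dominate the diagonal
```

Minimizing this loss pulls each image embedding toward its own report embedding and pushes it away from the other reports in the batch.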

Guide: Running Locally

Install Dependencies: Ensure you have Python and PyTorch installed. Install the necessary packages using pip:
```
pip install torch transformers
```

Load the Model: Use the following Python code to load the model and tokenizer:

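```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hugging Face Hub ID; trust_remote_code=True is needed because the model ships
# its own modeling code (including get_projected_text_embeddings).
url = "microsoft/BiomedVLP-BioViL-T"
tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
model = AutoModel.from_pretrained(url, trust_remote_code=True)
```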

Prepare Inputs: Tokenize your text inputs and obtain sentence embeddings:

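```python
text_prompts = ["No pleural effusion or pneumothorax is seen."]  # ... append further prompts
with torch.no_grad():  # inference only; no gradients needed
    # Pad to the longest prompt and return PyTorch tensors.
    tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts, add_special_tokens=True, padding="longest", return_tensors="pt")
    # Projected [CLS] embeddings in the joint image-text latent space.
    embeddings = model.get_projected_text_embeddings(input_ids=tokenizer_output.input_ids, attention_mask=tokenizer_output.attention_mask)
```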
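
The `padding="longest"` and `attention_mask` arguments can be understood with a small pure-Python imitation of what the tokenizer returns. The token IDs are made up, and right-padding with `pad_id=0` is an assumption; check `tokenizer.pad_token_id` and `tokenizer.padding_side` for the real values.

```python
def pad_longest(batch_ids, pad_id=0):
    """Imitate padding="longest": right-pad every sequence to the batch maximum
    and build an attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask

# Made-up token IDs for two prompts of different lengths.
input_ids, attention_mask = pad_longest([[2, 17, 44, 3], [2, 9, 3]])
print(input_ids)       # [[2, 17, 44, 3], [2, 9, 3, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The model uses the mask to ignore padding positions, which lets prompts of different lengths share one batch.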

Compute Cosine Similarity: Calculate cosine similarity between embeddings:
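```python
embeddings = torch.nn.functional.normalize(embeddings, dim=1)  # unit L2 norm per row (no-op if already normalized)
sim = torch.mm(embeddings, embeddings.t())  # sim[i][j] = cosine similarity of prompts i and j
```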
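
A matrix product such as `torch.mm(embeddings, embeddings.t())` yields cosine similarities only when every row has unit L2 norm. The pure-Python sketch below, using made-up 2-D vectors rather than model outputs, shows the normalize-then-multiply pattern.

```python
import math

def normalize_rows(matrix):
    """Scale each row to unit L2 norm."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row])
    return result

def gram(matrix):
    """Pairwise dot products: a pure-Python analogue of torch.mm(m, m.t())."""
    return [[sum(a * b for a, b in zip(u, v)) for v in matrix] for u in matrix]

# Toy embeddings (illustrative values, not model outputs).
emb = [[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]]
sim = gram(normalize_rows(emb))
# sim[0][0] ≈ 1.0 (same vector), sim[0][1] ≈ 0.96, sim[0][2] ≈ -1.0 (opposite direction)
```

Without normalization the diagonal would hold squared norms (25.0 for these vectors) instead of 1.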
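
Suggested Cloud GPUs: A GPU is optional when encoding a handful of prompts but speeds up larger batches. If you lack local hardware, consider cloud GPU instances from providers such as AWS EC2, Google Cloud Platform, or Azure.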

License

BioViL-T is released under the MIT License, allowing freedom to use, modify, and distribute the software with minimal restrictions.
