BiomedVLP-BioViL-T

microsoft

Introduction

BioViL-T is a domain-specific vision-language model developed by Microsoft for analyzing chest X-rays (CXRs) and radiology reports. It is trained with a temporal multi-modal pre-training procedure, which improves downstream performance over its predecessor, BioViL. The model supports both single-image and multi-image applications, such as natural language inference and image/text classification.

Architecture

The image encoder of BioViL-T is a hybrid of a Vision Transformer and ResNet-50. The ResNet-50 serves as the backbone that extracts features from the image at each time point, and the transformer aggregates and compares those features across the temporal dimension. The image encoder is paired with a BERT language model, and the text and image models are trained jointly using a multi-modal contrastive learning framework.
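
To make the idea of joint contrastive training concrete, here is a minimal sketch in plain Python. It is not BioViL-T's training code: the symmetric cross-entropy (InfoNCE-style) form, the temperature value, and the toy two-dimensional embeddings are all assumptions chosen for illustration. The point is only that matched image/report pairs should score higher than mismatched ones, which lowers the loss.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.5):
    """Average of image-to-text and text-to-image cross-entropy.

    The temperature of 0.5 is an illustrative assumption, not a value
    taken from the BioViL-T model card.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (
        cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_transposed)
    )


# Toy embeddings (made up): pair i of images and texts belongs together.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[0.9, 0.1], [0.1, 0.9]]
swapped_text = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = symmetric_contrastive_loss(image_embs, matched_text)
loss_swapped = symmetric_contrastive_loss(image_embs, swapped_text)
print(loss_matched < loss_swapped)
```

When the text embeddings line up with their images, the loss is lower than when the pairs are swapped, which is the signal that drives the two encoders toward a shared embedding space.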

Training

BioViL-T's training involves two stages. Initially, CXR-BERT-general is pretrained using Masked Language Modeling (MLM) on PubMed abstracts and clinical notes from MIMIC-III and MIMIC-CXR datasets. Subsequently, BioViL-T undergoes continual pretraining from CXR-BERT-general by using radiology reports and sequences of CXRs. The model aligns text and image embeddings using the latent representation of the [CLS] token.
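
The Masked Language Modeling objective used in the first stage can be sketched as follows. This is a simplified illustration, not the CXR-BERT pretraining pipeline: the 15% masking rate follows the common BERT convention and is an assumption here, every selected token is replaced by [MASK] (BERT's original recipe also keeps or randomizes some of them), and whitespace splitting stands in for a real tokenizer.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for a masked-language-modeling step.

    labels[i] holds the original token where position i was masked and
    None elsewhere, so a loss would only be computed on masked positions.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Whitespace tokenization is a stand-in for a real subword tokenizer.
tokens = "no pleural effusion or pneumothorax is seen .".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
print(masked)
print(labels)
```

During pretraining the model sees the masked sequence and is trained to predict the original token at each masked position.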

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python and PyTorch installed. Install the necessary packages using pip:
    pip install torch transformers
    
  2. Load the Model: Use the following Python code to load the model and tokenizer:
    import torch
    from transformers import AutoModel, AutoTokenizer
    
    url = "microsoft/BiomedVLP-BioViL-T"
    tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
    model = AutoModel.from_pretrained(url, trust_remote_code=True)
    
  3. Prepare Inputs: Tokenize your text inputs and obtain sentence embeddings:
    text_prompts = ["No pleural effusion or pneumothorax is seen.", ...]
    tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts, ...)
    embeddings = model.get_projected_text_embeddings(...)
    
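  4. Compute Cosine Similarity: Calculate the pairwise cosine similarity between the embeddings. The matrix product below equals cosine similarity only when the embeddings are L2-normalized, so normalize them first if they are not already unit length: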
    sim = torch.mm(embeddings, embeddings.t())
    
  5. Suggested Cloud GPUs: For better performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure.
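
Step 4 treats a matrix product as cosine similarity, which holds only for unit-length vectors. The sketch below shows that equivalence in plain Python with made-up two-dimensional vectors standing in for model embeddings; the helper names are hypothetical and nothing here calls the BioViL-T model.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def pairwise_dot(rows):
    """Plain-Python equivalent of torch.mm(E, E.t()) for a list of row vectors."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def cosine(a, b):
    """Cosine similarity computed directly from its definition."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for sentence embeddings.
raw_embeddings = [[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]]

unit_embeddings = [l2_normalize(v) for v in raw_embeddings]
sim = pairwise_dot(unit_embeddings)      # cosine similarities
raw_product = pairwise_dot(raw_embeddings)  # NOT cosine similarities

for row in sim:
    print([round(value, 2) for value in row])
```

After normalization, every diagonal entry is 1 and each off-diagonal entry matches the cosine computed from its definition. Without normalization the same product returns unscaled dot products, which are not comparable across vectors of different length.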

License

BioViL-T is released under the MIT License, allowing freedom to use, modify, and distribute the software with minimal restrictions.
