Introduction

MAIRA-2 is a multimodal transformer developed by Microsoft Research Health Futures that generates radiology reports from chest X-rays, with or without grounding. It is released for research use only, to support comparison with other models and further work on radiology report generation.

Architecture

MAIRA-2 comprises three main components:

  • RAD-DINO-MAIRA-2: A frozen image encoder.
  • Projection Layer: Trained from scratch.
  • Vicuna-7b-v1.5: A language model that is fully fine-tuned.
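
As a rough illustration, the training status of the three components can be written down as a small data structure. This is only a sketch: the `Component` class and its `trainable` and `pretrained` fields are hypothetical names introduced here, not part of MAIRA-2's code, and the `pretrained` values are inferred from the "frozen", "from scratch", and "fine-tuned" descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical summary record; not part of the MAIRA-2 codebase."""
    name: str
    trainable: bool   # were the weights updated while training MAIRA-2?
    pretrained: bool  # did the weights start from an existing checkpoint? (inferred)


COMPONENTS = (
    Component("RAD-DINO-MAIRA-2", trainable=False, pretrained=True),  # frozen image encoder
    Component("Projection Layer", trainable=True, pretrained=False),  # trained from scratch
    Component("Vicuna-7b-v1.5", trainable=True, pretrained=True),     # fully fine-tuned
)


def trainable_names(components=COMPONENTS):
    """Return the names of the components whose weights were updated."""
    return [c.name for c in components if c.trainable]


if __name__ == "__main__":
    print(trainable_names())
```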

The model takes a frontal chest X-ray as input and, optionally, a lateral view, a prior report, and the indication for the study. From these it generates the findings section of a radiology report.
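
The split between required and optional inputs can be sketched as a small container class. `StudyInputs` and its field names are hypothetical and exist only for illustration; MAIRA-2's own processor defines the real input interface.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class StudyInputs:
    """Hypothetical container for one study; not part of MAIRA-2's API."""
    frontal: Any                         # required frontal image, e.g. a PIL image
    lateral: Optional[Any] = None        # optional lateral view
    prior_report: Optional[str] = None   # optional report from an earlier study
    indication: Optional[str] = None     # optional indication for the study

    def __post_init__(self) -> None:
        # The frontal view is the only input that must always be present.
        if self.frontal is None:
            raise ValueError("a frontal chest X-ray is required")

    def provided_optional_fields(self) -> List[str]:
        """List the optional inputs that were actually supplied."""
        optional = ("lateral", "prior_report", "indication")
        return [name for name in optional if getattr(self, name) is not None]
```

For example, `StudyInputs(frontal=image, indication="Cough")` describes a study with a frontal view and an indication but no lateral view or prior report.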

Training

MAIRA-2 was trained on a mix of public and private chest X-ray datasets, including MIMIC-CXR, PadChest, and a private USMix dataset. The training task was to generate the findings section of a report, either with or without grounding (spatial annotations). Training ran on NVIDIA A100 GPUs on Azure for 1432 hours, emitting 107.4 CO₂ eq.

Guide: Running Locally

To run MAIRA-2 locally, follow these steps:

  1. Install Required Packages:

    pip install pillow protobuf sentencepiece torch transformers
    
  2. Install Transformers from Source: MAIRA-2 requires a specific commit of transformers, so install it from source at that commit. This replaces the release installed in step 1:

    pip install git+https://github.com/huggingface/transformers.git@88d960937c81a32bfb63356a2e8ecf7999619681
    
  3. Initialize the Model:

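    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor
    
    # trust_remote_code=True lets transformers run the model code shipped with the checkpoint
    model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)
    processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
    
    # Use the GPU when one is available; otherwise fall back to the much slower CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.eval().to(device)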
    
  4. Run Inference: Use sample data to test the model's capabilities, such as findings generation and phrase grounding.
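
A minimal sketch of findings generation for step 4 might look like the following. It assumes `model`, `processor`, and `device` were created as in step 3. The processor methods `format_and_preprocess_reporting_input` and `convert_output_to_plaintext_or_grounded_sequence`, the `technique` and `comparison` arguments, and the 300-token limit are taken from the model's published usage example as best recalled; treat them as assumptions and check them against the model card. The `generate_findings` helper itself is hypothetical.

```python
def generate_findings(model, processor, frontal, device, lateral=None,
                      indication=None, technique=None, comparison=None,
                      max_new_tokens=300):
    """Generate a non-grounded findings section for one study (sketch)."""
    # Assumed processor method: builds the prompt and preprocesses the images.
    inputs = processor.format_and_preprocess_reporting_input(
        current_frontal=frontal,
        current_lateral=lateral,
        prior_frontal=None,      # this sketch does not use a prior study
        indication=indication,
        technique=technique,
        comparison=comparison,
        prior_report=None,
        return_tensors="pt",
        get_grounding=False,     # set to True to request a grounded report
    )
    inputs = inputs.to(device)

    # generate() returns the prompt tokens followed by the newly generated tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True)

    # Drop the prompt so that only the generated report is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    text = processor.decode(output_ids[0][prompt_length:], skip_special_tokens=True)

    # Strip leading whitespace before handing the text to the (assumed) post-processing method.
    return processor.convert_output_to_plaintext_or_grounded_sequence(text.lstrip())
```

A call could then look like `generate_findings(model, processor, frontal_image, device, indication="Dyspnea")`, where `frontal_image` is a PIL image of the frontal view. Phrase grounding uses a different prompt and is not covered by this sketch.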

Suggested cloud GPU: NVIDIA A100 on Azure, the same hardware the model was trained on.
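
Before loading the model, it can help to confirm that the packages from steps 1 and 2 are present. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper, and it reports version strings only, so it cannot confirm that transformers was built from the exact commit given in step 2.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command in step 1.
REQUIRED = ("pillow", "protobuf", "sentencepiece", "torch", "transformers")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```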

License

The model is released under the Microsoft Research License Agreement (MSRLA), which restricts its use to research and development. It is not intended for clinical decision-making or any other clinical use. Refer to the full license text for details.
