COBRA
KatherLab
Introduction
COBRA (Contrastive Biomarker Representation Alignment) is a novel methodology for representation learning on pathology whole-slide images (WSIs). It leverages self-supervised learning (SSL) to generate slide-level embeddings, extending the principles of SSL from individual patches to entire slides. COBRA integrates tile embeddings from multiple foundation models (FMs) and employs a contrastive pretraining strategy, outperforming state-of-the-art slide encoders.
Architecture
COBRA is based on the Mamba-2 architecture and utilizes multiple FMs to improve representation learning. The method focuses on aligning slide augmentations or utilizing multimodal data, enabling compatibility with unseen feature extractors during inference. The approach is validated on multiple Clinical Proteomic Tumor Analysis Consortium (CPTAC) cohorts, showing significant improvements in performance metrics.
Training
COBRA is pretrained on 3,048 WSIs from The Cancer Genome Atlas (TCGA). The training employs a contrastive pretraining strategy, allowing the model to learn robust slide representations that generalize well across different datasets and tasks.
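To illustrate what a contrastive objective over slide embeddings can look like, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an assumption for illustration only: the text above does not specify the exact loss, temperature, or augmentations COBRA uses, and the names `info_nce`, `slide_view_a`, and `slide_view_b` are hypothetical. The two "views" stand in for embeddings of the same slides obtained under different augmentations or from different foundation models.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.2):
    """Mean InfoNCE loss where row i of view_a is the positive for row i of view_b.

    All other rows of view_b act as negatives for anchor i.
    The temperature value is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index as the target class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data: two slides, each embedded twice (hypothetical 2-D embeddings).
slide_view_a = [[1.0, 0.0], [0.0, 1.0]]
slide_view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(slide_view_a, slide_view_b)
shuffled = info_nce(slide_view_a, slide_view_b[::-1])
print(aligned < shuffled)
```

The loss is low when the two views of the same slide are closer to each other than to views of other slides, which is the property a contrastive pretraining strategy optimizes for.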
Guide: Running Locally
To run COBRA locally, follow these steps:
- Clone the Repository and Install Dependencies:
  git clone https://github.com/KatherLab/COBRA.git && cd COBRA
  pip install uv
  uv venv --python=3.11
  source .venv/bin/activate
  uv pip install "torch==2.4.1" setuptools packaging wheel "numpy==2.0.0"
  uv sync --no-build-isolation
- Prepare Data:
  - Extract tile embeddings using your preferred patch encoders with STAMP.
- Deploy COBRA:
  - Extract slide-level embeddings:
    python -m cobra.inference.extract_feats --feat_dir <tile_emb_dir> --output_dir <slide_emb_dir>
For improved performance, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure to handle large-scale data processing and model inference.
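When the extraction step needs to be scripted, for example inside a larger pipeline, the documented command can be wrapped in a small Python helper. The sketch below is an assumption-laden convenience, not part of COBRA: the function names `build_extract_command` and `extract_slide_embeddings` and the directory names are hypothetical, and only the module path and the `--feat_dir` / `--output_dir` flags come from the command shown in the steps above. It defaults to a dry run that prints the command without executing it.

```python
import subprocess
import sys
from pathlib import Path


def build_extract_command(feat_dir, output_dir):
    """Assemble the documented extract_feats invocation as an argument list."""
    return [
        sys.executable, "-m", "cobra.inference.extract_feats",
        "--feat_dir", str(feat_dir),
        "--output_dir", str(output_dir),
    ]


def extract_slide_embeddings(feat_dir, output_dir, dry_run=True):
    """Print the command (dry run) or execute it inside the active environment."""
    cmd = build_extract_command(feat_dir, output_dir)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # Creating the output directory up front is a precaution, not a documented requirement.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the extraction exits with an error.
    subprocess.run(cmd, check=True)
    return cmd


# Hypothetical directory names; replace with your own tile and slide embedding paths.
command = extract_slide_embeddings("tile_embeddings", "slide_embeddings")
```

Calling the helper with `dry_run=False` runs the extraction; this requires the virtual environment from the installation step to be active so that the `cobra` package is importable.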
License
COBRA is distributed under the GPL-3.0 License. Users must agree not to conduct experiments that harm human subjects and to use the model for non-commercial purposes only.