biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101

IBM

Introduction

The IBM biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101 model is a multimodal biomedical foundation model for small-molecule analysis. It uses the MMELON (Multi-view Molecular Embedding with Late Fusion) approach, which combines three molecular representations (image, graph, and text) to deliver robust predictive performance on tasks such as ligand-protein binding and molecular property prediction.

Architecture

The model employs a multi-view strategy combining image, graph, and text representations:

  • Image Representation: Renders the 2D molecular structure as an image with RDKit and applies data augmentation.
  • Graph Representation: Encodes each molecule as a graph with atoms as nodes and bonds as edges, using categorical embeddings for atom and bond properties.
  • Text Representation: Tokenizes SMILES strings with a custom tokenizer and embeds them with a transformer.
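The graph view can be illustrated with a small sketch. It assumes a hand-written ethanol graph instead of RDKit parsing, and its vocabularies (ELEMENT_VOCAB, MAX_DEGREE, BOND_VOCAB) are hypothetical, not the package's actual feature set. Each categorical property is mapped to an index, looked up in an embedding table, and the per-property vectors are summed into one node vector.

```python
import random

# Hypothetical categorical vocabularies (illustrative only; not bmfm_sm's real feature set).
ELEMENT_VOCAB = {"C": 0, "N": 1, "O": 2}
MAX_DEGREE = 4  # degree categories 0..4
BOND_VOCAB = {"SINGLE": 0, "DOUBLE": 1, "TRIPLE": 2, "AROMATIC": 3}
DIM = 4

rng = random.Random(0)


def make_table(num_rows, dim):
    """One random vector per category; stands in for learned embedding weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


element_table = make_table(len(ELEMENT_VOCAB), DIM)
degree_table = make_table(MAX_DEGREE + 1, DIM)
bond_table = make_table(len(BOND_VOCAB), DIM)


def embed_graph(atoms, bonds):
    """Return node vectors, an edge list, and edge vectors for a molecular graph."""
    degree = [0] * len(atoms)
    for i, j, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    # Each atom's vector is the sum of the embeddings of its categorical properties.
    nodes = [
        [e + d for e, d in zip(element_table[ELEMENT_VOCAB[a]], degree_table[degree[k]])]
        for k, a in enumerate(atoms)
    ]
    edge_index = [(i, j) for i, j, _ in bonds]
    edges = [bond_table[BOND_VOCAB[b]] for _, _, b in bonds]
    return nodes, edge_index, edges


# Ethanol (SMILES "CCO"), written out by hand: heavy atoms are nodes, bonds are edges.
nodes, edge_index, edges = embed_graph(["C", "C", "O"], [(0, 1, "SINGLE"), (1, 2, "SINGLE")])
print(len(nodes), edge_index)  # 3 [(0, 1), (1, 2)]
```

Summing per-property embeddings keeps the node dimension fixed no matter how many categorical properties are used.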

An attention-based module aggregates these view-specific embeddings into a single multi-view embedding, which improves performance on downstream predictive tasks.
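The aggregation step can be sketched as attention-weighted pooling over the three view embeddings. This is a minimal sketch, assuming a single learned query vector (query, hypothetical) that scores each view with a scaled dot product; the actual MMELON aggregator may be parameterized differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(views, query):
    """Fuse per-view embeddings of equal dimension into one vector.

    views: dict mapping view name -> embedding (list of floats)
    query: hypothetical learned vector used to score each view
    Returns (fused embedding, attention weight per view).
    """
    names = list(views)
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, views[n])) / scale for n in names]
    weights = softmax(scores)
    fused = [sum(w * views[n][k] for w, n in zip(weights, names)) for k in range(len(query))]
    return fused, dict(zip(names, weights))


views = {
    "image": [0.2, 0.1, 0.0, 0.5],
    "graph": [0.9, 0.3, 0.1, 0.0],
    "text": [0.4, 0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0, 0.0]
fused, weights = attention_fuse(views, query)
print({name: round(w, 3) for name, w in weights.items()})
```

Because the weights come from a softmax, every view contributes a non-negative share and the shares sum to 1; here the graph view receives the largest weight because it aligns best with the query.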

Training

The model can be fine-tuned for regression and classification tasks such as binding-affinity and toxicity prediction. Its pre-trained embeddings can also be used directly to measure chemical similarity, and they can be combined with protein embeddings for tasks that involve a protein target.
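Using the embeddings for chemical similarity usually means comparing vectors, for example with cosine similarity, and pairing them with a protein embedding can be as simple as concatenation before a prediction head. The sketch below assumes both; the embedding values and names (library, protein_emb) are made up for illustration, and the package's own fine-tuning pipeline may combine the two differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(query, library, k=2):
    """Rank library molecules by cosine similarity to the query embedding."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]


# Made-up molecule embeddings (in practice these would come from the pre-trained model).
library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.1, 0.9, 0.2],
    "mol_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(most_similar(query, library))  # ['mol_a', 'mol_c']

# For a ligand-protein task, one simple option is to concatenate the two embeddings
# and feed the result to a regression or classification head.
protein_emb = [0.3, 0.7]  # hypothetical protein embedding
pair_features = library["mol_a"] + protein_emb
print(len(pair_features))  # 5
```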

Guide: Running Locally

Prerequisites

  • Operating System: Linux or macOS
  • Python Version: Python 3.11
  • Conda: Anaconda or Miniconda installed
  • Git: For cloning the repository
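The prerequisites above can be checked before installing with a short standard-library script. It only reports what it finds (operating system, Python version, and whether conda and git are on PATH) and installs nothing. The Python check applies to the interpreter that runs the script, so run it inside the environment you plan to use.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Return a dict of prerequisite name -> bool (True if satisfied)."""
    return {
        "os (Linux or macOS)": platform.system() in ("Linux", "Darwin"),
        "python 3.11": sys.version_info[:2] == (3, 11),
        "conda on PATH": shutil.which("conda") is not None,
        "git on PATH": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8}{name}")
```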

Installation Steps

  1. Set Up the Project Directory

    export ROOT_DIR=~/biomed-multiview
    mkdir -p $ROOT_DIR
    
  2. Create and Activate a Conda Environment

    conda create -y python=3.11 --prefix $ROOT_DIR/envs/biomed-multiview
    conda activate $ROOT_DIR/envs/biomed-multiview
    
  3. Clone the Repository

    mkdir -p $ROOT_DIR/code
    cd $ROOT_DIR/code
    git clone https://github.com/BiomedSciAI/biomed-multi-view.git
    cd biomed-multi-view
    
  4. Install Package Dependencies

    pip install -e '.[dev]'
    pip install -r requirements.txt
    
  5. macOS-Specific Instructions

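    On macOS with Apple Silicon, run these commands instead of step 4. noglob keeps zsh, the default macOS shell, from expanding the brackets in .[dev], and requirements-mps.txt installs the macOS-specific packages: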

    noglob pip install -e .[dev]
    pip install -r requirements-mps.txt
    
  6. Installation Verification (Optional)

    python -m unittest bmfm_sm.tests.all_tests
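The same verification can be run from Python, which is handy inside a notebook. This sketch first checks whether bmfm_sm is importable, then loads and runs the bmfm_sm.tests.all_tests module named in the command above through the standard unittest API. It returns None if the package is not installed in the active environment.

```python
import importlib.util
import unittest


def run_install_tests(module="bmfm_sm.tests.all_tests"):
    """Run the package's test suite; return None if bmfm_sm is not installed."""
    if importlib.util.find_spec("bmfm_sm") is None:
        print("bmfm_sm is not importable; re-run the installation steps in the active environment.")
        return None
    suite = unittest.defaultTestLoader.loadTestsFromName(module)
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_install_tests()
```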
    

Cloud GPUs

GPUs make fine-tuning and inference faster. If no local GPU is available, consider cloud GPU instances from providers such as AWS, GCP, or Azure.
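Before fine-tuning, it can help to confirm that an accelerator is visible. The check below assumes the package runs on PyTorch (suggested by the MPS-specific requirements file but not stated here). It falls back to cpu when PyTorch is missing or no GPU is found.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed in this environment
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. on an AWS, GCP, or Azure GPU instance
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


print(pick_device())
```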

License

The model is distributed under the Apache 2.0 License. For details, see the full text of the Apache License 2.0.
