biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101
Introduction
The IBM biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101 model is a multimodal biomedical foundation model for small molecule analysis. It uses the MMELON (Multi-view Molecular Embedding with Late Fusion) approach to integrate three molecular representations (image, graph, and SMILES text) and targets predictive tasks such as ligand-protein binding and molecular property prediction.
Architecture
The model employs a multi-view strategy combining image, graph, and text representations:
- Image Representation: Captures 2D molecular structure using RDKit and data augmentation.
- Graph Representation: Encodes molecules as graphs with atoms as nodes and bonds as edges, using categorical embedding for properties.
- Text Representation: Uses SMILES strings tokenized with a custom tokenizer and embedded through a transformer architecture.
These embeddings are aggregated through an attention-based module to form a unified multi-view embedding, boosting performance on predictive tasks.
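The aggregation step can be pictured with a small, self-contained sketch. This is not the MMELON implementation. It only illustrates the general idea of attention-weighted fusion, assuming each view has already been encoded into a vector of the same size. The function `fuse_views`, the `query` vector (standing in for a learned parameter), and the random embeddings are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings, query):
    """Attention-weighted sum of per-view embeddings (illustrative only).

    view_embeddings: dict mapping a view name to a list of floats.
    query: hypothetical stand-in for a learned attention query vector.
    """
    names = list(view_embeddings)
    dim = len(query)
    # Scaled dot-product score between the query and each view embedding.
    scores = [
        sum(q * v for q, v in zip(query, view_embeddings[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # The fused embedding is the weighted sum of the view embeddings.
    fused = [
        sum(w * view_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


random.seed(0)
DIM = 8  # placeholder size, not the model's real embedding width
views = {
    name: [random.gauss(0, 1) for _ in range(DIM)]
    for name in ("image", "graph", "text")
}
query = [random.gauss(0, 1) for _ in range(DIM)]

fused, weights = fuse_views(views, query)
print("attention weights per view:", weights)
```

The weights sum to 1, so the fused vector stays in the same space as the individual views, and a view that scores higher against the query contributes more to the result.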
Training
The model can be fine-tuned for various tasks, including regression and classification, such as binding affinity and toxicity prediction. It supports using pre-trained embeddings for chemical similarity measures and can be combined with protein embeddings for specific tasks.
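As a rough illustration of the chemical similarity use case, the sketch below ranks molecules by the cosine similarity of their embeddings. The vectors are made-up placeholders, not model output, and `rank_by_similarity` is a hypothetical helper. In practice the embeddings would come from the pre-trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_name, embeddings):
    """Return the other molecules sorted from most to least similar."""
    query = embeddings[query_name]
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in embeddings.items()
        if name != query_name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings for three hypothetical molecules.
embeddings = {
    "mol_a": [0.2, 0.7, 0.1, 0.0],
    "mol_b": [0.1, 0.8, 0.1, 0.0],
    "mol_c": [0.9, 0.0, 0.0, 0.4],
}

ranking = rank_by_similarity("mol_a", embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```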
Guide: Running Locally
Prerequisites
- Operating System: Linux or macOS
- Python Version: Python 3.11
- Conda: Anaconda or Miniconda installed
- Git: For cloning the repository
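Before installing, the prerequisites listed above can be checked with a short script. This is a convenience sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical and is not part of the repository.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Report whether each listed prerequisite appears to be satisfied."""
    return {
        # platform.system() returns "Darwin" on macOS.
        "os_supported": platform.system() in ("Linux", "Darwin"),
        "python_3_11": sys.version_info[:2] == (3, 11),
        "conda_on_path": shutil.which("conda") is not None,
        "git_on_path": shutil.which("git") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The Python version check only matters for the interpreter that will run the model. Since the steps below create a dedicated Conda environment with Python 3.11, a mismatch in the system interpreter is not a problem.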
Installation Steps
- Set Up the Project Directory
    export ROOT_DIR=~/biomed-multiview
    mkdir -p $ROOT_DIR
- Create and Activate a Conda Environment
    conda create -y python=3.11 --prefix $ROOT_DIR/envs/biomed-multiview
    conda activate $ROOT_DIR/envs/biomed-multiview
- Clone the Repository
    mkdir -p $ROOT_DIR/code
    cd $ROOT_DIR/code
    git clone https://github.com/BiomedSciAI/biomed-multi-view.git
    cd biomed-multi-view
- Install Package Dependencies
    pip install -e .['dev']
    pip install -r requirements.txt
- macOS-Specific Instructions
    For Apple Silicon, disable globbing and install macOS-specific packages:
    noglob pip install -e .[dev]
    pip install -r requirements-mps.txt
- Installation Verification (Optional)
    python -m unittest bmfm_sm.tests.all_tests
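A lighter check than the full test suite is to confirm that the package can be found in the active environment. The sketch below assumes the package's top-level module is `bmfm_sm`, as the unittest command above suggests. `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bmfm_sm"):
    print("bmfm_sm found in the active environment")
else:
    print("bmfm_sm not found; check that the Conda environment is activated")
```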
Cloud GPUs
For improved performance, consider utilizing cloud GPUs from providers such as AWS, GCP, or Azure.
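Whether a GPU is actually visible to the runtime can be checked before launching a job. The sketch below assumes the package runs on PyTorch, which the `requirements-mps.txt` file hints at but this document does not state. `pick_device` is a hypothetical helper that falls back to the CPU if PyTorch is not installed.

```python
def pick_device() -> str:
    """Pick "cuda", "mps", or "cpu" depending on what is available."""
    try:
        import torch  # assumption: the package depends on PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("selected device:", pick_device())
```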
License
The model is distributed under the Apache 2.0 License. For details, refer to the text of the Apache License, Version 2.0.