biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-SIDER-101
Introduction
biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-SIDER-101 is a multimodal biomedical foundation model developed by IBM Research. It uses a multi-view approach to predict the properties and interactions of small molecules. The model combines image, graph, and text representations of each molecule to improve performance on predictive tasks such as ligand-protein binding, solubility, metabolism, and toxicity.
Architecture
The model architecture combines three distinct molecular representations:
- Image Representation: Captures visual features from 2D molecule depictions rendered with RDKit, enhanced with data augmentation.
- Graph Representation: Encodes molecules as graphs with nodes and edges representing atoms and bonds, respectively.
- Text Representation: Utilizes SMILES strings processed through a transformer-based tokenizer.
These representations are integrated using an attention-based aggregator to form a unified multi-view embedding, enhancing the model's ability to perform robustly across diverse tasks.
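The aggregation step described above can be illustrated with a small sketch. This is not the model's actual implementation. It assumes each view has already produced a fixed-size embedding and that a scoring vector (here `score_weights`, a hypothetical name with made-up values standing in for learned parameters) assigns one scalar score per view. The scores pass through a softmax and weight a sum of the view embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_embeddings, score_weights):
    """Fuse per-view embeddings into one vector using attention weights.

    view_embeddings: dict mapping view name -> list of floats (equal lengths).
    score_weights:   list of floats used to score each view (hypothetical;
                     a stand-in for learned parameters).
    """
    names = list(view_embeddings)
    # One scalar score per view: dot product of embedding and scoring vector.
    scores = [
        sum(x * w for x, w in zip(view_embeddings[n], score_weights))
        for n in names
    ]
    attn = softmax(scores)
    dim = len(score_weights)
    # Attention-weighted sum of the view embeddings, one dimension at a time.
    fused = [
        sum(a * view_embeddings[n][i] for a, n in zip(attn, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, attn))


# Toy 4-dimensional embeddings; the values are illustrative only.
views = {
    "image": [0.2, 0.1, 0.4, 0.3],
    "graph": [0.5, 0.3, 0.1, 0.1],
    "text": [0.1, 0.6, 0.2, 0.1],
}
fused, weights = aggregate_views(views, score_weights=[1.0, 0.5, -0.5, 0.0])
print("attention weights:", weights)
print("fused embedding:", fused)
```

Because the weights sum to one, the fused vector stays on the same scale as the individual views, and a view that scores higher for a given molecule contributes more to the result.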
Training
The model is pre-trained and can be fine-tuned for specific tasks such as regression and classification in molecular property prediction. Pre-trained embeddings can serve as a basis for similarity measures in chemical libraries. The model is also capable of integrating with protein embeddings for combined small molecule and protein representation tasks.
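As a rough illustration of using embeddings for similarity search, the sketch below ranks a small library against a query by cosine similarity. The vectors are placeholders, not real model outputs, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. How embeddings are actually extracted from the model is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for pre-trained molecule embeddings.
library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length and compares direction only, which is a common default for learned embeddings. Other distance measures could be substituted without changing the ranking function.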
Guide: Running Locally
Prerequisites
- Operating System: Linux or macOS
- Python Version: Python 3.11
- Conda: Anaconda or Miniconda
- Git: Version control for cloning repositories
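The prerequisites above can be checked with a short script before starting. This is a convenience sketch using only the Python standard library, and `check_prerequisites` is a hypothetical helper name. Note that the Python version check reflects whichever interpreter runs the script, so it is most meaningful once the Conda environment is active.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (met) or False (not met)."""
    return {
        # platform.system() reports "Darwin" on macOS.
        "os_linux_or_macos": platform.system() in ("Linux", "Darwin"),
        "python_3_11": sys.version_info[:2] == (3, 11),
        # shutil.which returns None when the executable is not on PATH.
        "conda_on_path": shutil.which("conda") is not None,
        "git_on_path": shutil.which("git") is not None,
    }


results = check_prerequisites()
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'not met'}")
```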
Steps
- Set Up Project Directory:
    export ROOT_DIR=~/biomed-multiview
    mkdir -p $ROOT_DIR
- Create and Activate Conda Environment:
    conda create -y python=3.11 --prefix $ROOT_DIR/envs/biomed-multiview
    conda activate $ROOT_DIR/envs/biomed-multiview
- Clone the Repository:
    mkdir -p $ROOT_DIR/code
    cd $ROOT_DIR/code
    git clone https://github.com/BiomedSciAI/biomed-multi-view.git
    cd biomed-multi-view
- Install Dependencies:
    pip install -e .['dev']
    pip install -r requirements.txt
- macOS-Specific Instructions (Apple Silicon):
    noglob pip install -e .[dev]
    pip install -r requirements-mps.txt
- Installation Verification (Optional):
    python -m unittest bmfm_sm.tests.all_tests
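For a quicker check than the full test suite, the sketch below only asks Python's import system whether the package can be located. The package name `bmfm_sm` is taken from the unittest command above, and `package_installed` is a hypothetical helper. A positive result shows the install is visible to the active environment, not that the tests pass.

```python
import importlib.util


def package_installed(name):
    """True if the import system can locate top-level package `name` (without importing it)."""
    return importlib.util.find_spec(name) is not None


# bmfm_sm is the package name used by the unittest command.
print("bmfm_sm installed:", package_installed("bmfm_sm"))
```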
Cloud GPUs
For faster fine-tuning and inference, consider running on a GPU instance from a cloud provider such as AWS, Google Cloud, or Azure.
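Whether running locally or on a cloud instance, it can help to confirm which accelerator is visible before launching a job. The sketch below assumes the project runs on PyTorch, which the source does not state explicitly (the `requirements-mps.txt` file name only hints at it). `pick_device` is a hypothetical helper, and it falls back to CPU if PyTorch is not installed.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # assumption: the environment uses PyTorch
    except ImportError:
        return "cpu"  # PyTorch not installed, so no accelerator can be queried
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) is absent in older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print("Using device:", device)
```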
License
This model is licensed under the Apache 2.0 License. See the license text for the full terms.