materials.mhg-ged
Introduction
We present MHG-GNN, an autoencoder architecture featuring an encoder based on Graph Neural Networks (GNNs) and a decoder that uses a sequential model with Molecular Hypergraph Grammar (MHG). This design lets MHG-GNN accept any molecule as input and offers high predictive performance on molecular graph data, while the decoder guarantees that every generated molecule is structurally valid.
Architecture
MHG-GNN consists of two main components:
- Encoder: Utilizes a variant of GNN to process molecular graph data.
- Decoder: Based on MHG, it ensures the output is always a structurally valid molecule.
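The validity guarantee comes from decoding with grammar production rules: every derivation expands symbols only through legal rules, so the result is well-formed by construction. The toy sketch below illustrates this idea with a simplified, invented context-free grammar; it is not the actual Molecular Hypergraph Grammar, and the rule set and symbols are placeholders for illustration only.

```python
# Toy illustration: grammar-constrained decoding always yields a
# well-formed string, because each step applies a valid production rule.
# The grammar below is invented for illustration; MHG-GNN uses a
# Molecular Hypergraph Grammar instead.

RULES = {
    "CHAIN": [["ATOM"], ["ATOM", "CHAIN"]],   # a chain is one or more atoms
    "ATOM": [["C"], ["O"], ["N"]],            # terminal atom symbols
}

def decode(rule_choices):
    """Expand the start symbol using a sequence of rule indices."""
    stack, out, choices = ["CHAIN"], [], list(rule_choices)
    while stack:
        symbol = stack.pop()
        if symbol not in RULES:          # terminal symbol: emit it
            out.append(symbol)
            continue
        options = RULES[symbol]
        # Pick the requested rule, clamped to a valid index so any
        # input sequence still produces a legal derivation.
        idx = (choices.pop(0) if choices else 0) % len(options)
        stack.extend(reversed(options[idx]))
    return "".join(out)

print(decode([1, 0, 1, 2, 0, 0]))  # → CNC
```

Because every expansion step is drawn from the rule table, no sequence of decoder outputs can produce a malformed string; MHG applies the same principle to molecular hypergraphs.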
Training
Pre-trained models of MHG-GNN are available, trained on a dataset of approximately 1.34 million molecules from PubChem. The training environment has been tested on Intel E5-2667 CPUs and NVIDIA A100 Tensor Core GPUs.
Guide: Running Locally
Installation:
- Create and activate a virtual environment:
```shell
python3 -m venv .venv
. .venv/bin/activate
```
- Clone the repository and install dependencies:
```shell
git clone git@github.ibm.com:CMD-TRL/mhg-gnn.git
cd ./mhg-gnn
pip install .
```
Feature Extraction:
- Use the example notebook mhg-gnn_encoder_decoder_example.ipynb for loading checkpoints and using the model.
- Load the model with:
```python
import torch
import load

model = load.load()
```
- Encode SMILES strings into embeddings:
```python
with torch.no_grad():
    repr = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"])
```
- Decode embeddings back into SMILES strings:
```python
orig = model.decode(repr)
```
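The embeddings returned by the encoder are fixed-length vectors, so they can be compared or fed to downstream models directly. A minimal sketch of comparing two such vectors with cosine similarity, assuming the embeddings have been converted to NumPy arrays; the vectors below are dummy placeholders, not real MHG-GNN output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy placeholder vectors; in practice these would come from
# model.encode(...) tensors converted with .numpy().
emb_a = np.array([0.2, -0.5, 0.1, 0.7])
emb_b = np.array([0.1, -0.4, 0.0, 0.6])

print(round(cosine_similarity(emb_a, emb_b), 3))
```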
Suggested Cloud GPUs:
- Consider using NVIDIA A100 Tensor Core GPUs for optimal performance during training and inference.
License
This project is licensed under the Apache 2.0 License.