ibm/materials.mhg-ged

Introduction

We present MHG-GNN, an autoencoder architecture with an encoder based on Graph Neural Networks (GNN) and a decoder based on a sequential model with Molecular Hypergraph Grammar (MHG). This design allows MHG-GNN to accept any molecule as input and offers high predictive performance on molecular graph data, while the grammar-based decoder guarantees that every generated molecule is structurally valid.

Architecture

MHG-GNN consists of two main components:

  • Encoder: Utilizes a variant of GNN to process molecular graph data.
  • Decoder: Based on MHG, it ensures the output is always a structurally valid molecule.
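As a toy illustration of the GNN-encoder idea only (this is not the MHG-GNN implementation; the layer, features, and graph below are made up for exposition), a single message-passing step over a molecular graph's adjacency matrix can be sketched in PyTorch:

```python
import torch

# Toy message-passing layer: each atom's feature vector is updated by
# averaging its neighbors' features and applying a learned linear map.
# Illustrative only; the actual MHG-GNN encoder is a GNN variant whose
# details live in the repository.
class ToyGNNLayer(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = torch.nn.Linear(dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        msg = adj @ x / deg          # mean over neighboring atoms
        return torch.relu(self.lin(msg))

# Toy 3-atom chain (C-C-O, ethanol's heavy atoms) as an adjacency matrix.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.randn(3, 8)                # 8-dim node features, one row per atom
h = ToyGNNLayer(8)(x, adj)           # one updated vector per atom
print(h.shape)
```

In MHG-GNN, node representations like these are pooled into a single molecule-level embedding, which the MHG decoder then maps back to a valid molecular structure.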

Training

Pre-trained models of MHG-GNN are available, trained on a dataset of approximately 1.34 million molecules from PubChem. The training environment has been tested on Intel E5-2667 CPUs and NVIDIA A100 Tensor Core GPUs.

Guide: Running Locally

  1. Installation:

    • Create and activate a virtual environment:
      python3 -m venv .venv
      . .venv/bin/activate
      
    • Clone the repository and install dependencies:
      git clone git@github.ibm.com:CMD-TRL/mhg-gnn.git
      cd ./mhg-gnn
      pip install .
      
  2. Feature Extraction:

    • Use the example notebook mhg-gnn_encoder_decoder_example.ipynb for loading checkpoints and using the model.
    • Load the model with:
      import torch
      import load  # module provided by the mhg-gnn repository
      
      model = load.load()
      
    • Encode SMILES strings into embeddings (reprs is used instead of repr to avoid shadowing the Python builtin):
      with torch.no_grad():
          reprs = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"])
      
    • Decode embeddings back into SMILES strings:
      orig = model.decode(reprs)
      
  3. Suggested Cloud GPUs:

    • Consider using NVIDIA A100 Tensor Core GPUs for optimal performance during training and inference.
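The embeddings returned by the encoder in step 2 are plain tensors and can feed downstream feature-extraction tasks directly. A minimal sketch using stand-in vectors (real embeddings would come from model.encode; the values below are illustrative, not model outputs):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings; in practice these come from model.encode([...]).
# Each row plays the role of one molecule's embedding vector.
emb = torch.tensor([
    [1.0, 0.1, 0.0],   # stand-in for "CCO"
    [0.9, 0.2, 0.1],   # stand-in for a structurally similar molecule
    [0.0, 0.0, 1.0],   # stand-in for a dissimilar molecule
])

# Cosine similarity in embedding space is a common downstream use,
# e.g. for nearest-neighbor retrieval of candidate materials.
sim_close = F.cosine_similarity(emb[0], emb[1], dim=0)
sim_far = F.cosine_similarity(emb[0], emb[2], dim=0)
print(float(sim_close), float(sim_far))
```

The same tensors can also serve as input features for property-prediction models such as regressors or classifiers.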

License

This project is licensed under the Apache License 2.0.
