Drug Molecule Generation with VAE
keras-io
Introduction
This repository contains a model and notebook for drug molecule generation using a Variational Autoencoder (VAE). The approach generates new molecules for drug discovery by learning a data-driven continuous representation of molecules. The model consists of an Encoder, a Decoder, and a Predictor, which together make it possible to explore chemical space efficiently. It is based on work by Victor Basu and was reproduced by Vu Minh Chien.
Architecture
The VAE model architecture includes three primary components:
- Encoder: Converts discrete molecule representations into continuous vectors.
- Decoder: Transforms continuous vectors back into discrete molecule representations.
- Predictor: Estimates chemical properties from the latent continuous vector representation.
Because the latent representation is continuous, gradient-based optimization can be used to search the latent space for compounds with improved predicted properties.
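The interplay of the three components can be sketched with a toy NumPy example. This is only an illustration under stated assumptions, not the notebook's implementation: the layer sizes, the single linear layers, and the random untrained weights are all hypothetical, and the function names (`encode`, `sample_latent`, `decode`, `predict_property`) are made up for this sketch. It shows the reparameterized sampling step, the KL term a VAE typically adds to its loss, and one gradient step in latent space that raises the predicted property.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
INPUT_DIM, LATENT_DIM = 32, 4

# Random, untrained weights standing in for learned layers.
W_mu = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))
w_pred = rng.normal(scale=0.1, size=LATENT_DIM)


def encode(x):
    """Encoder: map an input vector to the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar


def sample_latent(mu, log_var):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def decode(z):
    """Decoder: map a latent vector back to input space."""
    return z @ W_dec


def predict_property(z):
    """Predictor: estimate a scalar property from the latent vector."""
    return z @ w_pred


def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, one value per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


x = rng.normal(size=(2, INPUT_DIM))  # two stand-in "molecule" vectors
mu, log_var = encode(x)
z = sample_latent(mu, log_var)
x_hat = decode(z)

# For this linear predictor, d(prediction)/dz is simply w_pred, so one
# gradient-ascent step moves z toward a higher predicted property.
step_size = 0.5
z_improved = z + step_size * w_pred
```

In a trained model the gradient would come from automatic differentiation through a nonlinear predictor, and `z_improved` would be passed through the decoder to obtain a candidate molecule.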
Training
The model is trained on the ZINC database, a collection of commercially available compounds. Molecules are represented in SMILES format, a compact ASCII string notation for molecular structure. The dataset also provides molecular properties such as logP (water-octanol partition coefficient), SAS (synthetic accessibility score), and QED (quantitative estimate of drug-likeness), which are used to evaluate drug-likeness and ease of synthesis.
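To make the idea of a discrete SMILES representation concrete, here is a small pure-Python sketch that turns SMILES strings into padded integer sequences and back. Character-level tokenization is an assumption made for illustration; the notebook may encode molecules differently (for example with RDKit-derived features), and the helper names `build_vocab`, `encode_smiles`, and `decode_ids` are hypothetical.

```python
def build_vocab(smiles_list):
    """Map every character seen in the data to an integer id (0 = padding)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_smiles(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of ids, zero-padded."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


def decode_ids(ids, vocab):
    """Invert encode_smiles, dropping padding."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids if i != 0)


# Ethanol, benzene, and acetic acid as SMILES strings.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles)
encoded = [encode_smiles(s, vocab, max_len=10) for s in smiles]
```

Note that a purely character-level scheme splits two-letter atoms such as `Cl` or `Br` into separate tokens, so a real pipeline would usually use a SMILES-aware tokenizer.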
Guide: Running Locally
- Environment Setup: Ensure you have Python installed along with tensorflow and keras. Install RDKit for SMILES transformation.
- Clone Repository: Use Git to clone the repository to your local machine.
git clone https://github.com/keras-io/drug-molecule-generation-with-VAE.git
- Install Dependencies: Navigate to the repository directory and install dependencies.
cd drug-molecule-generation-with-VAE
pip install -r requirements.txt
- Run Notebook: Open and run the Jupyter notebook provided in the repository to train the model and generate molecules.
jupyter notebook
For improved performance, consider using cloud GPU services from providers like AWS, Google Cloud, or Azure.
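Before launching the notebook, it can help to confirm that the packages named in the setup step are importable. The following is a minimal sketch using only the standard library; the import names `tensorflow`, `keras`, and `rdkit` are assumed to match the installed distributions, and `missing_packages` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Import names assumed from the setup step above.
REQUIRED = ("tensorflow", "keras", "rdkit")


def missing_packages(names=REQUIRED):
    """Return the subset of names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```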
License
This project is open-source and available under the MIT License, allowing for extensive use, modification, and distribution.