Protein-Ligand-MLP-1 Model
Introduction
The Protein-Ligand-MLP-1 model is a sentence-transformers model that maps pairs of protein sequences and chemical sequences (canonical SMILES) to binding affinities, specifically pIC50 values. Uncertainty can be estimated by comparing the predictions of ensemble members trained with different random seeds. Note that this model has been superseded by a newer model available on GitHub.
Architecture
The model architecture is based on the SentenceTransformer framework and consists of multiple layers:
- Asym layer: routes protein and ligand sequences through separate BERT-based transformer branches, each followed by pooling and a dense layer.
  - Protein branch:
    - Transformer with a maximum sequence length of 2048
    - Pooling layer over word embeddings of dimension 1024
    - Dense layer with Tanh activation
  - Ligand branch:
    - Transformer with a maximum sequence length of 512
    - Pooling layer over word embeddings of dimension 768
    - Dense layer with Tanh activation
- Dense layers: a stack of dense layers with GELU activations that maps the concatenated protein and ligand embeddings (1024 + 768 = 1792 features) down to a single output feature.
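For illustration, the regression head can be sketched in PyTorch. Only the 1792-feature input and the scalar output follow from the description above; the number of hidden layers and their widths are assumptions:

```python
import torch
import torch.nn as nn

class AffinityHead(nn.Module):
    """Sketch of the GELU MLP head described above (hidden widths hypothetical)."""

    def __init__(self, in_features: int = 1792, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden // 2),
            nn.GELU(),
            nn.Linear(hidden // 2, 1),  # single pIC50 output
        )

    def forward(self, protein_emb: torch.Tensor, ligand_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the 1024-dim protein and 768-dim ligand embeddings
        x = torch.cat([protein_emb, ligand_emb], dim=-1)
        return self.mlp(x)

# Example: a batch of two protein-ligand pairs
head = AffinityHead()
print(head(torch.randn(2, 1024), torch.randn(2, 768)).shape)  # torch.Size([2, 1])
```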
Training
Each model in the ensemble is trained with a different random seed, so the spread of predictions across ensemble members yields an uncertainty estimate. Training maps protein-ligand sequence pairs to their measured binding affinities.
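A minimal sketch of this ensemble-based uncertainty estimate follows; the second model ID below is an assumption about how the seed variants are named, and the installation step is given in the guide after this section:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical ensemble member IDs; substitute the actual seed-variant
# checkpoints published alongside this model.
member_ids = ['jglaser/protein-ligand-mlp-1', 'jglaser/protein-ligand-mlp-2']

pairs = [{'protein': ["SEQVENCE"], 'ligand': ["c1ccccc1"]}]

# Stack each member's pIC50 predictions: shape (n_members, n_pairs, ...)
predictions = np.stack([SentenceTransformer(m).encode(pairs) for m in member_ids])

# Ensemble mean as the point estimate; standard deviation as the uncertainty
print("pIC50:", predictions.mean(axis=0))
print("uncertainty:", predictions.std(axis=0))
```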
Guide: Running Locally
To use the Protein-Ligand-MLP-1 model locally, follow these steps:
- Install the forked Sentence-Transformers library:

```bash
pip install git+https://github.com/jglaser/sentence-transformers.git@enable_mixed
```
- Load and run the model:

```python
from sentence_transformers import SentenceTransformer

# Each input pairs a protein sequence with a ligand's canonical SMILES string
sentences = [{'protein': ["SEQVENCE"], 'ligand': ["c1ccccc1"]}]

model = SentenceTransformer('jglaser/protein-ligand-mlp-1')

# encode() returns the predicted binding affinity (pIC50) for each pair
embeddings = model.encode(sentences)
print(embeddings)
```
- Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for GPU resources to handle large datasets or complex computations efficiently.
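On a GPU instance, the model can be loaded directly onto the device. This is a sketch assuming the fork keeps the stock SentenceTransformer constructor and encode signature:

```python
from sentence_transformers import SentenceTransformer

# Place the model on a CUDA GPU (requires a CUDA-capable device)
model = SentenceTransformer('jglaser/protein-ligand-mlp-1', device='cuda')

pairs = [{'protein': ["SEQVENCE"], 'ligand': ["c1ccccc1"]}]

# Larger batches amortize per-call overhead when scoring many pairs
predictions = model.encode(pairs, batch_size=16, show_progress_bar=True)
print(predictions)
```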
License
The license for this model is not explicitly stated. Check the model repository or contact the authors for detailed licensing information.
Citing & Authors
For academic use, please cite the bioRxiv preprint and acknowledge the contributors:
- Andrew E Blanchard
- John Gounley
- Debsindhu Bhowmik
- Mayanka Chandra Shekar
- Isaac Lyngaas
- Shang Gao
- Junqi Yin
- Aristeidis Tsaris
- Feiyi Wang
- Jens Glaser
More details can be found in the bioRxiv preprint.