protein ligand mlp 1

jglaser

Protein-Ligand-MLP-1 Model

Introduction

The Protein-Ligand-MLP-1 model is a sentence-transformers model that maps a pair of sequences, a protein sequence and a ligand given as canonical SMILES, to a predicted binding affinity expressed as pIC50. Uncertainty can be estimated by comparing the predictions of several copies of the model trained with different random seeds. Note that this model has been superseded by another model available on GitHub.
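For readers unfamiliar with the unit, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so a larger value means stronger inhibition. The short sketch below converts between the two; the function names are illustrative and not part of the model's API.

```python
import math

def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)

def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 value back to an IC50 in nanomolar."""
    return 10.0 ** (-pic50) * 1e9

# 100 nM corresponds to a pIC50 of about 7; a pIC50 of 6 is about 1000 nM (1 uM).
print(ic50_nm_to_pic50(100.0))
print(pic50_to_ic50_nm(6.0))
```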

Architecture

The model architecture is based on the SentenceTransformer framework and consists of multiple layers:

  • Asym Layer: Routes protein and ligand sequences through separate branches, each a BERT-based transformer followed by a pooling layer and a dense layer.
    • Protein:
      • Transformer with max sequence length of 2048
      • Pooling layer with word embedding dimension of 1024
      • Dense layer with Tanh activation
    • Ligand:
      • Transformer with max sequence length of 512
      • Pooling layer with word embedding dimension of 768
      • Dense layer with Tanh activation
  • Dense Layers:
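    • A stack of dense layers with GELU activation functions that reduces 1792 input features to 1 output feature. The input width equals 1024 + 768, which suggests that the protein and ligand embeddings are concatenated before this stack.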
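The flow of shapes through the dense stack can be sketched in plain Python. This is only an illustration under stated assumptions: the two branch outputs are assumed to be concatenated (1024 + 768 = 1792), the hidden widths in HIDDEN_DIMS are hypothetical, the final layer is assumed to have no activation, and the weights are random placeholders rather than the trained parameters.

```python
import math
import random

PROTEIN_DIM = 1024      # pooled protein embedding width (from the list above)
LIGAND_DIM = 768        # pooled ligand embedding width (from the list above)
HIDDEN_DIMS = [64, 16]  # hypothetical hidden widths, not the model's real ones

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def make_dense(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained placeholders)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation=None):
    """Apply one dense layer, optionally followed by an elementwise activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

rng = random.Random(0)
# Stand-ins for the two branch outputs; Tanh keeps each value in [-1, 1].
protein_vec = [math.tanh(rng.gauss(0, 1)) for _ in range(PROTEIN_DIM)]
ligand_vec = [math.tanh(rng.gauss(0, 1)) for _ in range(LIGAND_DIM)]

features = protein_vec + ligand_vec  # assumed concatenation: 1792 features
widths = [len(features)] + HIDDEN_DIMS + [1]
layers = [make_dense(a, b, rng) for a, b in zip(widths, widths[1:])]

x = features
for layer in layers[:-1]:
    x = dense(x, layer, gelu)      # hidden layers use GELU
prediction = dense(x, layers[-1])  # single output, no activation (assumed)

print(len(features), len(prediction))  # prints "1792 1"
```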

Training

Each model in the ensemble is trained with a different random seed, so the spread of the ensemble's predictions for the same input can serve as an estimate of prediction uncertainty. Every model is trained to map protein-ligand sequence pairs to binding affinities.
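One simple way to turn per-seed predictions into an uncertainty estimate is to report their mean and sample standard deviation. The sketch below assumes that statistic, which the description above does not specify; the prediction values are made up for illustration.

```python
import statistics

def summarize_ensemble(predictions):
    """Return (mean, sample standard deviation) of per-seed pIC50 predictions."""
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    return statistics.mean(predictions), statistics.stdev(predictions)

# Hypothetical pIC50 predictions for one protein-ligand pair, one per seed.
per_seed = [6.8, 7.1, 6.9, 7.3, 7.0]
mean, spread = summarize_ensemble(per_seed)
print(f"pIC50 = {mean:.2f} +/- {spread:.2f}")
```

A wide spread flags inputs on which the ensemble members disagree, which is a reason to treat the mean prediction with more caution.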

Guide: Running Locally

To use the Protein-Ligand-MLP-1 model locally, follow these steps:

  1. Install the author's fork of Sentence-Transformers (enable_mixed branch):
    pip install git+https://github.com/jglaser/sentence-transformers.git@enable_mixed
    
  2. Load and Run the Model:
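    from sentence_transformers import SentenceTransformer
    
    # One dict per pair: a protein sequence and a ligand as canonical SMILES.
    # "SEQVENCE" is a placeholder sequence; "c1ccccc1" is benzene.
    sentences = [{'protein': ["SEQVENCE"], 'ligand': ["c1ccccc1"]}]
    model = SentenceTransformer('jglaser/protein-ligand-mlp-1')
    # Per the description above, the output for each pair is the predicted pIC50.
    embeddings = model.encode(sentences)
    print(embeddings)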
    
  3. (Optional) Use a GPU: For large datasets, running the model on a GPU, for example a cloud instance from AWS, GCP, or Azure, can speed up inference considerably.
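When scoring many pairs, it can help to build the input list programmatically. The helper below is hypothetical (make_pairs is not part of sentence-transformers) and simply reproduces the dictionary format from step 2; it assumes that encode accepts a list with several such entries.

```python
def make_pairs(proteins, ligands):
    """Build one input dict per (protein, SMILES) pair in the format of step 2."""
    if len(proteins) != len(ligands):
        raise ValueError("proteins and ligands must have the same length")
    return [{'protein': [p], 'ligand': [s]} for p, s in zip(proteins, ligands)]

# Placeholder protein sequence paired with benzene and ethanol.
pairs = make_pairs(["SEQVENCE", "SEQVENCE"], ["c1ccccc1", "CCO"])
print(pairs[1])
```

The resulting list can then be passed to the model in place of the single-entry sentences list from step 2.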

License

The license for this model is not stated here. Check the model repository or contact the authors for licensing information.

Citing & Authors

For academic use, please cite the bioRxiv preprint and acknowledge the contributors:

  • Andrew E Blanchard
  • John Gounley
  • Debsindhu Bhowmik
  • Mayanka Chandra Shekar
  • Isaac Lyngaas
  • Shang Gao
  • Junqi Yin
  • Aristeidis Tsaris
  • Feiyi Wang
  • Jens Glaser

More details can be found in the bioRxiv preprint.
