esm2_t30_150M_UR50D

facebook

Introduction

ESM-2 is a state-of-the-art protein language model trained with a masked language modeling objective. It is designed to be fine-tuned for a wide range of tasks that take protein sequences as input. For comprehensive details, refer to the accompanying academic paper.

Architecture

ESM-2 is available in several checkpoints with varying numbers of layers and parameters:

  • esm2_t48_15B_UR50D: 48 layers, 15 billion parameters
  • esm2_t36_3B_UR50D: 36 layers, 3 billion parameters
  • esm2_t33_650M_UR50D: 33 layers, 650 million parameters
  • esm2_t30_150M_UR50D: 30 layers, 150 million parameters
  • esm2_t12_35M_UR50D: 12 layers, 35 million parameters
  • esm2_t6_8M_UR50D: 6 layers, 8 million parameters

Larger models generally offer better accuracy but demand more computational resources.
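
As a quick sanity check on checkpoint sizes, the sketch below (assuming the Hugging Face transformers library is installed) loads the smallest checkpoint and counts its parameters; the same pattern works for any checkpoint in the list above.

  # Load a checkpoint and report its parameter count.
  from transformers import AutoModelForMaskedLM

  model = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")
  num_params = sum(p.numel() for p in model.parameters())
  print(f"Parameters: {num_params / 1e6:.1f}M")  # roughly 8M for this checkpoint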

Training

ESM-2 is trained with a masked language modeling objective: residues in a protein sequence are masked at random, and the model learns to predict them from the surrounding context. This pre-training makes it well suited for fine-tuning on downstream tasks involving protein data.
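
A minimal sketch of this objective in action, assuming the transformers fill-mask pipeline: mask a single residue in a sequence (the sequence below is an arbitrary, illustrative example) and let the model predict it. Note that the ESM tokenizer uses <mask> as its mask token.

  from transformers import pipeline

  unmasker = pipeline("fill-mask", model="facebook/esm2_t30_150M_UR50D")
  # <mask> marks the residue to predict.
  predictions = unmasker("MKTAYIAKQR<mask>DLGSYQERLA")
  for pred in predictions[:3]:
      print(pred["token_str"], round(pred["score"], 3))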

Guide: Running Locally

  1. Setup Environment:

    • Ensure you have Python installed.
    • Install required packages using pip:
      pip install torch transformers
      
  2. Download Model:

    • Use the Hugging Face model hub to download a specific ESM-2 checkpoint. For example:
      from transformers import AutoModel
      model = AutoModel.from_pretrained("facebook/esm2_t30_150M_UR50D")
      
  3. Run Model:

    • Load the model and tokenizer, then pass your protein sequences through the model (see the sketch after this list).
  4. Suggested Cloud GPUs:

    • Consider cloud GPU offerings from providers such as AWS, Google Cloud, or Azure when working with the larger checkpoints.
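
As referenced in step 3, here is a minimal end-to-end sketch (assuming torch and transformers are installed; the sequence is an arbitrary example) that tokenizes a protein sequence and extracts per-residue embeddings:

  import torch
  from transformers import AutoTokenizer, AutoModel

  checkpoint = "facebook/esm2_t30_150M_UR50D"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModel.from_pretrained(checkpoint)
  model.eval()

  sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
  inputs = tokenizer(sequence, return_tensors="pt")
  with torch.no_grad():
      outputs = model(**inputs)

  # Shape: (batch, sequence_length, hidden_size); the length includes the
  # special <cls> and <eos> tokens added by the tokenizer.
  print(outputs.last_hidden_state.shape)

For masked-residue prediction rather than embeddings, swap AutoModel for AutoModelForMaskedLM and read the per-token logits from the output.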

License

ESM-2 is licensed under the MIT License, allowing for wide usage and modification in both personal and commercial projects.
