esm2_t30_150M_UR50D
facebook
Introduction
ESM-2 is a state-of-the-art protein language model trained with a masked language modeling objective. It is designed to be fine-tuned on a wide range of tasks that take protein sequences as input. For comprehensive details, refer to the accompanying academic paper.
Architecture
ESM-2 is available in several checkpoints with varying numbers of layers and parameters:
- esm2_t48_15B_UR50D: 48 layers, 15 billion parameters
- esm2_t36_3B_UR50D: 36 layers, 3 billion parameters
- esm2_t33_650M_UR50D: 33 layers, 650 million parameters
- esm2_t30_150M_UR50D: 30 layers, 150 million parameters
- esm2_t12_35M_UR50D: 12 layers, 35 million parameters
- esm2_t6_8M_UR50D: 6 layers, 8 million parameters
Larger models generally offer better accuracy but demand more computational resources.
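As a quick sanity check, a checkpoint's depth and size can be read from its Hugging Face configuration. The sketch below assumes the transformers library is installed; the printed parameter count is approximate.

from transformers import AutoConfig, AutoModel

checkpoint = "facebook/esm2_t30_150M_UR50D"

# The config alone can be fetched without downloading the full weights.
config = AutoConfig.from_pretrained(checkpoint)
print(config.num_hidden_layers)  # 30 layers for this checkpoint

# Loading the weights lets you count parameters exactly.
model = AutoModel.from_pretrained(checkpoint)
print(f"{model.num_parameters():,} parameters")  # roughly 150 million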
Training
The ESM-2 model is trained using a masked language modeling objective. This approach allows the model to predict missing parts of a protein sequence, making it highly suitable for fine-tuning on specific tasks related to protein data.
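To see this objective in action at inference time, the fill-mask pipeline from transformers can predict a masked residue directly. This is a minimal sketch; the example sequence is arbitrary and purely illustrative.

from transformers import pipeline

unmasker = pipeline("fill-mask", model="facebook/esm2_t30_150M_UR50D")

# ESM-2's tokenizer uses "<mask>" as its mask token; one residue is masked here.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ<mask>PILSRVGDGTQDNLSGAEKAVQ"
for prediction in unmasker(sequence, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))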
Guide: Running Locally
- Setup Environment:
  - Ensure you have Python installed.
  - Install the required packages using pip:
    pip install torch transformers
- Download Model:
  - Use the Hugging Face model hub to download a specific ESM-2 checkpoint. For example:
    from transformers import AutoModel
    model = AutoModel.from_pretrained("facebook/esm2_t30_150M_UR50D")
- Run Model:
  - Load the model and pass in your protein sequences for processing; see the sketch below.
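A minimal sketch of a full forward pass, assuming the goal is per-residue embeddings; the sequence shown is illustrative only.

import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "facebook/esm2_t30_150M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
model.eval()

# Protein sequences are passed as plain strings of one-letter amino acid codes.
inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch, sequence_length, hidden_size), including special tokens.
print(outputs.last_hidden_state.shape)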
- Suggested Cloud GPUs:
  - Consider cloud services such as AWS, Google Cloud, or Azure for GPU access, especially when running the larger checkpoints.
License
ESM-2 is licensed under the MIT License, allowing for wide usage and modification in both personal and commercial projects.