Introduction

TCR-BERT treats T-cell receptor (TCR) amino acid sequences as text and is designed for text-classification-style tasks using the BERT architecture. It is implemented in PyTorch and supports inference endpoints. The model focuses primarily on masked amino acid modeling and classification across antigen labels.

Architecture

TCR-BERT is built on the BERT architecture and is optimized for tasks involving masked amino acid (MAA) modeling and antigen classification. It is suitable for applications in bioinformatics, particularly for understanding T-cell receptor (TCR) sequences.
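
A minimal sketch of this usage pattern is shown below: it loads pretrained weights from the Hugging Face Hub and extracts per-residue embeddings for a single TCR sequence. The checkpoint name wukevin/tcr-bert-mlm-only is an assumption here, so check the Hub for the exact identifier; note that sequences are passed as space-separated amino acids.

    # Sketch: embed a TCR sequence with TCR-BERT (checkpoint name is assumed).
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "wukevin/tcr-bert-mlm-only"  # assumed Hub identifier
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    sequence = "C A S S P V T G G I Y G Y T F"  # amino acids are space-separated
    inputs = tokenizer(sequence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # One embedding vector per token (including special tokens)
    print(outputs.last_hidden_state.shape)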

Training

The model has been trained with a focus on two main tasks:

  • Masked amino acid (MAA) modeling: predicting masked amino acids within a sequence, analogous to BERT's masked language modeling.
  • Classification: classifying TCR sequences according to antigen labels from PIRD (the Pan Immune Repertoire Database).

For details on training methodologies and datasets, refer to the full codebase and preprint paper.
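
To make the MAA objective concrete, the sketch below uses the transformers fill-mask pipeline to predict a masked residue. It assumes the MLM checkpoint is published as wukevin/tcr-bert-mlm-only and that its tokenizer defines a standard mask token.

    # Sketch: masked amino acid (MAA) prediction via the fill-mask pipeline.
    # The checkpoint name is an assumption; verify it on the Hugging Face Hub.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="wukevin/tcr-bert-mlm-only")

    # Mask one residue in a space-separated TCR sequence.
    masked = f"C A S S P {fill.tokenizer.mask_token} T G G I Y G Y T F"
    for prediction in fill(masked, top_k=5):
        print(prediction["token_str"], round(prediction["score"], 3))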

Guide: Running Locally

To run TCR-BERT locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/wukevin/tcr-bert
    cd tcr-bert
    
  2. Install Dependencies: Ensure you have Python and PyTorch installed, then install the required packages:

    pip install -r requirements.txt
    
  3. Input Examples: Prepare space-separated amino acid sequences such as:

    • C A S S P V T G G I Y G Y T F (binds to NLVPMVATV CMV antigen)
    • C A T S G R A G V E Q F F (binds to GILGFVFTL flu antigen)
  4. Run the Model: Run inference or training as needed; a minimal inference sketch follows this list.
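
The sketch below illustrates one way to run antigen classification on the example sequences. It assumes the classification checkpoint is published as wukevin/tcr-bert and that the hosted weights include a sequence-classification head; if they do not, use the scripts provided in the tcr-bert repository instead.

    # Sketch: antigen-label classification for the example sequences.
    # The checkpoint name and the presence of a classification head are assumptions.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "wukevin/tcr-bert"  # assumed Hub identifier
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    sequences = [
        "C A S S P V T G G I Y G Y T F",  # example: NLVPMVATV CMV antigen
        "C A T S G R A G V E Q F F",      # example: GILGFVFTL flu antigen
    ]
    inputs = tokenizer(sequences, return_tensors="pt", padding=True)

    with torch.no_grad():
        logits = model(**inputs).logits

    # Report the highest-scoring antigen label for each sequence.
    for seq, idx in zip(sequences, logits.argmax(dim=-1).tolist()):
        print(seq, "->", model.config.id2label.get(idx, str(idx)))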

For optimal performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure to handle computational requirements.

License

Please refer to the TCR-BERT GitHub repository for licensing details. Ensure compliance with the specified terms when using the model.