PathologyBERT

Published by tsantos on the Hugging Face Hub.

Introduction

PathologyBERT is a pre-trained masked language model developed specifically for the pathology domain, with a particular focus on breast pathology reports. It addresses the limitations of general-domain language models such as BERT in handling domain-specific terminology by creating a specialized vocabulary.

Architecture

PathologyBERT is based on the BERT architecture and is trained with a masked language modeling objective, in which randomly masked words in a sentence are predicted from the surrounding context. Input text is tokenized with WordPiece; the authors note, however, that this scheme can break specialized pathology terms that are absent from a general-domain vocabulary into less meaningful sub-word units.
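
As an illustration of how the tokenizer handles domain terminology, the short sketch below (not taken from the original paper) loads the WordPiece tokenizer published with the model and prints the sub-word pieces for a sample pathology phrase:

    from transformers import AutoTokenizer

    # Load the WordPiece tokenizer distributed with PathologyBERT
    tokenizer = AutoTokenizer.from_pretrained('tsantos/PathologyBERT')

    # Inspect how a sample pathology phrase is split into sub-word units
    print(tokenizer.tokenize("intraductal papilloma with micro calcifications"))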

Training

The model was pre-trained with a batch size of 32, a maximum sequence length of 64, a masked language modeling (MLM) probability of 0.15, and a learning rate of 2e-5. Training ran for 300,000 steps, with all other parameters left at BERT's defaults.
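
For readers who want to reproduce a comparable setup, the sketch below maps these hyperparameters onto the Hugging Face Trainer API. It is an illustrative approximation rather than the authors' original training script; the base checkpoint, corpus file, and output directory are placeholders:

    from datasets import load_dataset
    from transformers import (AutoTokenizer, BertForMaskedLM,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    # Placeholder base checkpoint and a corpus of pathology report text, one sentence per line
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = BertForMaskedLM.from_pretrained('bert-base-uncased')
    dataset = load_dataset('text', data_files={'train': 'pathology_reports.txt'})['train']

    def tokenize(batch):
        # Maximum sequence length of 64, as reported above
        return tokenizer(batch['text'], truncation=True, max_length=64)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=['text'])

    # Mask 15% of tokens for the masked language modeling objective
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

    args = TrainingArguments(
        output_dir='pathology-bert-mlm',      # placeholder output directory
        per_device_train_batch_size=32,       # batch size of 32
        learning_rate=2e-5,                   # learning rate of 2e-5
        max_steps=300_000,                    # 300,000 training steps
    )

    Trainer(model=model, args=args,
            train_dataset=tokenized, data_collator=collator).train()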

Guide: Running Locally

To run PathologyBERT locally, you can use the Hugging Face Transformers library. Here's a basic guide:

  1. Install the Transformers library:

    pip install transformers
    
  2. Use the model with a pipeline for masked language modeling:

    from transformers import pipeline
    language_model = pipeline('fill-mask', model='tsantos/PathologyBERT')
    result = language_model("intraductal papilloma with [MASK] AND MICRO calcifications")
    
  3. Analyze the output to interpret the model's predictions.
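
The fill-mask pipeline returns a list of candidate completions, each carrying the predicted token and a probability score. A minimal way to inspect them, reusing the result variable from step 2:

    # Print each candidate token for the [MASK] position with its score
    for prediction in result:
        print(prediction['token_str'], round(prediction['score'], 4))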

For better performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure for computation-intensive tasks.
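
If a GPU is available, whether locally or on such a cloud instance, the pipeline can be placed on it via the device argument; a minimal sketch:

    from transformers import pipeline

    # device=0 selects the first CUDA GPU; omit it (or use device=-1) to stay on CPU
    language_model = pipeline('fill-mask', model='tsantos/PathologyBERT', device=0)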

License

For licensing information, please refer to the original Hugging Face repository or contact the author via email at thiagogyn.maia@gmail.com.
