BiomedNLP-KRISSBERT-PubMed-UMLS-EL

microsoft

Introduction

KRISSBERT is a contextual encoder model designed for biomedical entity linking using Knowledge-Rich Self-Supervision (KRISS). It addresses the main challenges of entity linking, namely name variations and ambiguous mentions, without relying on extensive labeled data. Instead, the model leverages unlabeled text and domain knowledge, using PubMed abstracts and the UMLS ontology for self-supervision.

Architecture

KRISSBERT builds upon PubMedBERT: it is initialized with PubMedBERT parameters and then continually pretrained using biomedical entities from the UMLS ontology. This approach allows the model to achieve state-of-the-art results on biomedical entity linking tasks by using the context around a mention to disambiguate it.
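
The role of context in disambiguation can be pictured as a nearest-prototype lookup: a mention is encoded together with its context, then linked to the entity whose stored prototype embedding is most similar. The sketch below is only an illustration under stated assumptions. The 3-dimensional vectors are hand-made stand-ins for encoder outputs, and `link_mention`, `ENTITY_A`, and `ENTITY_B` are hypothetical names that are not taken from the KRISSBERT code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def link_mention(mention_vec, prototypes):
    """Return (entity_id, score) for the prototype most similar to the mention.

    `prototypes` maps an entity id to a list of prototype vectors.
    """
    best_id, best_score = None, float("-inf")
    for entity_id, proto_vecs in prototypes.items():
        for vec in proto_vecs:
            score = cosine(mention_vec, vec)
            if score > best_score:
                best_id, best_score = entity_id, score
    return best_id, best_score

# Hypothetical toy vectors; a real system would use encoder embeddings.
prototypes = {
    "ENTITY_A": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "ENTITY_B": [[0.1, 0.9, 0.2]],
}
mention_vec = [0.85, 0.15, 0.05]  # an ambiguous mention encoded with its context

predicted_id, score = link_mention(mention_vec, prototypes)
print(predicted_id, round(score, 3))
```

Because the mention vector depends on the surrounding text, the same surface string can land near different prototypes in different contexts, which is what makes the lookup context-sensitive.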

Training

KRISSBERT is trained with self-supervision derived from PubMed abstracts and the UMLS ontology. Evaluated on seven standard biomedical datasets, it outperformed previous methods by up to 20 percentage points in accuracy, a gain attributed to its use of context when linking entities.
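
One plausible way to picture self-supervision from an ontology and unlabeled text is to match known entity names in the text and treat each match, together with its surrounding context, as a self-labeled mention example. The sketch below assumes simple case-insensitive exact matching. The function `find_mentions`, the entity ids, and the sample sentence are hypothetical, and the actual KRISS pipeline may differ in how it matches names, handles ambiguity, and trains the encoder.

```python
import re

def find_mentions(text, name_to_entity, window=40):
    """Collect self-labeled mention examples by matching entity names in text.

    Each example keeps the matched string plus up to `window` characters of
    left and right context.
    """
    examples = []
    for name, entity_id in name_to_entity.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            examples.append({
                "entity": entity_id,
                "mention": m.group(0),
                "left": text[max(0, m.start() - window):m.start()],
                "right": text[m.end():m.end() + window],
            })
    return examples

# Hypothetical name dictionary standing in for ontology entity names.
names = {
    "aspirin": "ENTITY_ASPIRIN",
    "myocardial infarction": "ENTITY_MI",
}
abstract = "Low-dose aspirin reduced the risk of myocardial infarction in adults."

examples = find_mentions(abstract, names)
for ex in examples:
    print(ex["entity"], "|", ex["left"], "[", ex["mention"], "]", ex["right"])
```

No human annotation is involved: the labels come entirely from the name dictionary, which is why the approach scales to large unlabeled corpora.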

Guide: Running Locally

  1. Create Conda Environment and Install Requirements

    conda create -n kriss -y python=3.8 && conda activate kriss
    pip install -r requirements.txt
    
  2. Switch to the Usage Directory

    cd usage
    
  3. Download the MedMentions Dataset

    git clone https://github.com/chanzuckerberg/MedMentions.git
    
  4. Generate Prototype Embeddings

    python generate_prototypes.py
    
  5. Run Entity Linking

    python run_entity_linking.py
    

This setup achieves approximately 58.3% top-1 accuracy. To speed up prototype generation and entity linking, consider running on a GPU, such as the cloud instances offered by AWS, GCP, or Azure.
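
For reference, top-1 accuracy is the fraction of mentions whose highest-ranked predicted entity matches the gold entity. The helper below is a minimal sketch of that metric with made-up ids. It is not taken from `run_entity_linking.py`, whose internals are not shown here.

```python
def top1_accuracy(predicted_ids, gold_ids):
    """Fraction of mentions whose top-ranked prediction equals the gold entity."""
    if len(predicted_ids) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_ids, gold_ids))
    return correct / len(gold_ids)

# Hypothetical predictions for four mentions; the third one is wrong.
predicted = ["E1", "E2", "E3", "E4"]
gold = ["E1", "E2", "E9", "E4"]

accuracy = top1_accuracy(predicted, gold)
print(f"top-1 accuracy: {accuracy:.1%}")
```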

License

KRISSBERT is released under the MIT License, which permits use, distribution, and modification of the software.
