BiomedNLP-KRISSBERT-PubMed-UMLS-EL
microsoft
Introduction
KRISSBERT is a contextual encoder for biomedical entity linking, trained with Knowledge-Rich Self-Supervision (KRISS). It addresses the main challenges of entity linking, namely variation and ambiguity in how entities are mentioned, without relying on extensive labeled data. Instead, the model draws its self-supervision from unlabeled text and domain knowledge: PubMed abstracts and the UMLS ontology.
Architecture
KRISSBERT builds on PubMedBERT: it is initialized with PubMedBERT's parameters and then continually pretrained using biomedical entities from the UMLS ontology. Because the encoder represents each mention together with its surrounding context, it can disambiguate mentions that share the same surface form, which allows it to achieve state-of-the-art results on biomedical entity linking tasks.
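To make the idea of context-based disambiguation concrete, here is a minimal sketch of linking a mention to its nearest entity prototype by cosine similarity. It is an illustration under stated assumptions, not the repository's implementation: the three-dimensional vectors are toy stand-ins for encoder outputs, and the entity IDs (`ENTITY_A`, `ENTITY_B`) and the helper names `cosine` and `link_mention` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def link_mention(mention_vec, prototypes):
    """Return (entity_id, score) for the prototype most similar to the mention."""
    scored = ((entity_id, cosine(mention_vec, proto)) for entity_id, proto in prototypes)
    return max(scored, key=lambda pair: pair[1])


# Toy prototype embeddings (assumption: a real system would obtain these
# by encoding example mentions in context). An entity may have several.
prototypes = [
    ("ENTITY_A", [0.9, 0.1, 0.0]),
    ("ENTITY_A", [0.8, 0.2, 0.1]),
    ("ENTITY_B", [0.1, 0.9, 0.2]),
]

# Toy embedding of a new mention in its context.
mention_vec = [0.2, 0.8, 0.1]

best_id, best_score = link_mention(mention_vec, prototypes)
print(best_id)  # ENTITY_B
```

Two mentions with the same surface form but different contexts would receive different embeddings, so they can land on different prototypes; that is the property the sketch is meant to show.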
Training
KRISSBERT is trained with self-supervision derived from PubMed abstracts and the UMLS ontology. Evaluated on seven standard biomedical datasets, it outperformed previous methods by up to 20 percentage points in accuracy, a gain attributed to its use of context when linking entities.
Guide: Running Locally
- Create Conda Environment and Install Requirements
  conda create -n kriss -y python=3.8 && conda activate kriss
  pip install -r requirements.txt
- Switch to the Usage Directory
  cd usage
- Download the MedMentions Dataset
  git clone https://github.com/chanzuckerberg/MedMentions.git
- Generate Prototype Embeddings
  python generate_prototypes.py
- Run Entity Linking
  python run_entity_linking.py
Running these steps yields approximately 58.3% top-1 accuracy. To speed up prototype generation and inference, consider using a cloud GPU from a provider such as AWS, GCP, or Azure.
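For readers unfamiliar with the metric, top-1 accuracy is the fraction of mentions whose single highest-ranked prediction matches the gold entity. The sketch below shows the computation on made-up labels; the function name `top1_accuracy` and the entity IDs are hypothetical, and this is not the evaluation code used by `run_entity_linking.py`.

```python
def top1_accuracy(predicted_ids, gold_ids):
    """Fraction of mentions whose top-ranked prediction equals the gold entity."""
    if len(predicted_ids) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == g for p, g in zip(predicted_ids, gold_ids))
    return correct / len(gold_ids)


# Made-up example: one top-ranked prediction per mention, with its gold label.
predicted = ["ENTITY_A", "ENTITY_B", "ENTITY_B", "ENTITY_C"]
gold = ["ENTITY_A", "ENTITY_B", "ENTITY_C", "ENTITY_C"]

accuracy = top1_accuracy(predicted, gold)
print(f"top-1 accuracy: {accuracy:.1%}")  # top-1 accuracy: 75.0%
```

Here three of the four mentions are linked correctly, giving 75.0%.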
License
KRISSBERT is released under the MIT License, which permits use, distribution, and modification of the software.