CORe Clinical Diagnosis Prediction

DATEXIS

Introduction

The CORe (Clinical Outcome Representations) model is designed for clinical diagnosis prediction from patient admission notes. Built on BioBERT, it is fine-tuned for multi-label prediction of ICD9 diagnosis codes. The model is introduced in the paper "Clinical Outcome Predictions from Admission Notes using Self-Supervised Knowledge Integration."

Architecture

The model is based on BioBERT and further pre-trained on clinical notes, disease descriptions, and medical articles. It uses a specialized Clinical Outcome Pre-Training objective to enhance its predictive capabilities. The model can predict a total of 9237 labels, including 3- and 4-digit ICD9 codes and their textual descriptions.
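
The label space can be inspected without downloading the full model weights by loading only the model config. The snippet below is a minimal sketch; its regular expression is a rough heuristic that assumes ICD9 code labels appear in id2label as plain code strings such as "428" or "428.0" (E- and V-codes ignored), which may not exactly match how this checkpoint formats its labels.

    import re
    from transformers import AutoConfig

    # Fetch only the config, not the model weights
    config = AutoConfig.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

    labels = list(config.id2label.values())
    print(f"Total labels: {len(labels)}")  # expected: 9237

    # Rough heuristic (an assumption): treat digit-only labels as ICD9 codes
    # and everything else as textual description labels
    code_pattern = re.compile(r"^\d{3}(\.\d+)?$")
    codes = [label for label in labels if code_pattern.match(label)]
    print(f"Code-like labels: {len(codes)}, other labels: {len(labels) - len(codes)}")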

Training

The model was fine-tuned specifically for diagnosis prediction, with an emphasis on hierarchical and topical label information. Training incorporated 4-digit codes and textual descriptions, but the authors recommend using only 3-digit codes at inference time, since these are what the paper evaluates.
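
Following that recommendation, predictions can be filtered down to 3-digit codes before evaluation or downstream use. A minimal sketch, assuming code labels are plain strings such as "428"; the pattern below discards 4-digit codes, textual descriptions, and ICD9 E-/V-codes:

    import re

    # Hypothetical model output (see the inference guide below)
    predicted_labels = ["428", "428.0", "401.9", "584", "acute kidney failure"]

    # Keep only 3-digit ICD9 codes, as recommended by the authors
    three_digit = re.compile(r"^\d{3}$")
    diagnosis_codes = [label for label in predicted_labels if three_digit.match(label)]
    print(diagnosis_codes)  # ['428', '584']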

Guide: Running Locally

To use the CORe model locally, follow these steps:

  1. Install Transformers Library:

    pip install transformers torch
    
  2. Load the Model:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
    model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
    
  3. Perform Inference:

import torch

    input_text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

    tokenized_input = tokenizer(input_text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        output = model(**tokenized_input)

    # Multi-label setup: apply a sigmoid to each logit independently
    predictions = torch.sigmoid(output.logits)

    # Keep every label whose probability exceeds the 0.3 threshold
    predicted_labels = [
        model.config.id2label[_id]
        for _id in (predictions > 0.3).nonzero()[:, 1].tolist()
    ]
    

The 0.3 cutoff is a single global threshold; tuning a separate threshold per label can improve results, as sketched below.
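
    A minimal sketch of per-label thresholds, continuing from the inference code above; the label names and threshold values here are illustrative placeholders, not tuned values from the paper:

    # Hypothetical per-label thresholds; labels not listed use the default
    label_thresholds = {"401.9": 0.5, "428.0": 0.25}
    default_threshold = 0.3

    probs = predictions[0]  # probabilities for one input, shape (num_labels,)
    predicted_labels = [
        model.config.id2label[i]
        for i, p in enumerate(probs.tolist())
        if p > label_thresholds.get(model.config.id2label[i], default_threshold)
    ]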

  4. Cloud GPUs: For faster inference, consider running the model on a cloud GPU from a provider such as AWS EC2, Google Cloud, or Azure.
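
    On a CUDA-capable machine, inference follows the standard PyTorch pattern of moving the model and inputs to the GPU; a minimal sketch, reusing the names from step 3:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    model.eval()  # disable dropout for deterministic inference

    tokenized_input = tokenizer(input_text, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model(**tokenized_input)
    predictions = torch.sigmoid(output.logits)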

License

Detailed license information is not provided here. For details, refer to the official Hugging Face model page or the repository.
