en_core_med7_lg

kormilitzin

Introduction

The en_core_med7_lg model is a token classification model for Named Entity Recognition (NER) in clinical text. Built with the spaCy library, it extracts medication-related entities such as drug names, dosages, and routes of administration from electronic health records.

Architecture

  • Model Name: en_core_med7_lg
  • Version: 3.4.2.1
  • Library: spaCy (version >=3.4.2,<3.5.0)
  • Pipeline Components: tok2vec, ner
  • Vector Details: 514,157 keys, 514,157 unique vectors, each with 300 dimensions
  • Label Scheme: The model identifies seven labels: DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH.
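
The fields listed above can be read back from a loaded pipeline, which is a quick way to confirm that the installed package matches this description. The sketch below is a minimal illustration, not part of the model's own documentation: describe_pipeline is a hypothetical helper name, and it assumes the pipeline exposes the standard spaCy attributes nlp.meta, nlp.pipe_names, nlp.vocab.vectors and nlp.get_pipe.

```python
def describe_pipeline(nlp):
    """Collect name, version, components, vector shape and NER labels
    from a loaded spaCy pipeline (hypothetical helper for illustration)."""
    return {
        # spaCy package names are "<lang>_<name>", e.g. "en" + "core_med7_lg"
        "name": f'{nlp.meta["lang"]}_{nlp.meta["name"]}',
        "version": nlp.meta["version"],
        "components": list(nlp.pipe_names),
        # (number of vectors, vector width)
        "vector_shape": tuple(nlp.vocab.vectors.shape),
        "labels": sorted(nlp.get_pipe("ner").labels),
    }

# Usage, assuming the model is installed:
#   import spacy
#   nlp = spacy.load("en_core_med7_lg")
#   print(describe_pipeline(nlp))
```

If the package matches the list above, the labels entry should contain the seven label names and the components entry should be tok2vec followed by ner.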

Training

The model is trained to perform NER on clinical text. The reported NER evaluation scores, together with the final training losses for the tok2vec and ner components, are as follows:

  • NER Precision: 0.8649
  • NER Recall: 0.8893
  • NER F Score: 0.8770
  • TOK2VEC Loss: 226,109.53
  • NER Loss: 302,222.55
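
As a sanity check on the scores above, the F-score can be recomputed from precision and recall as their harmonic mean. This is a small sketch using the standard F-beta formula; f_score is a hypothetical helper name, and the only inputs are the two figures reported above.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the list above.
precision, recall = 0.8649, 0.8893
f1 = f_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

The result agrees with the reported 0.8770 to within the rounding of the four-digit inputs.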

Guide: Running Locally

To run the en_core_med7_lg model locally, follow these steps:

  1. Install spaCy: Ensure you have spaCy installed with a compatible version:

    pip install "spacy>=3.4.2,<3.5.0"
    
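  2. Download the model: Med7 is distributed through the author's Hugging Face repository rather than spaCy's official model index, so the spacy download command is unlikely to find it. Install the packaged wheel with pip instead:

    pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl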
    
  3. Load and use the model in your script:

    import spacy
    nlp = spacy.load("en_core_med7_lg")
    doc = nlp("The patient was prescribed 100mg of Aspirin daily for two weeks.")
    for ent in doc.ents:
        print(ent.text, ent.label_)
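
When processing many records, it is often more convenient to stream texts through nlp.pipe and collect the entities per label. The following is a minimal sketch under that assumption: group_entities and extract_entities are hypothetical helper names, and spaCy is imported lazily so the grouping helper can be used on its own.

```python
from collections import defaultdict

def group_entities(pairs):
    """Group (text, label) pairs into a {label: [text, ...]} dictionary."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

def extract_entities(texts, model_name="en_core_med7_lg"):
    """Yield one grouped-entity dictionary per input text."""
    import spacy  # lazy import: only needed when the model is actually run
    nlp = spacy.load(model_name)
    # nlp.pipe processes texts in batches, which is faster than calling nlp() in a loop
    for doc in nlp.pipe(texts):
        yield group_entities((ent.text, ent.label_) for ent in doc.ents)

# Usage, assuming the model is installed:
#   notes = ["The patient was prescribed 100mg of Aspirin daily for two weeks."]
#   for result in extract_entities(notes):
#       print(result)
```

Grouping by label keeps repeated entities of the same type together, which makes it easier to map the output onto structured fields such as a medication table.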
    

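The pipeline runs on CPU by default. When processing large volumes of records, a GPU (local, or from a cloud provider such as AWS, Google Cloud, or Azure) may improve throughput; call spacy.prefer_gpu() before spacy.load() to use one when it is available.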

License

The en_core_med7_lg model is released under the MIT License, allowing for broad use and modification.
