en_core_med7_lg
kormilitzin
Introduction
The en_core_med7_lg model is a token classification model for Named Entity Recognition (NER) in clinical text. It is built with the spaCy library and extracts medication-related entities, such as drug names, dosages, and routes of administration, from electronic health records.
Architecture
- Model Name: en_core_med7_lg
- Version: 3.4.2.1
- Library: spaCy (version >=3.4.2,<3.5.0)
- Pipeline Components: tok2vec, ner
- Vector Details: 514,157 keys, 514,157 unique vectors, each with 300 dimensions
- Label Scheme: The model identifies seven labels: DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH.
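As a quick sanity check after installation, the components and labels listed above can be compared against what a loaded pipeline reports. The sketch below is a minimal illustration, not part of the model's distribution: `check_pipeline`, `EXPECTED_PIPES`, and `EXPECTED_LABELS` are hypothetical names, and the expected values are simply copied from this card.

```python
# Values copied from the Architecture list above.
EXPECTED_PIPES = ["tok2vec", "ner"]
EXPECTED_LABELS = {
    "DOSAGE", "DRUG", "DURATION", "FORM", "FREQUENCY", "ROUTE", "STRENGTH",
}


def check_pipeline(pipe_names, ner_labels):
    """Return a list of human-readable mismatches (empty if all matches)."""
    problems = []
    if list(pipe_names) != EXPECTED_PIPES:
        problems.append(f"unexpected pipeline components: {list(pipe_names)}")
    missing = EXPECTED_LABELS - set(ner_labels)
    extra = set(ner_labels) - EXPECTED_LABELS
    if missing:
        problems.append(f"missing labels: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected labels: {sorted(extra)}")
    return problems
```

With the model loaded as `nlp`, a call such as `check_pipeline(nlp.pipe_names, nlp.get_pipe("ner").labels)` should return an empty list if the installed package matches this card.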
Training
The model is trained for NER on clinical text. The reported evaluation scores and training losses are as follows:
- NER Precision: 0.8649
- NER Recall: 0.8893
- NER F Score: 0.8770
- TOK2VEC Loss: 226,109.53
- NER Loss: 302,222.55
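The three NER scores are related: assuming the F score here is the usual F1 (the harmonic mean of precision and recall), it can be recomputed from the other two figures. The helper name `f_score` below is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the list above.
f1 = f_score(0.8649, 0.8893)
print(round(f1, 4))
```

The recomputed value agrees with the reported 0.8770 to within the rounding error of the four-digit precision and recall.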
Guide: Running Locally
To run the en_core_med7_lg model locally, follow these steps:
- Install spaCy: Make sure a compatible spaCy version is installed. Quote the requirement so the shell does not treat < and > as redirections:
    pip install "spacy>=3.4.2,<3.5.0"
- Download the model: Download and install the model using spaCy's CLI:
    python -m spacy download en_core_med7_lg
- Load and use the model in your script:
    import spacy

    # Load the installed pipeline (tok2vec + ner)
    nlp = spacy.load("en_core_med7_lg")
    doc = nlp("The patient was prescribed 100mg of Aspirin daily for two weeks.")

    # Print each recognized entity with its label
    for ent in doc.ents:
        print(ent.text, ent.label_)
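For higher throughput on large volumes of text, consider running the model on a GPU, for example through a cloud service such as AWS, Google Cloud, or Azure. A GPU speeds up inference; it does not change the model's accuracy.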
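For more than a handful of notes, it is usually more convenient to process texts in batches and collect the entities per label. The following is a minimal sketch under stated assumptions: `group_by_label` and `extract_medications` are hypothetical helper names, the batch size is arbitrary, and the model is assumed to be installed as described in the steps above. `spacy.prefer_gpu()` uses a GPU only if one is available and spaCy was installed with GPU support; otherwise the code runs on CPU.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [text, ...]}, preserving order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


def extract_medications(texts, model_name="en_core_med7_lg", batch_size=32):
    """Run the pipeline over many texts; return one grouped dict per text.

    spaCy is imported lazily so that group_by_label stays usable without it.
    """
    import spacy

    # Must be called before the model is loaded; falls back to CPU if no GPU.
    spacy.prefer_gpu()
    nlp = spacy.load(model_name)

    results = []
    # nlp.pipe processes texts in batches instead of one call per text.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        pairs = [(ent.text, ent.label_) for ent in doc.ents]
        results.append(group_by_label(pairs))
    return results
```

Calling `extract_medications(["The patient was prescribed 100mg of Aspirin daily for two weeks."])` returns a list with one dictionary per input text, keyed by the labels the model actually assigned.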
License
The en_core_med7_lg model is released under the MIT License, allowing for broad use and modification.