biomedical-ner-all
by d4data

Introduction
The Biomedical-NER-ALL model is an English Named Entity Recognition (NER) model designed to recognize 107 biomedical entities in text corpora such as case reports. It is built on the distilbert-base-uncased architecture and trained on the Maccrobat dataset. The model offers a straightforward way to apply token classification to biomedical texts.
Architecture
The model uses the distilbert-base-uncased transformer architecture, a lighter version of BERT optimized for speed and efficiency. It is tailored for token classification tasks and integrates directly with Hugging Face's Transformers library for inference.
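When used with a "ner" pipeline and aggregation_strategy="simple" (as in the guide below), consecutive subword tokens sharing an entity type are merged into single entity spans. The following is a rough, simplified sketch of that grouping step; the per-token predictions are illustrative made-up data, not actual model output, and the real pipeline handles B-/I- boundaries and offsets more carefully.

```python
from itertools import groupby

# Illustrative per-token predictions (WordPiece tokens with B-/I- tags),
# roughly the shape produced before aggregation; real tags and scores
# come from the model at inference time.
tokens = [
    {"word": "pal", "entity": "B-Sign_symptom", "score": 0.99},
    {"word": "##pit", "entity": "I-Sign_symptom", "score": 0.98},
    {"word": "##ations", "entity": "I-Sign_symptom", "score": 0.97},
    {"word": "ablation", "entity": "B-Therapeutic_procedure", "score": 0.95},
]

def aggregate_simple(preds):
    """Merge consecutive tokens with the same entity type into one span,
    averaging their scores (a simplified take on aggregation_strategy='simple')."""
    spans = []
    for etype, group in groupby(preds, key=lambda t: t["entity"].split("-", 1)[-1]):
        group = list(group)
        word = ""
        for t in group:
            piece = t["word"]
            # "##" marks a continuation subword; glue it to the previous piece.
            word += piece[2:] if piece.startswith("##") else ((" " if word else "") + piece)
        spans.append({
            "entity_group": etype,
            "word": word,
            "score": sum(t["score"] for t in group) / len(group),
        })
    return spans
```

Running `aggregate_simple(tokens)` merges the three subword tokens into a single `Sign_symptom` span for "palpitations", alongside the `Therapeutic_procedure` span for "ablation".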
Training
- Dataset: Maccrobat
- Training Time: 30.17 minutes
- GPU Used: GeForce RTX 3060 Laptop GPU
- Carbon Emission: 0.0279 kg
Guide: Running Locally
- Setup Environment:
  - Ensure you have Python and pip installed.
  - Install the transformers library:

    ```shell
    pip install transformers
    ```
- Load Model and Tokenizer:
  - Use the Hugging Face Transformers library to load the model and tokenizer:

    ```python
    from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
    model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")
    pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")  # add device=0 for GPU
    ```
- Inference:
  - Run the NER pipeline on your text:

    ```python
    result = pipe("The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.")
    ```
- Cloud GPUs:
- Consider using cloud-based GPU services like AWS, GCP, or Azure for enhanced performance.
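As a sketch of working with the pipeline's output: with aggregation_strategy="simple", the pipeline returns a list of dictionaries, one per detected entity. The sample entities below are illustrative placeholders shaped like that output, not actual model predictions.

```python
# Sample results shaped like the output of a "ner" pipeline with
# aggregation_strategy="simple"; real entity groups, scores, and
# character offsets come from the model at inference time.
entities = [
    {"entity_group": "Sign_symptom", "score": 0.98, "word": "palpitations", "start": 38, "end": 50},
    {"entity_group": "Therapeutic_procedure", "score": 0.95, "word": "ablation", "start": 86, "end": 94},
]

def filter_entities(results, min_score=0.90):
    """Keep only entities whose confidence meets the threshold."""
    return [e for e in results if e["score"] >= min_score]

for ent in filter_entities(entities):
    print(f'{ent["entity_group"]}: "{ent["word"]}" ({ent["score"]:.2f})')
```

Thresholding on `score` is a common way to trade recall for precision when the downstream task needs high-confidence extractions.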
License
The model is released under the Apache-2.0 License, allowing for both commercial and non-commercial use with proper attribution.