biobert_diseases_ner
alvaroalon2
Introduction
The BioBERT_DISEASES_NER model is a fine-tuned BioBERT model for Named Entity Recognition (NER) that targets disease mentions in biomedical text. It was trained on the BC5CDR-diseases and NCBI-disease corpora, which enables it to identify disease-related entities in biomedical text. The model is hosted on the Hugging Face Hub, loads through the Transformers library, and is implemented in PyTorch.
Architecture
The model is based on the BERT architecture, adapted for the biomedical domain. It starts from pre-trained BioBERT weights and has been fine-tuned as a token classifier: each token in the input text receives a label indicating whether it belongs to a disease-related entity. Because it labels text token by token, it can be applied across large volumes of biomedical text to extract disease mentions.
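To make the token-classification idea concrete, here is a minimal sketch of how per-token labels can be grouped into entity spans. It assumes a BIO tagging scheme with `B-DISEASE`, `I-DISEASE`, and `O` labels; those label names and the helper `bio_to_spans` are illustrative assumptions, not taken from the model's configuration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, entity_type) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the state.
        nonlocal current_tokens, current_type
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            flush()
            if tag.startswith("I-"):
                # A stray I- tag with no open entity is treated as a start.
                current_tokens, current_type = [token], tag[2:]
    flush()
    return spans


# Assumed label names; check the model's id2label mapping for the real ones.
tokens = ["Patients", "with", "type", "2", "diabetes", "and", "asthma", "."]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-DISEASE", "O"]
print(bio_to_spans(tokens, tags))
```

In practice the model predicts labels for word pieces rather than whole words, so a real decoder also has to merge sub-word tokens; the sketch leaves that step out.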
Training
The BioBERT_DISEASES_NER model was fine-tuned on the BC5CDR-diseases and NCBI-disease datasets. Fine-tuning adjusted the model parameters to optimize NER performance on disease names and related biomedical terms.
Guide: Running Locally
- Install Dependencies:
  - Ensure you have Python installed.
  - Install the Hugging Face Transformers library using pip:
    pip install transformers
  - Install PyTorch. Follow the instructions on the PyTorch website to select the appropriate version for your system.
- Load the Model:
    from transformers import AutoTokenizer, AutoModelForTokenClassification
    tokenizer = AutoTokenizer.from_pretrained("alvaroalon2/biobert_diseases_ner")
    model = AutoModelForTokenClassification.from_pretrained("alvaroalon2/biobert_diseases_ner")
- Run Inference:
  - Tokenize your text, pass it through the model, and map the predicted labels back to disease entities.
  - Use a GPU for faster inference if available.
- Cloud GPUs:
  - Consider using cloud services like AWS, Google Cloud, or Azure, which offer GPU instances to speed up model training and inference.
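The inference step can be sketched with the Transformers `pipeline` helper, which handles tokenization, prediction, and grouping of word pieces into entities. This is one possible approach under stated assumptions, not the author's reference code: the helper names `build_ner_pipeline`, `summarize_entities`, and `main`, the `min_score` threshold, and the sample sentence are all hypothetical.

```python
from typing import Dict, List

MODEL_ID = "alvaroalon2/biobert_diseases_ner"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline, on a GPU when one is visible."""
    # Imported lazily so summarize_entities can be used without these packages.
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # -1 selects the CPU
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
        device=device,
    )


def summarize_entities(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Format aggregated pipeline output, dropping low-confidence entities.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    lines = []
    for ent in entities:
        if ent["score"] >= min_score:
            lines.append(
                f'{ent["word"]} [{ent["entity_group"]}] '
                f'chars {ent["start"]}-{ent["end"]}'
            )
    return lines


def main() -> None:
    # Downloads the model weights from the Hugging Face Hub on first use.
    ner = build_ner_pipeline()
    text = "The patient was diagnosed with cystic fibrosis and chronic bronchitis."
    for line in summarize_entities(ner(text)):
        print(line)
```

Calling `main()` runs the full flow. With `aggregation_strategy="simple"`, each result is a dictionary with `entity_group`, `score`, `word`, `start`, and `end` keys, which is what `summarize_entities` expects.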
License
The model is available under the Apache 2.0 License, allowing for both personal and commercial use. Ensure you comply with the terms outlined in the license.