biobert_chemical_ner
Introduction
The BioBERT_CHEMICAL_NER model is a fine-tuned BioBERT model designed for named entity recognition (NER) tasks in the biomedical domain, specifically for identifying chemical entities. It leverages datasets such as BC5CDR-chemicals and BC4CHEMD to enhance its performance in recognizing chemical names within biomedical texts.
Architecture
The model is based on the BERT architecture (via BioBERT) and fine-tuned for token classification, the standard formulation of biomedical NER in which each input token receives an entity label. Checkpoints are available for both PyTorch and TensorFlow, offering flexibility in deployment and usage.
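Token classification means the model emits one score per label for every token, and the predicted label is the argmax. A minimal sketch of that decode step, using toy scores and an assumed BIO label set (the actual label names are not stated in this card):

```python
# Assumed BIO label set for a chemical NER head; toy logits, not real model output.
labels = ["O", "B-CHEMICAL", "I-CHEMICAL"]

# One row of scores per token: shape (seq_len, num_labels).
logits = [
    [2.0, 0.1, 0.1],  # most likely "O"
    [0.1, 3.0, 0.2],  # most likely "B-CHEMICAL"
    [0.1, 0.2, 2.5],  # most likely "I-CHEMICAL"
]

# Per-token argmax over the label scores.
preds = [labels[max(range(len(row)), key=row.__getitem__)] for row in logits]
print(preds)  # ['O', 'B-CHEMICAL', 'I-CHEMICAL']
```

In the real model the logits come from a linear layer on top of BERT's per-token hidden states; the decode step is the same.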
Training
BioBERT_CHEMICAL_NER was trained on the BC5CDR-chemicals and BC4CHEMD datasets, both widely used benchmarks in the biomedical community for their comprehensive coverage of chemical entity mentions. Fine-tuning on these corpora optimizes the model for recognizing chemical entities in biomedical literature.
Guide: Running Locally
- Environment Setup: Ensure you have Python installed and set up a virtual environment. Install the necessary libraries, including Hugging Face Transformers and PyTorch or TensorFlow, depending on your preference.
- Download the Model: Use the Hugging Face `transformers` library to download the model:

  ```python
  from transformers import AutoTokenizer, AutoModelForTokenClassification

  tokenizer = AutoTokenizer.from_pretrained("alvaroalon2/biobert_chemical_ner")
  model = AutoModelForTokenClassification.from_pretrained("alvaroalon2/biobert_chemical_ner")
  ```
- Run Inference: Tokenize your text and use the model to predict chemical entities. Post-process the output to extract named entities.
- Cloud GPUs: For more extensive tasks or faster inference, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure ML.
License
The BioBERT_CHEMICAL_NER model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.