clinicalnerpt disease
pucpr
Introduction
The Disease NER model is part of the BioBERTpt project, which targets clinical Named Entity Recognition (NER) in Portuguese. It is based on the BioBERTpt model and was trained on the Brazilian clinical corpus SemClinBr. The project covers 13 clinical entity types compatible with UMLS, each with its own model; this one recognizes disease mentions.
Architecture
The model builds on BioBERTpt, a BERT model derived from multilingual BERT and adapted to Portuguese clinical narratives and biomedical texts. Its contextual embeddings help it identify clinical entities in running text.
Training
The model was trained for 10 epochs on the SemClinBr corpus, with labels in IOB2 format. It outperformed the baseline models in F1-score by 2.72%, with higher performance on 11 of the 13 evaluated entities. Training was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES).
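For readers unfamiliar with IOB2, the sketch below shows how such tags map back to entity spans: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The label name `Disease`, the example sentence, and the helper `iob2_to_spans` are illustrative assumptions, not taken from the model's actual label set or code.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (label, start, end) spans, with end exclusive."""
    spans = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if the labels match.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if any.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new span; a stray I- is leniently treated as a start.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tokens and tags; "Disease" is an assumed label name.
tokens = ["Paciente", "com", "insuficiência", "cardíaca", "e", "diabetes", "."]
tags = ["O", "O", "B-Disease", "I-Disease", "O", "B-Disease", "O"]

for label, start, end in iob2_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

Because every entity starts with an explicit `B-` tag, two adjacent entities of the same type remain distinguishable, which is the main difference from the older IOB1 scheme.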
Guide: Running Locally
- Setup Environment: Ensure you have Python and PyTorch installed.
- Clone the Repository:
    git clone https://github.com/HAILab-PUCPR/SemClinBr
- Install Dependencies:
    pip install -r requirements.txt
- Download the Model: Access the model through Hugging Face or the linked repositories.
- Run Inference: Use the model with your data for NER tasks.
- Cloud GPUs: Consider using cloud services like AWS or Google Cloud for enhanced performance.
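As a rough illustration of the download and inference steps above, the following sketch uses the Hugging Face `transformers` token-classification pipeline. It assumes that `transformers` and PyTorch are installed and that the model is published on the Hub as `pucpr/clinicalnerpt-disease`; that identifier, the example sentence, and the helper names `load_ner_pipeline`, `summarize`, and `main` are assumptions made for illustration.

```python
from typing import Dict, List

# Assumed Hub identifier; check the model page before relying on it.
MODEL_ID = "pucpr/clinicalnerpt-disease"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that merges word pieces."""
    # Imported lazily so the helpers below work without transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # group sub-word tokens into entities
    )


def summarize(entities: List[Dict]) -> List[str]:
    """Format aggregated pipeline output as 'label: text (score)' lines."""
    return [
        f"{e['entity_group']}: {e['word']} ({e['score']:.2f})"
        for e in entities
    ]


def main() -> None:
    ner = load_ner_pipeline()  # downloads the model on first use
    text = "Paciente com insuficiência cardíaca e diabetes."
    for line in summarize(ner(text)):
        print(line)
```

Calling `main()` runs the example end to end. With `aggregation_strategy="simple"`, each returned dictionary describes one merged entity (`entity_group`, `word`, `score`, `start`, `end`) rather than one sub-word token.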
License
The model and related resources are provided under a license that should be reviewed in the respective repositories or documentation for specific terms of use.