BioBERTpt-bio (pucpr)
Introduction
BioBERTpt is a Portuguese neural language model designed for clinical and biomedical Named Entity Recognition (NER). The model is based on BERT: it is initialized from the multilingual cased BERT checkpoint and further pretrained on Portuguese biomedical literature from PubMed and SciELO.
Architecture
BioBERTpt uses a BERT-based architecture and is trained on Portuguese clinical notes and biomedical literature. Because it is initialized from BERT-Multilingual-Cased, it can effectively process clinical and biomedical text in Portuguese.
Training
The model is trained using Portuguese biomedical literature and scientific papers to enhance its performance in clinical NER tasks. It leverages transfer learning from a multilingual BERT model, allowing it to perform well on Portuguese NER tasks with a limited amount of labeled data.
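To make the NER task concrete, clinical NER models are typically trained to emit BIO-style labels (B-egin, I-nside, O-utside) per token. The helper below is a hypothetical illustration of how such labels decode into entity spans; the tokens, labels, and entity type shown are example values, not output of the released model:

```python
# Hypothetical example: decoding BIO-style NER labels into entity spans.
# The tokens and labels below are illustrative, not actual model output.

def bio_to_spans(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag starts a new entity, closing any open one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the current entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the current entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Paciente", "com", "diabetes", "mellitus", "tipo", "2"]
labels = ["O", "O", "B-Disease", "I-Disease", "I-Disease", "I-Disease"]
print(bio_to_spans(tokens, labels))  # [('Disease', 'diabetes mellitus tipo 2')]
```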
Guide: Running Locally
To use the BioBERTpt model locally, you can load it via the transformers library:
# Requires: pip install transformers
from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-bio")
model = AutoModel.from_pretrained("pucpr/biobertpt-bio")
For efficient performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
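Once loaded, the model can produce contextual embeddings for Portuguese clinical text. The sketch below assumes network access to the Hugging Face Hub and that both transformers and torch are installed; the example sentence is illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumes the pucpr/biobertpt-bio checkpoint can be downloaded from the Hub.
tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-bio")
model = AutoModel.from_pretrained("pucpr/biobertpt-bio")
model.eval()

# An illustrative Portuguese clinical sentence.
text = "O paciente apresenta febre e tosse persistente."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, sequence_length, hidden_size);
# hidden_size is 768 for a BERT-base model.
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

These per-token embeddings can then feed a downstream token-classification head for NER fine-tuning.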
Acknowledgements
The study and development of BioBERTpt were partially funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001. For further details, refer to the model's original paper and the BioBERTpt GitHub repository.