VetBERT
Introduction
VetBERT is a pretrained model designed for natural language processing (NLP) tasks related to veterinary clinical notes. It was introduced in the paper "Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes" by Hur et al., presented at BioNLP 2020. The model is built on the Bio_ClinicalBERT architecture and further pretrained with a large corpus from VetCompass Australia for veterinary medicine applications.
Architecture
VetBERT is initialized from the Bio_ClinicalBERT model, which itself is based on the BERT architecture. It is adapted to veterinary clinical text through continued pretraining on over 15 million veterinary clinical records, comprising roughly 1.3 billion tokens.
Training
Pretraining Data
VetBERT was initialized from the Bio_ClinicalBERT model and further pretrained on a large corpus of veterinary clinical records from VetCompass Australia.
Pretraining Hyperparameters
- Batch size: 32
- Maximum sequence length: 512
- Learning rate: 5e-5
- Duplication factor: 5
- Masked language model probability: 0.15
- Maximum predictions per sequence: 20
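The duplication factor and maximum predictions per sequence are parameters of the original BERT-style pretraining data generation step and have no direct counterpart in the sketch below. The remaining settings map onto standard masked language modeling tooling. As a rough, unofficial illustration only (the corpus file vet_notes.txt and the single-epoch schedule are placeholders, not details from the paper), continued pretraining could be set up with the Hugging Face Trainer as follows:

from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from the checkpoint VetBERT was initialized from.
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Placeholder corpus file: one clinical note per line.
notes = load_dataset("text", data_files={"train": "vet_notes.txt"})["train"]

def tokenize(batch):
    # Maximum sequence length: 512
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = notes.map(tokenize, batched=True, remove_columns=["text"])

# Masked language model probability: 0.15
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="vetbert-mlm",
    per_device_train_batch_size=32,  # Batch size: 32
    learning_rate=5e-5,              # Learning rate: 5e-5
    num_train_epochs=1,              # placeholder; the paper's schedule is not reproduced here
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()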
Finetuning
VetBERT was finetuned on 5002 annotated clinical notes for disease syndrome classification, as detailed in the referenced paper.
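The annotated notes are not distributed with the model, but the classification setup can be approximated with a standard sequence classification head on top of VetBERT. The sketch below is illustrative only: the file annotated_notes.csv, its text and label columns, and the number of syndrome classes are placeholders rather than details from the paper.

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
# num_labels is a placeholder; set it to the number of syndrome classes in your annotations.
model = AutoModelForSequenceClassification.from_pretrained("havocy28/VetBERT", num_labels=20)

# Placeholder CSV with "text" and "label" columns.
notes = load_dataset("csv", data_files={"train": "annotated_notes.csv"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = notes.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="vetbert-syndrome",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,  # placeholder
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
).train()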
Guide: Running Locally
To use VetBERT, load the model with the transformers library:
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")

# Build a fill-mask pipeline around VetBERT and query it with a masked clinical note.
VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
predictions = VetBERT_masked('Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics')
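Each call returns a ranked list of candidate fills for the [MASK] token. The snippet below inspects the predictions captured above; the field names are the standard fill-mask pipeline output keys:

for candidate in predictions:
    # candidate["token_str"] is the predicted fill-in; candidate["score"] is its probability.
    print(candidate["token_str"], candidate["score"])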
For efficient model inference, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
The VetBERT model is licensed under OpenRAIL.