MedRoBERTa.nl (CLTL)
Introduction
MedRoBERTa.nl is a RoBERTa-based language model pre-trained on Dutch hospital notes extracted from Electronic Health Records (EHRs), and it is intended for medical NLP tasks in Dutch. The model is distributed as a pre-trained checkpoint only: it has not been fine-tuned for any downstream task, but it can be fine-tuned for a variety of applications.
Architecture
MedRoBERTa.nl is based on the RoBERTa architecture: an encoder-only transformer pre-trained with a masked-language-modelling objective and known for robust performance across natural language processing tasks. Pre-training this architecture on clinical text adapts it to the vocabulary and style of Dutch medical language.
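For orientation, the minimal sketch below loads the model configuration and prints a few architectural parameters. It assumes the model is hosted on the Hugging Face Hub under the ID CLTL/MedRoBERTa.nl; adjust the ID or path if your copy lives elsewhere.

```python
# Inspect the pre-trained configuration via Hugging Face transformers.
from transformers import AutoConfig

# Assumed Hub ID; replace with a local path if needed.
config = AutoConfig.from_pretrained("CLTL/MedRoBERTa.nl")
print(config.model_type)         # expected: "roberta"
print(config.num_hidden_layers)  # number of transformer encoder layers
print(config.hidden_size)        # hidden-state dimensionality
```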
Training
The model was trained on nearly 10 million anonymized hospital notes from the Amsterdam University Medical Centres. The notes were thoroughly anonymized before training so that no personal data or identifiable information entered the model. This anonymization extends to the model's vocabulary: personal names were excluded from it, so the model cannot produce them when predicting masked tokens.
Guide: Running Locally
To run MedRoBERTa.nl locally, follow these steps:
- Set up Environment: Ensure you have Python and PyTorch installed. Create a virtual environment for the project.
- Install Dependencies: Use `pip` to install the necessary libraries, including Hugging Face's `transformers` library.
- Download the Model: Access the model from the Hugging Face model hub and load it with the `transformers` library (see the loading sketch after this list).
- Fine-tune or Use Directly: The model can be used directly for masked-token prediction or fine-tuned for specific applications (a fine-tuning sketch also follows the list).
- Hardware Recommendation: Use a GPU, either local or via cloud services such as AWS, Google Cloud, or Azure, for efficient fine-tuning and large-scale inference; small-scale inference also runs on CPU.
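As a sketch of the download-and-use step, the snippet below pulls the model from the Hugging Face Hub and runs masked-token prediction. The Hub ID CLTL/MedRoBERTa.nl and the Dutch example sentence are assumptions; substitute your own model path and text.

```python
# pip install torch transformers
from transformers import pipeline

# Assumed Hub ID; replace with a local path if you downloaded the model manually.
fill_mask = pipeline("fill-mask", model="CLTL/MedRoBERTa.nl")

# Hypothetical Dutch clinical sentence; <mask> is RoBERTa's mask token.
for pred in fill_mask("De patiënt heeft last van <mask>."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```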
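And as a hedged sketch of the fine-tuning route: the snippet below loads the pre-trained encoder with a fresh classification head, which you would then train with the `transformers` Trainer API or a custom loop. The label count and example note are illustrative assumptions, not part of the released model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed Hub ID; num_labels=2 is an illustrative binary-classification setup.
tokenizer = AutoTokenizer.from_pretrained("CLTL/MedRoBERTa.nl")
model = AutoModelForSequenceClassification.from_pretrained(
    "CLTL/MedRoBERTa.nl", num_labels=2
)

# Tokenize a hypothetical Dutch clinical note and run a forward pass.
# The classification head is randomly initialized until fine-tuned.
batch = tokenizer(["Patiënt klaagt over hoofdpijn."], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, num_labels)
print(logits)
```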
License
MedRoBERTa.nl is released under the MIT License, which permits broad use, modification, and redistribution, provided the copyright and license notice are retained.