GysBERT
Introduction
GysBERT is a historical language model for Dutch. It was developed within the MacBERTh project and is intended to support Word Sense Disambiguation for historical stages of the language.
Architecture
The model uses the BERT base uncased architecture and was pre-trained with the original BERT pre-training codebase; this standard design lets it be used with common BERT tooling while targeting historical Dutch text.
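As a quick orientation, the configuration of a BERT base checkpoint can be inspected with the `transformers` library. This is a minimal sketch; the hub id `emanjavacas/GysBERT` is an assumption and should be replaced with the actual repository path if it differs.

```python
# Minimal sketch: inspect the architecture of the checkpoint.
# The hub id "emanjavacas/GysBERT" is an assumption; the printed values are the
# standard BERT base dimensions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("emanjavacas/GysBERT")
print(config.num_hidden_layers)    # 12 encoder layers in BERT base
print(config.hidden_size)          # 768-dimensional hidden states
print(config.num_attention_heads)  # 12 attention heads per layer
```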
Training
GysBERT's training data consists primarily of texts from the DBNL (Digital Library for Dutch Literature) and the Delpher newspaper dump. Further details about the training process and its effectiveness in Word Sense Disambiguation can be found in the paper: Non-Parametric Word Sense Disambiguation for Historical Languages.
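To illustrate the non-parametric idea at a high level, each occurrence of a target word can be represented by its contextual embedding, and occurrences can then be compared by cosine similarity. The sketch below is an illustration only: the hub id, the helper function, and the example sentences are assumptions, and the actual procedure is the one described in the paper.

```python
# Minimal sketch of comparing contextual embeddings of a target word, which is the
# building block of nearest-neighbour (non-parametric) WSD. The hub id
# "emanjavacas/GysBERT" and the example sentences are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emanjavacas/GysBERT")
model = AutoModel.from_pretrained("emanjavacas/GysBERT")
model.eval()

def embed_target(sentence: str, target: str) -> torch.Tensor:
    """Mean-pool the hidden states of the subword tokens belonging to `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, hidden)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    tokens = enc["input_ids"][0].tolist()
    # Locate the first occurrence of the target's subword span in the sentence.
    for i in range(len(tokens) - len(target_ids) + 1):
        if tokens[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"target {target!r} not found in sentence")

# Two occurrences of "bank" in different senses (bench vs. financial institution).
a = embed_target("hij zat op de bank in het park", "bank")
b = embed_target("hij haalde geld van de bank", "bank")
print(torch.cosine_similarity(a, b, dim=0).item())
```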
Guide: Running Locally
To run GysBERT locally, follow these basic steps:
- Clone the repository from the Hugging Face model hub.
- Install the necessary Python dependencies, including `transformers` and `torch`.
- Load the model using the `transformers` library (a minimal loading sketch follows this list).
- Prepare your dataset for inference or fine-tuning.
For faster fine-tuning and inference, consider using cloud GPUs from providers such as AWS EC2, Google Cloud Platform, or Azure.
License
GysBERT is distributed under the MIT License, allowing for flexible use and modification.