Introduction

GysBERT is a historical language model for Dutch. It was developed as part of the MacBERTh project and is intended to support Word Sense Disambiguation tasks in historical languages.

Architecture

The model follows the BERT base uncased architecture and was pre-trained with the original BERT pre-training codebase, targeting historical Dutch text.
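
Once downloaded, the configuration can be inspected to confirm the BERT base layout (12 layers, 768 hidden units, 12 attention heads). A minimal sketch, assuming the model is published on the Hugging Face hub under the id "emanjavacas/GysBERT" (check the model page for the exact identifier):

    # Minimal sketch: inspect GysBERT's configuration to confirm the
    # BERT base uncased layout (12 layers, 768 hidden units, 12 heads).
    # Assumption: the hub repository id "emanjavacas/GysBERT".
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("emanjavacas/GysBERT")
    print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)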

Training

GysBERT's training data primarily consists of resources from the DBNL and the Delpher newspaper dump. Further details on the training process and the model's effectiveness in Word Sense Disambiguation can be found in the paper "Non-Parametric Word Sense Disambiguation for Historical Languages".
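
As a hedged illustration of the embedding-based, non-parametric idea (not the exact procedure from the paper), contextual embeddings of a target word can be extracted and compared across sentences. The hub id "emanjavacas/GysBERT", the helper function, and the example sentences are assumptions:

    # Hedged sketch: compare contextual embeddings of one word form in two
    # historical Dutch contexts. Assumption: hub id "emanjavacas/GysBERT".
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("emanjavacas/GysBERT")
    model = AutoModel.from_pretrained("emanjavacas/GysBERT")

    def word_embedding(sentence: str, word: str) -> torch.Tensor:
        """Mean-pool the hidden states of the subword tokens belonging to `word`."""
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]
        # Locate the target word's subword ids inside the tokenized sentence.
        word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        ids = enc["input_ids"][0].tolist()
        for start in range(len(ids) - len(word_ids) + 1):
            if ids[start:start + len(word_ids)] == word_ids:
                return hidden[start:start + len(word_ids)].mean(dim=0)
        raise ValueError(f"'{word}' not found in the tokenized sentence")

    # Two occurrences of the same word form; a higher cosine similarity
    # suggests the contexts evoke a similar sense.
    a = word_embedding("het schip lag in de haven van hoorn.", "schip")
    b = word_embedding("een schip met graan voer de rivier op.", "schip")
    print(torch.cosine_similarity(a, b, dim=0).item())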

Guide: Running Locally

To run GysBERT locally, follow these basic steps:

  1. Clone the repository from the Hugging Face model hub.
  2. Install the necessary Python dependencies, including transformers and torch.
  3. Load the model using the transformers library (see the sketch after this list).
  4. Prepare your dataset for inference or fine-tuning.
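
The following minimal sketch covers steps 3 and 4 for a masked-token prediction task. The repository id "emanjavacas/GysBERT" and the example sentence are assumptions; check the model page on the Hugging Face hub for the exact identifier.

    # Minimal sketch: load GysBERT for masked-token prediction.
    # Assumptions: the hub id "emanjavacas/GysBERT" and the example sentence.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="emanjavacas/GysBERT")

    # [MASK] is the standard BERT mask token; the sentence is historical Dutch.
    for prediction in fill_mask("De schipper voer met zijn [MASK] naar Amsterdam."):
        print(prediction["token_str"], round(prediction["score"], 3))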

For faster inference or fine-tuning, consider using a cloud GPU from providers such as AWS EC2, Google Cloud Platform, or Azure.

License

GysBERT is distributed under the MIT License, allowing for flexible use and modification.
