bert-base-dutch-cased
GroNLP
Introduction
BERTje is a Dutch BERT model developed at the University of Groningen for natural language processing tasks in Dutch. The model is described in the arXiv paper arXiv:1912.09582 and is available on Hugging Face.
Architecture
BERTje is a transformer-based model that uses the BERT base architecture, with 12 layers and cased tokenization. It is trained specifically for Dutch and improves on multilingual models in tasks such as named entity recognition and part-of-speech tagging.
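Cased tokenization means the vocabulary distinguishes capitalized from lowercase forms (e.g. "Amsterdam" vs. "amsterdam"). As an illustration of how BERT-style subword tokenization splits words, here is a minimal sketch of greedy longest-match WordPiece tokenization; the toy vocabulary is purely illustrative and is not BERTje's real vocabulary, which is loaded via its tokenizer:

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match WordPiece: repeatedly take the longest
    vocabulary entry matching a prefix of the remaining word.
    Non-initial pieces are prefixed with '##', as in BERT."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no subword matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary (illustrative only).
vocab = {"Amster", "##dam", "amsterdam", "fiets", "##en"}
print(wordpiece_tokenize("Amsterdam", vocab))  # ['Amster', '##dam']
print(wordpiece_tokenize("fietsen", vocab))    # ['fiets', '##en']
```

Because the vocabulary is cased, "Amsterdam" and "amsterdam" tokenize to different pieces, which is why loading the matching cased tokenizer matters.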
Training
The BERTje model was pre-trained on a large Dutch corpus and can be fine-tuned for specific tasks. Fine-tuned versions, optimized for tasks like named entity recognition and part-of-speech tagging, are available on Hugging Face and use consistent fine-tuning procedures across the different pre-trained models.
Guide: Running Locally
- Install Transformers Library: Ensure you have the `transformers` library installed via `pip install transformers`.
- Load the Model and Tokenizer:

```python
from transformers import AutoTokenizer, AutoModel, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # PyTorch
# Or, for TensorFlow:
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")
```
- Use the Model: Tokenize your input text and pass it to the model for processing.
- Note on Vocabulary: If you use an older fine-tuned model and run into tokenizer issues, load the tokenizer with `revision="v1"` to get the old vocabulary.
- Cloud GPUs: For efficient training and inference, consider cloud-based GPU services such as AWS, GCP, or Azure.
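The "Use the Model" step above can be sketched as follows, assuming the PyTorch backend; the Dutch example sentence is arbitrary, and the model weights are downloaded from Hugging Face on first use:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")

# Tokenize a Dutch sentence and run it through the model.
inputs = tokenizer("Ik fiets graag door Groningen.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per input token; BERT-base hidden size is 768.
print(outputs.last_hidden_state.shape)
```

The `last_hidden_state` tensor has shape `(batch_size, sequence_length, 768)`; for downstream tasks you would typically pool these token vectors or load a task-specific head instead of the bare `AutoModel`.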
License
The model and its associated resources are subject to the licensing terms specified by the creators and distributed via platforms such as Hugging Face. Users should review the license on the model's page for compliance.