NB-BERT-large
Introduction
NB-BERT-large is a Norwegian language model based on the BERT-large architecture. Developed by the National Library of Norway, it is trained on a diverse collection of Norwegian text in both Bokmål and Nynorsk, using a monolingual Norwegian vocabulary.
Architecture
NB-BERT-large is built on the BERT-large architecture, a bidirectional Transformer encoder with 24 layers, a hidden size of 1,024, 16 attention heads, and roughly 340 million parameters. Pretrained with a masked-language-modeling objective over a vast corpus of Norwegian text, it is well suited for fill-mask tasks and for fine-tuning on downstream Norwegian NLP tasks.
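As a quick sanity check, these dimensions can be read directly from the model's configuration. The sketch below assumes NbAiLab/nb-bert-large is the model's identifier on the Hugging Face hub:

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights) from the Hugging Face hub.
# "NbAiLab/nb-bert-large" is the assumed hub identifier.
config = AutoConfig.from_pretrained("NbAiLab/nb-bert-large")

# BERT-large dimensions: 24 layers, hidden size 1024, 16 attention heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```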
Training
The model was trained from scratch on a comprehensive dataset of Norwegian text spanning the past 200 years. This breadth of training data helps the model handle linguistic variation in both Bokmål and Nynorsk. The training set and further details are available in the NbAiLab GitHub repository.
Guide: Running Locally
To run NB-BERT-large locally, follow these steps:
1. Environment Setup: Install Python and required libraries such as PyTorch or TensorFlow, and use a package manager like pip to install the Hugging Face Transformers library.
2. Model Download: Download the model from the Hugging Face model hub using the Transformers library.
3. Inference: Load the model in your script and run inference tasks such as fill-mask (see the sketch after this list).
4. Hardware Suggestions: Cloud-based GPUs, such as those offered by AWS, Google Cloud, or Azure, are recommended for efficient processing.
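Putting steps 1 through 3 together, the following is a minimal sketch. It assumes PyTorch and Transformers are already installed, that NbAiLab/nb-bert-large is the model's hub identifier, and the Norwegian example sentence is purely illustrative:

```python
# Minimal sketch: download NB-BERT-large and run a fill-mask query.
# Assumes dependencies are installed, e.g.:  pip install torch transformers
from transformers import pipeline

# The pipeline downloads the model from the Hugging Face model hub on
# first use; "NbAiLab/nb-bert-large" is the assumed hub identifier.
fill_mask = pipeline("fill-mask", model="NbAiLab/nb-bert-large")

# BERT-style models use the [MASK] token. Illustrative example sentence:
# "Nasjonalbiblioteket ligger i [MASK]." = "The National Library is in [MASK]."
for prediction in fill_mask("Nasjonalbiblioteket ligger i [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Each prediction is a candidate token for the masked position along with its probability score; the first run caches the downloaded weights locally, so subsequent runs start faster.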
License
NB-BERT-large is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This allows users to share and adapt the model, provided appropriate credit is given.