NB-BERT-base

NbAiLab

Introduction

NB-BERT-base is a general-purpose BERT-base model developed using the extensive digital collection of the National Library of Norway. It follows the architecture of the multilingual BERT cased model and is trained on a wide variety of Norwegian text, in both Bokmål and Nynorsk, spanning the last 200 years.

Architecture

NB-BERT-base uses the same architecture as the multilingual BERT cased model, tailored to Norwegian across a broad spectrum of written content. The model's configuration can be inspected directly, as sketched below.
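
As a quick check, the hosted configuration can be inspected with the Transformers library. This is a minimal sketch; the attribute names are standard BertConfig fields, and the printed values are whatever the hosted config contains.

    from transformers import AutoConfig

    # Fetch the model configuration from the Hugging Face Hub
    config = AutoConfig.from_pretrained("NbAiLab/nb-bert-base")

    # BERT-base-style hyperparameters: layer count, hidden size, attention heads
    print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
    print(config.vocab_size)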

Training

The model was trained on a large and varied corpus of Norwegian text drawn from the library's digital collection. Additional details on the training data and methodology can be found in the repository: NBAiLab/notram. BERT-style pretraining relies on a masked-language-modelling objective; a sketch of how that loss is computed is shown below.
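
The following is a minimal sketch of the masked-language-modelling loss that BERT-style pretraining optimizes, using the model's own masked-LM head. The example sentence and the masked position are arbitrary choices for illustration, not taken from the actual training setup.

    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("NbAiLab/nb-bert-base")
    model = BertForMaskedLM.from_pretrained("NbAiLab/nb-bert-base")

    # Tokenize a sentence and keep a copy of the original ids as labels
    text = "På biblioteket kan du låne en bok."
    inputs = tokenizer(text, return_tensors="pt")
    labels = inputs.input_ids.clone()

    # Mask one position (chosen arbitrarily for this sketch);
    # label positions set to -100 are ignored by the loss
    masked = inputs.input_ids.clone()
    masked[0, 5] = tokenizer.mask_token_id
    labels[masked != tokenizer.mask_token_id] = -100

    outputs = model(input_ids=masked, attention_mask=inputs.attention_mask, labels=labels)
    print(outputs.loss)  # cross-entropy over the masked position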

Guide: Running Locally

To run NB-BERT-base locally, follow these steps:

  1. Install Dependencies: Ensure Python is installed, then install the Transformers library and PyTorch:
    pip install transformers torch
    
  2. Load the Model: Use Hugging Face's Transformers library to load the tokenizer and model:
    from transformers import BertTokenizer, BertForMaskedLM
    
    # Download the tokenizer and masked-LM model from the Hugging Face Hub
    tokenizer = BertTokenizer.from_pretrained("NbAiLab/nb-bert-base")
    model = BertForMaskedLM.from_pretrained("NbAiLab/nb-bert-base")
    
  3. Inference: Use the model for fill-mask tasks. The snippet below prints the top prediction for the masked position; a higher-level alternative using the pipeline API is sketched after this list:
    input_text = "På biblioteket kan du [MASK] en bok."
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model(**inputs)
    # Locate the [MASK] position and decode the highest-scoring token
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    print(tokenizer.decode(outputs.logits[0, mask_pos].argmax(-1)))
    
  4. Cloud GPUs: For better performance, consider running the model on cloud GPUs via platforms like AWS, Google Cloud, or Azure.
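
For quick experiments, the same fill-mask task can also be run through Transformers' higher-level pipeline API, as referenced in step 3 above. This is a minimal sketch using the same example sentence:

    from transformers import pipeline

    # Build a fill-mask pipeline; model and tokenizer are downloaded automatically
    fill_mask = pipeline("fill-mask", model="NbAiLab/nb-bert-base")

    # Each candidate includes the completed sentence and a confidence score
    for candidate in fill_mask("På biblioteket kan du [MASK] en bok."):
        print(candidate["sequence"], candidate["score"])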

License

NB-BERT-base is licensed under the Creative Commons Attribution 4.0 International License (cc-by-4.0). This allows for sharing and adaptation with appropriate credit.
