bert-base-portuguese-cased

neuralmind

Introduction

BERTimbau Base is a pretrained BERT model for Brazilian Portuguese. It achieves state-of-the-art performance on three downstream natural language processing (NLP) tasks: Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. BERTimbau is available in two sizes: Base and Large. More information is available in the BERTimbau repository.

Architecture

  • Model Variants:
    • BERT-Base: 12 layers, 110 million parameters
    • BERT-Large: 24 layers, 335 million parameters
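
The figures above can be verified programmatically. A minimal sketch using the Hugging Face Transformers API (the printed parameter count is approximate and covers only the encoder loaded by AutoModel):

  from transformers import AutoConfig, AutoModel

  # Inspect the architecture of the Base variant.
  config = AutoConfig.from_pretrained('neuralmind/bert-base-portuguese-cased')
  print(config.num_hidden_layers, config.hidden_size)  # 12 layers, hidden size 768

  # Count parameters (roughly 110 million for the Base variant).
  model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')
  print(sum(p.numel() for p in model.parameters()))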

Training

BERTimbau models are pretrained on the brWaC (Brazilian Web as Corpus) dataset. The architecture follows the standard BERT design, and pretraining relies on the masked language modeling objective, in which randomly masked tokens must be predicted from their context, so the model learns contextual representations of Portuguese.
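
To make the masked language modeling objective concrete, the sketch below masks one token of a Portuguese sentence and lets the model fill it in (assumes PyTorch and the Transformers library are installed; the sentence is only an illustration):

  import torch
  from transformers import AutoTokenizer, AutoModelForMaskedLM

  tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
  model = AutoModelForMaskedLM.from_pretrained('neuralmind/bert-base-portuguese-cased')

  # Hide one token and predict it from the surrounding context.
  inputs = tokenizer('Tinha uma [MASK] no meio do caminho.', return_tensors='pt')
  with torch.no_grad():
      logits = model(**inputs).logits

  # Locate the [MASK] position and take the most likely replacement token.
  mask_index = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
  predicted_id = logits[0, mask_index].argmax(dim=-1)
  print(tokenizer.decode(predicted_id))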

Guide: Running Locally

  1. Installation:

    • Install the Transformers library from Hugging Face:
      pip install transformers
      
    • Install PyTorch or TensorFlow, depending on preference.
  2. Usage Example:

    • Load the model and tokenizer (AutoModelForMaskedLM loads the masked language modeling head used by the fill-mask example below):
      from transformers import AutoTokenizer, AutoModelForMaskedLM
      model = AutoModelForMaskedLM.from_pretrained('neuralmind/bert-base-portuguese-cased')
      tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
      
    • Perform masked language modeling prediction:
      from transformers import pipeline
      pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
      results = pipe('Tinha uma [MASK] no meio do caminho.')
      # Each entry holds the filled-in sequence, the predicted token and its score.
      print(results[0]['sequence'])
      
  3. Cloud GPU Recommendations:

    • The model runs on CPU, but for faster inference or fine-tuning consider cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
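
Beyond the fill-mask example, tasks such as Sentence Textual Similarity usually start from contextual embeddings. A minimal sketch of extracting them (assumes PyTorch; AutoModel loads the encoder without the pretraining heads, and mean pooling is just one simple choice of sentence representation):

  import torch
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
  model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')

  inputs = tokenizer('Tinha uma pedra no meio do caminho.', return_tensors='pt')
  with torch.no_grad():
      outputs = model(**inputs)

  # Token-level contextual embeddings: (batch, sequence_length, hidden_size).
  token_embeddings = outputs.last_hidden_state
  # Mean over tokens as a simple sentence embedding.
  sentence_embedding = token_embeddings.mean(dim=1)
  print(sentence_embedding.shape)  # torch.Size([1, 768])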

License

BERTimbau is distributed under the MIT License. This permissive license allows for personal, academic, and commercial use, modification, and distribution.
