neuralmind/bert-base-portuguese-cased
Introduction
BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. It is available in two sizes, Base and Large. More information is available in the BERTimbau repository.
Architecture
- Model Variants:
- BERT-Base: 12 layers, 110 million parameters
- BERT-Large: 24 layers, 335 million parameters
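These dimensions can be checked directly from the published checkpoint. The following is a minimal sketch using the Transformers library; the checkpoint name is the Base model used in the usage examples below, and the comments state the values the configuration is expected to report:
from transformers import AutoConfig, AutoModel

# Inspect the configuration of the Base checkpoint
config = AutoConfig.from_pretrained('neuralmind/bert-base-portuguese-cased')
print(config.num_hidden_layers)  # 12 Transformer layers
print(config.hidden_size)        # 768-dimensional hidden states

# Counting parameters requires instantiating the model
model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')
print(sum(p.numel() for p in model.parameters()))  # roughly 110 million, including embeddings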
Training
BERTimbau models are pretrained on the brWaC (Brazilian Web as Corpus) dataset. The architecture follows the standard BERT design, paired with a Portuguese vocabulary, and pretraining uses the masked language modeling objective: randomly masked tokens are predicted from their surrounding context.
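As an illustration of that objective, the sketch below masks tokens in a sentence the way a typical BERT pretraining data pipeline does; the 15% masking probability is the standard BERT setting and is assumed here rather than taken from the BERTimbau training configuration:
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)

# Standard BERT-style MLM collator: selects ~15% of tokens for prediction
# (most selected tokens are replaced with [MASK], a few with random tokens or left unchanged)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

encoded = tokenizer('Tinha uma pedra no meio do caminho.', return_tensors='pt')
batch = collator([{k: v[0] for k, v in encoded.items()}])

print(tokenizer.decode(batch['input_ids'][0]))  # the sentence with some tokens masked
print(batch['labels'][0])                       # -100 everywhere except the masked positions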
Guide: Running Locally
- Installation:
- Install the Transformers library from Hugging Face:
pip install transformers
- Install PyTorch or TensorFlow, depending on preference.
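- For example, the default PyTorch wheel can be installed with pip (the exact command varies by platform and CUDA version):
pip install torch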
- Usage Example:
- Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForPreTraining

model = AutoModelForPreTraining.from_pretrained('neuralmind/bert-base-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
- Perform masked language prediction:
from transformers import pipeline

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
results = pipe('Tinha uma [MASK] no meio do caminho.')
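- Extract contextual embeddings (a minimal sketch assuming the PyTorch backend; outputs[0] is the final hidden state):
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)

input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')

with torch.no_grad():
    outputs = model(input_ids)
    # Final hidden states, shape (batch_size, sequence_length, 768)
    embeddings = outputs[0]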
- Cloud GPU Recommendations:
- For enhanced performance, consider utilizing cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
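- To take advantage of a GPU, move the model and inputs onto it explicitly; a minimal sketch assuming the PyTorch backend and the model and input_ids variables from the usage examples above:
import torch

# Use a CUDA device if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Inputs must live on the same device as the model
input_ids = input_ids.to(device)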
License
BERTimbau is distributed under the MIT License. This permissive license allows for personal, academic, and commercial use, modification, and distribution.