neuralmind/bert-large-portuguese-cased

Introduction
BERTimbau Large is a pretrained BERT model specifically designed for Brazilian Portuguese, achieving state-of-the-art results in Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. The model is available in two sizes: Base and Large.
Architecture
The BERTimbau model comes in two variations:
- BERT-Base: 12 layers, 110 million parameters.
- BERT-Large: 24 layers, 335 million parameters.
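These figures can be checked directly against the published checkpoints. The sketch below is a minimal verification script and assumes both model IDs are available from the neuralmind organization on the Hugging Face Hub:

```python
from transformers import AutoConfig, AutoModel

# Assumption: both checkpoints are hosted under the neuralmind organization.
for name in ['neuralmind/bert-base-portuguese-cased',
             'neuralmind/bert-large-portuguese-cased']:
    config = AutoConfig.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f'{name}: {config.num_hidden_layers} layers, {n_params / 1e6:.0f}M parameters')
```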
Training
BERTimbau Large was trained on brWaC (Brazilian Web as Corpus), a large corpus of Brazilian Portuguese web text. The model can be used for masked language modeling and for producing contextual embeddings, as shown in the guide below.
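Because the vocabulary was built from Portuguese text, the cased tokenizer typically keeps common Portuguese words intact rather than splitting them into many subword pieces. A minimal sketch to inspect this, using the same checkpoint as the guide below:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-large-portuguese-cased',
                                          do_lower_case=False)
# Inspect how a Portuguese sentence is split into WordPiece tokens.
print(tokenizer.tokenize('Tinha uma pedra no meio do caminho.'))
```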
Guide: Running Locally
- Install Transformers: Ensure you have the transformers library installed:

```bash
pip install transformers
```
- Load Model and Tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForPreTraining

# Keep do_lower_case=False: the checkpoint is cased.
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-large-portuguese-cased', do_lower_case=False)
model = AutoModelForPreTraining.from_pretrained('neuralmind/bert-large-portuguese-cased')
```
- Masked Language Modeling Example:

```python
from transformers import pipeline

# Build the pipeline from the checkpoint name so it loads a masked-LM head.
pipe = pipeline('fill-mask', model='neuralmind/bert-large-portuguese-cased', tokenizer=tokenizer)
result = pipe('Tinha uma [MASK] no meio do caminho.')
```
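Each element of result is a dictionary containing the filled-in sequence, the predicted token (token_str), and a confidence score, so the top suggestions can be inspected directly.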
- Use BERT for Embeddings:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('neuralmind/bert-large-portuguese-cased')
input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')

with torch.no_grad():
    outs = model(input_ids)
    # Last hidden states, dropping the [CLS] and [SEP] tokens.
    encoded = outs[0][0, 1:-1]
```
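The snippet above yields one 1024-dimensional vector per token (the hidden size of the Large model). If a single sentence-level vector is needed, one common option, shown here as an assumption rather than part of the original card, is mean pooling over the token embeddings:

```python
# Hypothetical follow-up: average the token vectors into one sentence embedding.
sentence_embedding = encoded.mean(dim=0)  # shape: (1024,)
```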
To leverage the full potential of BERTimbau Large, it is recommended to use cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
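On such a machine the same embedding code runs unchanged after moving the model and inputs to the GPU; a minimal sketch:

```python
import torch

# Move the model and inputs to a GPU when one is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

with torch.no_grad():
    outs = model(input_ids.to(device))
```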
License
BERTimbau Large is released under the MIT license, allowing for wide usage and modification.