legal-bert-base-cased-ptbr
Introduction
legal-bert-base-cased-ptbr is a Portuguese language model for the legal domain, based on the BERTimbau base model. It was pre-trained with a fill-mask (masked language modeling) objective on a variety of Portuguese legal documents and is intended to support NLP research on legal texts, computer law, and legal technology applications.
Architecture
The model is built on the BERT architecture, specifically the BERTimbau base model. It is trained for fill-mask prediction, making it suitable for processing and understanding legal texts in Portuguese.
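As a quick way to confirm the architecture, the model configuration can be inspected. Since the model derives from BERTimbau base, standard BERT-base dimensions (12 layers, hidden size 768, 12 attention heads) are expected; these values are our assumption based on that lineage, not figures stated in the card.

```python
from transformers import AutoConfig

# Inspect the configuration; the expected BERT-base dimensions in the
# comments below are assumptions based on the BERTimbau base lineage.
config = AutoConfig.from_pretrained("dominguesm/legal-bert-base-cased-ptbr")
print(config.model_type)           # "bert"
print(config.num_hidden_layers)    # expected: 12
print(config.hidden_size)          # expected: 768
print(config.num_attention_heads)  # expected: 12
```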
Training
The pre-training corpus included a variety of legal documents provided by the Brazilian Supreme Federal Tribunal. Here are the key statistics from the training process:
- Number of examples: 353,435
- Number of epochs: 3
- Batch size per device: 4
- Total training batch size: 32
- Gradient accumulation steps: 1
- Total optimization steps: 33,135
- Training loss: 0.6108
- Evaluation loss: 0.4725
- Perplexity: 1.6040
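These figures are internally consistent: the optimization step count equals ceil(353,435 / 32) × 3, and the reported perplexity is the exponential of the evaluation loss. A minimal check in Python:

```python
import math

examples, total_batch_size, epochs = 353_435, 32, 3

# Optimization steps: steps per epoch (ceil of examples / batch size) times epochs.
steps = math.ceil(examples / total_batch_size) * epochs
print(steps)  # 33135

# Perplexity of a masked language model is exp(average cross-entropy loss).
print(f"{math.exp(0.4725):.4f}")  # 1.6040
```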
Guide: Running Locally
To use the legal-bert-base-cased-ptbr model locally, follow these steps:
- Install the Transformers library: ensure you have the `transformers` library installed.

  ```bash
  pip install transformers
  ```
- Load the model and tokenizer:

  ```python
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained("dominguesm/legal-bert-base-cased-ptbr")
  model = AutoModel.from_pretrained("dominguesm/legal-bert-base-cased-ptbr")
  ```
- Optional: use with a fill-mask pipeline (a usage sketch follows this list):

  ```python
  from transformers import pipeline

  fill_mask = pipeline("fill-mask", model="dominguesm/legal-bert-base-cased-ptbr")
  ```
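As a quick sanity check, a masked sentence can be passed to the pipeline. The Portuguese example below is illustrative, chosen by us rather than taken from the model card:

```python
# Illustrative sentence (our assumption, not from the model card):
# "The defendant was ordered to pay [MASK] for moral damages."
predictions = fill_mask("O réu foi condenado ao pagamento de [MASK] por danos morais.")

# Each prediction is a dict with the candidate token and its score.
for p in predictions:
    print(p["token_str"], f"{p['score']:.4f}")
```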
Consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure for more efficient processing, especially for large-scale tasks.
License
The model is licensed under the Creative Commons Attribution 4.0 International (cc-by-4.0).