bertimbau-base-finetuned-brazilian_court_decisions
Introduction
bertimbau-base-finetuned-brazilian_court_decisions is a fine-tuned version of the neuralmind/bert-base-portuguese-cased model, tailored for text classification of Brazilian court decisions. It is trained to classify Portuguese-language legal texts, reaching a validation accuracy of 0.7921 (see Training below).
Architecture
The model is built on the BERT architecture, specifically the neuralmind/bert-base-portuguese-cased checkpoint (BERTimbau), which is pretrained on Portuguese text. It runs within the Hugging Face Transformers library, using PyTorch as its backend.
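As a quick sanity check, the underlying architecture can be inspected through the checkpoint's configuration. This is a minimal sketch; the printed fields are standard BERT config attributes (BERT-base uses 12 layers and a hidden size of 768), not values stated on the model card:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Luciano/bertimbau-base-finetuned-brazilian_court_decisions")
# model_type, num_hidden_layers, and hidden_size are standard BertConfig fields.
print(config.model_type, config.num_hidden_layers, config.hidden_size)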
Training
The model was fine-tuned on a dataset of Brazilian court decisions as a multi-class classification task. Key hyperparameters were a learning rate of 2e-05, a batch size of 16 for both training and evaluation, and 5 epochs. Training used the Adam optimizer with its default beta values (0.9, 0.999) and a linear learning rate scheduler, reaching a best accuracy of 0.7921 and a final validation loss of 0.6424.
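For reference, the reported hyperparameters map onto the Hugging Face Trainer API roughly as follows. This is a hedged sketch: the argument names are standard TrainingArguments fields, but the surrounding training script is an assumption, not code from the card.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bertimbau-base-finetuned-brazilian_court_decisions",
    learning_rate=2e-5,              # reported learning rate
    per_device_train_batch_size=16,  # reported training batch size
    per_device_eval_batch_size=16,   # reported evaluation batch size
    num_train_epochs=5,              # reported number of epochs
    lr_scheduler_type="linear",      # linear learning rate scheduler
)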
Guide: Running Locally
- Install Dependencies: Ensure that you have Python installed, then install the required packages with:
pip install transformers torch datasets
- Load the Model: Use the Hugging Face Transformers library to load the tokenizer and model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Luciano/bertimbau-base-finetuned-brazilian_court_decisions")
model = AutoModelForSequenceClassification.from_pretrained("Luciano/bertimbau-base-finetuned-brazilian_court_decisions")
- Prepare Input Data: Tokenize your input text using the tokenizer:
inputs = tokenizer("Entrada de texto", return_tensors="pt")
- Inference: Run the model to get predictions (an end-to-end sketch follows this guide):
import torch

with torch.no_grad():
    outputs = model(**inputs)
- Cloud GPUs: For better performance, consider running the model on a cloud GPU service such as AWS, Google Cloud Platform, or Azure; the sketch below shows the device handling.
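Putting the steps together, the sketch below (an illustrative assumption, not code from the original card) moves the model to a GPU when one is available and decodes the most likely class from the logits. The id2label mapping is read from the model config and only yields meaningful names if the checkpoint was saved with one; otherwise generic labels such as LABEL_0 are printed.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Luciano/bertimbau-base-finetuned-brazilian_court_decisions"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Use a GPU when one is available (e.g. on a cloud instance), else fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

inputs = tokenizer("Entrada de texto", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class.
probs = torch.softmax(outputs.logits, dim=-1)
predicted_id = int(probs.argmax(dim=-1))
print(model.config.id2label.get(predicted_id, str(predicted_id)))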
License
This model is licensed under the MIT License, allowing for wide usage and modification with proper attribution.