Legal-BERTimbau Base
Introduction
Legal_BERTimbau is a fine-tuned BERT model based on BERTimbau, a model for Brazilian Portuguese. The original BERTimbau achieves state-of-the-art results on tasks such as Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. Legal_BERTimbau adapts it to the legal domain by running one additional pre-training epoch over 30,000 legal documents.
Architecture
Legal_BERTimbau is available in two architectures:
- BERT-Base: 12 layers with 110 million parameters.
- BERT-Large: 24 layers with 335 million parameters.
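Both sizes load the same way; only the checkpoint name differs. A minimal sketch of selecting between them — the base checkpoint name comes from the guide below, while `rufimelo/Legal-BERTimbau-large` is an assumption inferred from the same naming scheme:

```python
from transformers import AutoModelForMaskedLM

# Base: 12 layers, ~110M parameters (checkpoint name used in the guide below).
base = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")

# Large: 24 layers, ~335M parameters (checkpoint name is an assumption,
# inferred from the base checkpoint's naming convention).
large = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-large")

print(base.config.num_hidden_layers)   # 12
print(large.config.num_hidden_layers)  # 24
```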
Training
The model was fine-tuned on 30,000 Portuguese legal documents, adapting the underlying language model to the vocabulary and phrasing of the legal domain.
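The model card does not include the training script; the sketch below illustrates the standard recipe for this kind of domain-adaptive masked-language-model fine-tuning, assuming the corpus is a list of plain-text strings and starting from the original BERTimbau checkpoint (neuralmind/bert-base-portuguese-cased). It is an illustration, not the authors' exact setup:

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the original BERTimbau checkpoint.
tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")
model = AutoModelForMaskedLM.from_pretrained("neuralmind/bert-base-portuguese-cased")

# Placeholder corpus; in practice this would be the 30,000 legal documents.
legal_documents = ["O advogado apresentou recurso para o juiz."]

encodings = tokenizer(legal_documents, truncation=True, max_length=512)
train_dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

# Randomly mask 15% of tokens, the standard BERT masking rate.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="legal-bertimbau", num_train_epochs=1),
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```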
Guide: Running Locally
- Installation: Ensure Python and PyTorch are installed, then install the transformers library from Hugging Face:

```bash
pip install transformers torch
```
- Usage: Load the model and tokenizer using the code below:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
```
- Prediction: Use the model for masked language modeling:

```python
from transformers import pipeline

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('O advogado apresentou [MASK] para o juíz')
```
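The pipeline returns a ranked list of candidate fills. Continuing from the snippet above, each entry carries the predicted token and a confidence score:

```python
# Inspect the top candidates for the [MASK] position.
for candidate in pipe('O advogado apresentou [MASK] para o juíz'):
    print(candidate['token_str'], round(candidate['score'], 3))
```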
- Embeddings: Generate contextual token embeddings:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-base')

# Reuses the tokenizer loaded in the Usage step above.
input_ids = tokenizer.encode('O advogado apresentou recurso para o juíz',
                             return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
encoded = outs[0][0, 1:-1]  # token embeddings, excluding [CLS] and [SEP]
```
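`encoded` holds one vector per token. If a single sentence-level vector is needed, mean pooling over the token dimension is a common convention (continuing from the snippet above; this step is not prescribed by the model card itself):

```python
# Average the token embeddings into one sentence vector.
sentence_embedding = encoded.mean(dim=0)
print(sentence_embedding.shape)  # torch.Size([768]) for the base model
```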
For optimal performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The Legal_BERTimbau model is licensed under the MIT License, permitting use, modification, and distribution with proper attribution.