Legal-BERTimbau-base

rufimelo

Introduction

Legal_BERTimbau is a BERT model fine-tuned from BERTimbau, a BERT model pre-trained for Brazilian Portuguese. BERTimbau reports state-of-the-art results on three downstream tasks: Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. Legal_BERTimbau adapts BERTimbau to the legal domain by running one additional pre-training epoch over 30,000 Portuguese legal documents.

Architecture

Legal_BERTimbau is available in two architectures:

  • BERT-Base: 12 layers with 110 million parameters.
  • BERT-Large: 24 layers with 335 million parameters.
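
The parameter counts above can be roughly reproduced from the layer shapes. The sketch below is a back-of-the-envelope calculation, not a measurement of the published checkpoints. It assumes the standard BERT hyperparameters (hidden sizes 768 and 1024, feed-forward sizes 3072 and 4096, 512 positions, 2 segment types) and a vocabulary of 30,522 entries as in the original English BERT. None of these values are stated here, and the BERTimbau vocabulary size may differ, which would shift the totals slightly. The function name `bert_param_count` is illustrative.

```python
def bert_param_count(layers, hidden, intermediate, vocab_size=30522,
                     max_positions=512, type_vocab=2):
    """Count parameters of a BERT encoder with a pooler (no task head).

    All default values are assumptions taken from the original English BERT.
    """
    layer_norm = 2 * hidden  # scale and bias

    # Token, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections with biases,
    # followed by a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward: expand to `intermediate`, project back, then LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden
                    + layer_norm)

    # Pooler: one dense layer applied to the [CLS] position.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


base = bert_param_count(layers=12, hidden=768, intermediate=3072)
large = bert_param_count(layers=24, hidden=1024, intermediate=4096)
print(f"BERT-Base:  {base / 1e6:.1f}M parameters")
print(f"BERT-Large: {large / 1e6:.1f}M parameters")
```

Under these assumptions the totals come to roughly 109 million and 335 million, in line with the rounded figures in the list above.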

Training

The model was fine-tuned on 30,000 legal documents in Portuguese to produce a language model adapted to the legal domain, so that its predictions better reflect legal vocabulary and phrasing than those of the general-purpose BERTimbau.
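
The training objective is not described in detail here. Assuming the adaptation used BERT's standard masked-language-modeling objective (which is consistent with the fill-mask usage in the guide), each training example would be built roughly as in the sketch below: about 15% of tokens are selected, and of those 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. The function `mask_tokens`, the `None` label marker, and the toy vocabulary are hypothetical and only illustrate the technique; they are not the author's training code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Apply BERT-style masking to a list of tokens.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    positions the model must predict and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position contributes to the loss
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK_TOKEN)         # 80%: replace with [MASK]
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random token
            else:
                masked.append(tok)                # 10%: keep the original
        else:
            labels.append(None)  # position ignored by the loss
            masked.append(tok)
    return masked, labels


# Toy example; a real run would use word-piece tokens from the tokenizer.
sentence = ["o", "réu", "interpôs", "recurso", "de", "apelação"]
toy_vocab = ["tribunal", "juiz", "sentença"]
print(mask_tokens(sentence, toy_vocab, mask_prob=0.5, seed=1))
```

In practice this step is usually delegated to a data collator in the training library rather than written by hand.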

Guide: Running Locally

  1. Installation: Ensure Python and PyTorch are installed. Install the transformers library from Hugging Face.

    pip install transformers torch
    
  2. Usage: Load the model and tokenizer using the code below:

    from transformers import AutoTokenizer, AutoModelForMaskedLM
    
    tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
    model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
    
  3. Prediction: Use the model for masked language modeling:

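    from transformers import pipeline
    
    # Reuses the model and tokenizer loaded in step 2.
    pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
    # Each candidate for [MASK] comes with a score and the filled-in sentence.
    print(pipe('O advogado apresentou [MASK] para o juiz'))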
    
  4. Embeddings: Generate embeddings using:

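    import torch
    from transformers import AutoModel, AutoTokenizer
    
    # AutoModel loads the encoder without the masked-LM head.
    tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-base')
    model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-base')
    input_ids = tokenizer.encode('O advogado apresentou recurso para o juiz', return_tensors='pt')
    
    with torch.no_grad():
        outs = model(input_ids)
        # outs[0] is the last hidden state; drop the [CLS] and [SEP] positions.
        encoded = outs[0][0, 1:-1]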
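
Step 4 yields one vector per token. To compare whole sentences, a common (but not the only) choice is to average the token vectors and compare the results with cosine similarity. The sketch below shows that idea in plain Python so it runs without downloading the model; `mean_pool` and `cosine_similarity` are illustrative helper names, and mean pooling is an assumption rather than a method documented for this model.

```python
import math


def mean_pool(token_vectors):
    """Average a list of per-token vectors into a single sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    # zip(*rows) iterates over columns, i.e. over embedding dimensions.
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the tensor from step 4, `mean_pool(encoded.tolist())` would give the sentence vector; staying in PyTorch, `encoded.mean(dim=0)` computes the same average.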
    

The examples above run on a CPU. For large batches of documents, a GPU speeds up inference considerably; cloud providers such as AWS, Google Cloud, and Azure offer GPU instances if none is available locally.
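
Returning to step 3 of the guide: for an input with a single `[MASK]`, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. A small helper such as the hypothetical `top_predictions` below can reduce that list to the few candidates worth inspecting. It is a sketch that assumes this single-mask output shape; the `min_score` threshold is an illustrative parameter, not a recommended value.

```python
def top_predictions(results, k=3, min_score=0.0):
    """Return up to k (token, score) pairs, highest score first.

    `results` is expected to be the list of dicts produced by a fill-mask
    pipeline call on a sentence with exactly one [MASK].
    """
    kept = [r for r in results if r["score"] >= min_score]
    # The pipeline normally returns candidates already sorted; sorting again
    # keeps the helper correct for any input order.
    kept.sort(key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), r["score"]) for r in kept[:k]]
```

A call would look like `top_predictions(pipe('O advogado apresentou [MASK] para o juiz'))`. The number of candidates the pipeline itself returns can be changed with its `top_k` argument.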

License

The Legal_BERTimbau model is licensed under the MIT License, permitting use, modification, and distribution with proper attribution.
