InCaseLawBERT

law-ai

Introduction

InCaseLawBERT is a model developed for processing Indian legal texts. It takes the CaseLawBERT checkpoint as its starting point and is further pre-trained on a large corpus of documents from Indian courts. The model is intended for downstream tasks such as legal statute identification, semantic segmentation of judgments, and court judgment prediction.
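The released checkpoint is a pre-trained language model rather than a task-specific one, so each of these downstream tasks requires a fine-tuning step. As a minimal sketch (the binary label set and the example sentence are illustrative assumptions, not part of the released model), court judgment prediction can be framed as sequence classification:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Hypothetical setup: binary judgment prediction (e.g., appeal allowed / dismissed).
    # The classification head below is randomly initialized and must be fine-tuned.
    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    model = AutoModelForSequenceClassification.from_pretrained(
        "law-ai/InCaseLawBERT", num_labels=2
    )

    inputs = tokenizer(
        "The appellant challenges the order of the High Court ...",
        truncation=True, max_length=512, return_tensors="pt",
    )
    logits = model(**inputs).logits  # shape (1, 2); meaningful only after fine-tuning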

Architecture

InCaseLawBERT is built on the architecture of the bert-base-uncased model, featuring 12 hidden layers, 768 hidden dimensions, 12 attention heads, and approximately 110 million parameters. It utilizes the same tokenizer as CaseLawBERT.
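Since these figures come directly from the model configuration, they can be checked programmatically. A short sketch, assuming the standard BERT configuration fields used by the transformers library:

    from transformers import AutoConfig

    # Load only the configuration to inspect the architecture hyperparameters.
    config = AutoConfig.from_pretrained("law-ai/InCaseLawBERT")
    print(config.num_hidden_layers)    # expected: 12
    print(config.hidden_size)          # expected: 768
    print(config.num_attention_heads)  # expected: 12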

Training

The training corpus consists of approximately 5.4 million legal documents from the Indian Supreme Court and various High Courts, spanning 1950 to 2019. Starting from the CaseLawBERT checkpoint, the model was trained for a further 300,000 steps on this corpus using the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives.
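Because pre-training used MLM, the checkpoint can be exercised directly through the fill-mask pipeline without any fine-tuning. A minimal sketch (the prompt sentence is an illustrative assumption):

    from transformers import pipeline

    # Predict the most likely tokens for the masked position.
    fill = pipeline("fill-mask", model="law-ai/InCaseLawBERT")
    prompt = f"The court dismissed the {fill.tokenizer.mask_token} filed by the appellant."
    for prediction in fill(prompt):
        print(prediction["token_str"], round(prediction["score"], 3))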

Guide: Running Locally

  1. Install Transformers Library: Ensure you have the transformers library installed. You can do this via pip:

    pip install transformers
    
  2. Load Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModel
    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    model = AutoModel.from_pretrained("law-ai/InCaseLawBERT")
    
  3. Encode Input Text:

    text = "Replace this string with yours"
    encoded_input = tokenizer(text, return_tensors="pt")
    
  4. Get Model Output:

    output = model(**encoded_input)
    last_hidden_state = output.last_hidden_state
    

For optimal performance, consider using cloud GPUs offered by platforms like AWS, Google Cloud, or Azure.
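The guide above ends with the raw token-level hidden states. A common way to reduce them to a single document embedding, though not an officially prescribed pooling strategy, is mean pooling over the non-padding tokens. This sketch reuses the encoded_input and last_hidden_state variables from steps 3 and 4:

    # Mean-pool the final hidden states, ignoring padding positions.
    attention_mask = encoded_input["attention_mask"].unsqueeze(-1).float()
    doc_embedding = (last_hidden_state * attention_mask).sum(dim=1) / attention_mask.sum(dim=1)
    print(doc_embedding.shape)  # torch.Size([1, 768])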

License

InCaseLawBERT is distributed under the MIT License, allowing for flexible usage and modification.
