InCaseLawBERT
Introduction
InCaseLawBERT is a transformer model developed for processing Indian legal text. It is initialized from CaseLawBERT and further pre-trained on a large corpus of documents from Indian courts, and it is intended for downstream tasks such as legal statute identification, semantic segmentation of judgments, and court judgment prediction.
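The released checkpoint ships only the base encoder; for a downstream task such as court judgment prediction one would typically attach a classification head and fine-tune it. The following is a minimal sketch, assuming a binary label scheme; the label set and the example sentence are illustrative and not part of the released model:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    # The classification head below is newly initialized and must be fine-tuned
    # on labeled judgments before its predictions are meaningful.
    model = AutoModelForSequenceClassification.from_pretrained(
        "law-ai/InCaseLawBERT",
        num_labels=2,  # illustrative: e.g. claim allowed vs. dismissed
    )

    inputs = tokenizer(
        "The appeal is dismissed with costs.",  # placeholder text
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    logits = model(**inputs).logits  # shape: (1, 2)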
Architecture
InCaseLawBERT is built on the bert-base-uncased architecture, with 12 hidden layers, a hidden size of 768, 12 attention heads, and approximately 110 million parameters. It uses the same tokenizer as CaseLawBERT.
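These figures can be read directly from the hosted configuration; the snippet below is a quick sanity check, assuming the checkpoint exposes a standard BERT config:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("law-ai/InCaseLawBERT")
    print(config.num_hidden_layers)    # expected: 12
    print(config.hidden_size)          # expected: 768
    print(config.num_attention_heads)  # expected: 12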
Training
The training corpus consists of approximately 5.4 million legal documents from the Indian Supreme Court and various High Courts, spanning 1950 to 2019. The model was initialized with CaseLawBERT and further trained on this corpus for 300,000 steps using the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives.
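To illustrate the MLM objective the model was trained on, the sketch below masks one token of a legal sentence and inspects the model's top predictions. Whether the released checkpoint still contains the pretraining head is an assumption; if it does not, Transformers will initialize one randomly and warn accordingly.

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    model = AutoModelForMaskedLM.from_pretrained("law-ai/InCaseLawBERT")

    # Example sentence, not taken from the training data
    text = "The appellant filed a [MASK] petition before the High Court."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the masked position and print the five most likely fillers
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    top_ids = logits[0, mask_pos].topk(5).indices[0]
    print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))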
Guide: Running Locally
- Install the Transformers Library: Ensure you have the transformers library installed. You can install it via pip:

    pip install transformers
- Load Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    model = AutoModel.from_pretrained("law-ai/InCaseLawBERT")
- Encode Input Text:

    text = "Replace this string with yours"
    encoded_input = tokenizer(text, return_tensors="pt")
- Get Model Output (a pooling sketch follows this list):

    output = model(**encoded_input)
    last_hidden_state = output.last_hidden_state
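The hidden states from the last step are per-token vectors; a common follow-up, not part of the original guide, is to pool them into a single document embedding. A minimal sketch, continuing from the variables defined in the steps above:

    # Option 1: use the [CLS] token's vector
    cls_embedding = last_hidden_state[:, 0, :]              # shape: (batch, 768)

    # Option 2: mean-pool over real tokens only, using the attention mask
    mask = encoded_input["attention_mask"].unsqueeze(-1)    # shape: (batch, seq_len, 1)
    mean_embedding = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)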
For optimal performance, consider using cloud GPUs offered by platforms like AWS, Google Cloud, or Azure.
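On a machine with a CUDA-capable GPU, moving the model and inputs onto the device is a small change. The sketch below continues from the guide above and assumes a working CUDA installation:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    encoded_input = encoded_input.to(device)  # BatchEncoding supports .to(device)
    output = model(**encoded_input)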
License
InCaseLawBERT is distributed under the MIT License, allowing for flexible usage and modification.