nlpaueb/legal-bert-base-uncased

Introduction

LEGAL-BERT is a specialized family of BERT models designed for applications in the legal domain, developed by the NLP group at Athens University of Economics and Business. It aims to enhance legal NLP research, computational law, and related technologies by pre-training on a vast corpus of legal texts.

Architecture

The models use the standard BERT-BASE architecture and are adapted to the legal domain through pre-training rather than architectural changes. Sub-domain variants such as CONTRACTS-BERT, EURLEX-BERT, and ECHR-BERT are each pre-trained on a single legal sub-corpus and outperform standard BERT models on legal-domain tasks. A more efficient, light-weight version of LEGAL-BERT is also available.
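
For reference, a minimal sketch of loading one of the sub-domain variants is shown below. The Hub identifiers assume the nlpaueb namespace naming at the time of writing and should be verified on the model hub before use.

    from transformers import AutoModel, AutoTokenizer

    # Sub-domain variants published under the nlpaueb namespace
    # (identifiers assumed from the Hub listing; verify before use).
    VARIANTS = {
        "contracts": "nlpaueb/bert-base-uncased-contracts",
        "eurlex": "nlpaueb/bert-base-uncased-eurlex",
        "echr": "nlpaueb/bert-base-uncased-echr",
        "small": "nlpaueb/legal-bert-small-uncased",
    }

    model_name = VARIANTS["eurlex"]
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)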

Training

LEGAL-BERT was pre-trained on 12 GB of diverse legal texts, including EU and UK legislation, cases from European courts, US court cases, and contracts from the SEC. The training followed the setup of the BERT-BASE model, using a Google Cloud TPU for one million training steps with a learning rate of 1e-4. The LEGAL-BERT models use a newly created vocabulary tailored to legal language.
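
As an illustration of the effect of this legal-specific vocabulary, the sketch below compares sub-word splits against the generic bert-base-uncased tokenizer; the exact splits are indicative rather than guaranteed and depend on the input text.

    from transformers import AutoTokenizer

    legal_tok = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")
    generic_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

    text = "The lessee shall indemnify the lessor against any claims."
    # Legal terms tend to remain single tokens in LEGAL-BERT's vocabulary,
    # while the generic vocabulary splits them into more sub-word pieces.
    print(legal_tok.tokenize(text))
    print(generic_tok.tokenize(text))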

Guide: Running Locally

To run LEGAL-BERT models locally:

  1. Install the Transformers library: Ensure you have transformers installed via pip.
    pip install transformers
    
  2. Load the Model: Use the following code to load the tokenizer and model.
    from transformers import AutoTokenizer, AutoModel
    
    tokenizer = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")
    model = AutoModel.from_pretrained("nlpaueb/legal-bert-base-uncased")
    
  3. Inference: You can now use the model for tasks such as masked-token prediction, as in the sketch below.
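    A minimal fill-mask sketch; the model name matches the checkpoint loaded above, and the example sentence is illustrative.

    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-base-uncased")

    # Predict the masked token in a legal sentence; [MASK] is BERT's mask token.
    for prediction in fill_mask("This agreement shall be governed by the [MASK] of the State of New York."):
        print(prediction["token_str"], round(prediction["score"], 3))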

For optimal performance, use cloud GPUs from providers like AWS, Google Cloud, or Azure, especially for large-scale inference or fine-tuning.

License

LEGAL-BERT is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license, allowing for sharing and adaptation with proper attribution.
