bert-base-uncased-contracts

nlpaueb

Introduction

LEGAL-BERT is a family of BERT models specialized for the legal domain, intended to support legal NLP research, computational law, and legal technology applications. The family was pre-trained on a diverse set of English legal texts, including legislation, court cases, and contracts, and improves on the original BERT model in domain-specific tasks. This checkpoint, bert-base-uncased-contracts, is the sub-domain variant pre-trained on US contracts.

Architecture

LEGAL-BERT follows the BERT architecture; the BASE variants use the same configuration as BERT-BASE (12 layers, 768 hidden units, 12 attention heads, ~110M parameters). The family includes sub-domain models such as CONTRACTS-BERT, EURLEX-BERT, and ECHR-BERT, as well as a general LEGAL-BERT model trained on the full legal corpus.
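
A minimal sketch for verifying these dimensions from the published checkpoint with the transformers library; the parameter count is computed from the loaded weights and should land near the ~110M figure quoted above.

    from transformers import AutoConfig, AutoModel

    # Inspect the architecture of the contracts checkpoint
    config = AutoConfig.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
    # Expected: 12 768 12

    # Count parameters to confirm the BERT-BASE scale (~110M)
    model = AutoModel.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    print(f"{sum(p.numel() for p in model.parameters()):,} parameters")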

Training

LEGAL-BERT models were trained on a corpus of 12 GB of English legal texts. Training used Google BERT's official code with the same setup as the original BERT release: 1 million training steps with batches of 256 sequences of length 512 and an initial learning rate of 1e-4, run on a Google Cloud TPU v3-8 provided through TensorFlow Research Cloud and GCP research credits.
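
The original pre-training used Google BERT's official TensorFlow code on TPUs; the sketch below is only a rough PyTorch/Transformers analogue of the quoted hyperparameters for continued masked-language-model pre-training. The contracts.txt file name, warmup steps, and weight decay are assumptions, not values from the model card.

    from datasets import load_dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    model = AutoModelForMaskedLM.from_pretrained("nlpaueb/bert-base-uncased-contracts")

    # Placeholder corpus: one legal document or paragraph per line (assumed file name).
    dataset = load_dataset("text", data_files={"train": "contracts.txt"})["train"]

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

    args = TrainingArguments(
        output_dir="legal-bert-mlm",
        max_steps=1_000_000,              # 1M steps, as quoted above
        per_device_train_batch_size=256,  # 256 sequences per batch (use gradient accumulation on GPUs)
        learning_rate=1e-4,               # initial learning rate, as quoted above
        warmup_steps=10_000,              # assumption; not stated in the model card
        weight_decay=0.01,                # assumption; not stated in the model card
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()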

Guide: Running Locally

  1. Install Transformers: Ensure you have the transformers library installed.

    pip install transformers
    
  2. Load the Model: Use the following Python code to load the pre-trained model and tokenizer (a fill-mask usage sketch follows this list).

    from transformers import AutoTokenizer, AutoModel
    
    tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    model = AutoModel.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    
  3. Suggested Cloud GPUs: For training or large-scale inference, consider cloud platforms such as Google Cloud Platform, AWS, or Azure, which offer GPU instances for deep learning workloads.
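
Since this is a masked-language-model checkpoint, a quick sanity check after loading is the fill-mask pipeline; the example sentence below is illustrative, not taken from the model card.

    from transformers import pipeline

    # Fill-mask sanity check; [MASK] is BERT's mask token.
    fill = pipeline("fill-mask", model="nlpaueb/bert-base-uncased-contracts")
    predictions = fill("This agreement shall be governed by the laws of the [MASK] of Delaware.")
    for pred in predictions:
        print(f"{pred['token_str']:>12}  {pred['score']:.3f}")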

License

LEGAL-BERT is released under the Creative Commons Attribution-ShareAlike 4.0 International License (cc-by-sa-4.0).
