nlpaueb/bert-base-uncased-contracts
Introduction
LEGAL-BERT is a specialized family of BERT models designed for the legal domain. It aims to enhance legal NLP research and support computational law and legal technology applications. The model is pre-trained on a diverse set of English legal texts, including legislation, court cases, and contracts, leading to improved performance in domain-specific tasks compared to the original BERT model.
Architecture
LEGAL-BERT is based on the BERT architecture, using the same configuration as BERT-BASE (12 layers, 768 hidden units, 12 attention heads, ~110M parameters). Variants include sub-domain-specific models such as CONTRACTS-BERT, EURLEX-BERT, and ECHR-BERT, as well as a general LEGAL-BERT model trained on the complete legal corpus.
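As a quick check of the configuration described above, the published config can be loaded from the Hugging Face Hub with the transformers library (introduced in the guide below); this is a minimal sketch, and the printed values should match the BERT-BASE figures.

    from transformers import AutoConfig

    # Fetch the configuration of the contracts variant from the Hugging Face Hub.
    config = AutoConfig.from_pretrained("nlpaueb/bert-base-uncased-contracts")

    # Should report 12 layers, 768 hidden units, and 12 attention heads (BERT-BASE).
    print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)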
Training
LEGAL-BERT models were trained on a corpus of 12 GB of English legal text. Training used Google BERT's official code and followed the same setup as the original BERT release: 1 million training steps with batches of 256 sequences of length 512 and an initial learning rate of 1e-4. Training ran on a Google Cloud TPU v3-8, supported by the TensorFlow Research Cloud program and GCP research credits.
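For a rough, back-of-the-envelope sense of that setup's scale, the sketch below multiplies out the reported hyperparameters; the token total is approximate, since not every sequence is packed to the full 512 tokens.

    # Reported pre-training hyperparameters (see above).
    train_steps = 1_000_000   # training steps
    batch_size = 256          # sequences per batch
    seq_length = 512          # tokens per sequence
    learning_rate = 1e-4      # initial learning rate

    # Approximate number of token positions processed during pre-training.
    tokens_seen = train_steps * batch_size * seq_length
    print(f"~{tokens_seen / 1e9:.0f}B token positions")   # ~131B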
Guide: Running Locally
- Install Transformers: Ensure you have the transformers library installed:

      pip install transformers

- Load the Model: Use the following Python code to load the pre-trained model and tokenizer (a usage example follows this guide):

      from transformers import AutoTokenizer, AutoModel

      tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-uncased-contracts")
      model = AutoModel.from_pretrained("nlpaueb/bert-base-uncased-contracts")
- Suggested Cloud GPUs: For training or large-scale inference, consider cloud platforms such as Google Cloud Platform, AWS, or Azure, which provide GPU instances for deep learning workloads.
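Once the model and tokenizer are loaded as in step 2 above, they can embed contract text directly. The sketch below is a minimal feature-extraction example; the sample clause is purely illustrative.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-uncased-contracts")
    model = AutoModel.from_pretrained("nlpaueb/bert-base-uncased-contracts")

    # Illustrative contract clause; any English legal text works here.
    text = "The Seller shall deliver the goods within thirty (30) days of the Effective Date."

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Token-level embeddings from the final layer: (batch, tokens, hidden_size=768).
    print(outputs.last_hidden_state.shape)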
License
LEGAL-BERT is released under the Creative Commons Attribution-ShareAlike 4.0 International License (cc-by-sa-4.0).