Lawformer
Introduction
This repository provides the source code and checkpoints for the paper "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents". The checkpoints are available for download from the Hugging Face model hub.
Architecture
Lawformer is based on the Longformer architecture, which replaces the quadratic full self-attention of standard Transformers with a sliding-window local attention pattern, optionally combined with global attention on selected tokens. This lets the model process Chinese legal documents of up to several thousand tokens efficiently, making it suitable for a range of legal document analysis tasks.
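As a rough illustration of this sparse attention pattern (not code from the Lawformer repository), the sketch below computes which key positions a single query token can attend to, assuming a symmetric sliding window and a hypothetical set of globally attended positions:

```python
# Illustrative only: Longformer-style sparse attention visibility.
def attended_positions(query_idx, seq_len, window=512, global_positions=frozenset({0})):
    """Return the set of key indices visible to the token at `query_idx`."""
    half = window // 2
    # Local sliding window centered on the query token.
    local = set(range(max(0, query_idx - half), min(seq_len, query_idx + half + 1)))
    # Globally attended tokens see, and are seen by, every position.
    if query_idx in global_positions:
        return set(range(seq_len))
    return local | set(global_positions)

# A token in the middle of a 4096-token document sees its window plus token 0.
print(len(attended_positions(2048, 4096)))  # 513 window positions + 1 global token
```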
Training
The model was pre-trained on a large corpus of Chinese legal documents. This training process involved adapting the Longformer architecture to understand and process the specific language and structure of legal texts, which often include complex syntax and specialized vocabulary.
Guide: Running Locally
To run Lawformer locally, ensure you have the `transformers` library installed. Here's a quick start guide:
- Install the Transformers library:

```bash
pip install transformers
```
- Load the model and tokenizer:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thunlp/Lawformer")
model = AutoModel.from_pretrained("thunlp/Lawformer")
```
- Process input text:

```python
# Example input: "Ren filed a lawsuit, requesting a judgment to dissolve the
# marriage and divide the jointly owned marital property."
inputs = tokenizer("任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。", return_tensors="pt")
outputs = model(**inputs)
```
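For genuinely long inputs, the case Lawformer targets, you will typically truncate or pad to the model's maximum length and mark at least one token (conventionally the first) as globally attended. A minimal sketch, assuming the checkpoint loads as a Longformer-style model whose forward pass accepts a `global_attention_mask`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thunlp/Lawformer")
model = AutoModel.from_pretrained("thunlp/Lawformer")

# A long judgment document would go here; truncate to the model's limit.
long_text = "任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。" * 100
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Longformer-style models use local windowed attention by default; mark the
# first token as global so it can attend to (and be seen by) the whole document.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```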
For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure, which provide the necessary computational resources for running large models like Lawformer.
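If a local GPU is available instead, moving the model and tensors onto it is a small change; a minimal sketch, reusing `model` and `inputs` from the steps above:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
```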
License
The use of Lawformer is subject to the terms and conditions set forth by the original authors. Please refer to the repository or the accompanying documentation for specific licensing details. If you use the pre-trained models, please cite the following paper:
```bibtex
@article{xiao2021lawformer,
  title={Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents},
  author={Xiao, Chaojun and Hu, Xueyu and Liu, Zhiyuan and Tu, Cunchao and Sun, Maosong},
  journal={AI Open},
  year={2021}
}
```