Lawformer
Introduction
This repository provides the source code and checkpoints for the paper "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents". The checkpoint can be downloaded from the Hugging Face model hub or from this repository.
Architecture
Lawformer is based on the Longformer architecture and is designed to handle long documents in the legal domain. Longformer replaces full self-attention with a sliding-window local attention pattern plus selective global attention, which keeps the cost of attention manageable on long inputs. The model is implemented in PyTorch using the Hugging Face transformers library.
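As a concrete illustration of the long-document interface, here is a minimal sketch. It assumes the checkpoint exposes the standard Longformer API in `transformers`; the `global_attention_mask` argument and the 4,096-token limit are Longformer conventions, not stated in this README, so verify them against the checkpoint's config:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModel.from_pretrained("xcjthu/Lawformer")

long_text = "..."  # placeholder for a long legal document, e.g. a full judgment
inputs = tokenizer(long_text, return_tensors="pt", truncation=True,
                   max_length=4096)  # assumed Longformer-style limit

# Mark the [CLS] token for global attention; every other position uses
# the sliding-window local attention pattern.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```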
Training
Training details are not documented here. Per the paper, Lawformer is pre-trained on Chinese legal long documents; applying it to a downstream task would then likely involve fine-tuning on task-specific legal datasets, as sketched below.
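Since no training recipe is given, the following is purely an illustrative sketch of what downstream fine-tuning could look like with the standard `transformers` sequence-classification head; the toy dataset, label count, and hyperparameters are placeholders, not the authors' setup:

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModelForSequenceClassification.from_pretrained(
    "xcjthu/Lawformer", num_labels=2)  # label count is a placeholder

class ToyLegalDataset(Dataset):
    """Hypothetical stand-in for a real labeled legal-document dataset."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             return_tensors="pt")
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Toy examples: "The plaintiff seeks a divorce." / "The defendant denies the loan."
train_ds = ToyLegalDataset(["原告请求离婚。", "被告否认借款事实。"], [0, 1])

args = TrainingArguments(output_dir="lawformer-ft", num_train_epochs=1,
                         per_device_train_batch_size=1)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```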
Guide: Running Locally
- Install Transformers Library: Ensure you have the `transformers` library installed. You can do this via pip:

  ```bash
  pip install transformers
  ```
- Load the Model and Tokenizer:

  ```python
  from transformers import AutoModel, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
  model = AutoModel.from_pretrained("xcjthu/Lawformer")
  ```
- Prepare Inputs and Run the Model (a complete end-to-end sketch follows this list):

  ```python
  # Example input: "Ren X filed a lawsuit requesting dissolution of the
  # marriage and division of the couple's joint property."
  inputs = tokenizer("任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。", return_tensors="pt")
  outputs = model(**inputs)
  ```
- Consider Using Cloud GPUs: For faster inference and for handling large datasets, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
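Putting the steps above together, here is a hedged end-to-end sketch. The device selection and the shape check are illustrative additions, not part of the original guide:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Run on a GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModel.from_pretrained("xcjthu/Lawformer").to(device)
model.eval()

text = "任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。"
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```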
License
The license under which Lawformer is released is not stated here. Please refer to the repository or contact the authors for licensing information.