longformer_zh
ValkyriaLenneth
Introduction
The Longformer_ZH model is a Chinese pre-trained Longformer designed to handle long document sequences efficiently. Whereas standard Transformer self-attention scales as O(n^2) in sequence length, Longformer reduces this to linear complexity, making it practical for sequences of up to 4K characters. The model combines local windowed attention with task-specific global attention.
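To make the complexity difference concrete, the following minimal sketch (illustrative only, not code from this repository) builds a sliding-window attention mask and counts the attended token pairs; the window size of 256 is an arbitrary example value.

# Illustrative sketch: a sliding-window (local) attention mask.
# Each token attends to at most `window` neighbours on each side, so the
# number of attended pairs grows as O(n * window) rather than O(n^2).
import numpy as np

def sliding_window_mask(seq_len, window):
    mask = np.zeros((seq_len, seq_len), dtype=np.int8)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = 1
    return mask

mask = sliding_window_mask(seq_len=4096, window=256)
print(int(mask.sum()), "attended pairs with local attention")
print(4096 * 4096, "attended pairs with full self-attention")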
Architecture
The Longformer_ZH model is based on the Roberta_zh architecture, implemented as a subclass of Transformers.BertModel. It replaces the standard full self-attention with a combination of local windowed attention and task-specific global attention, which is what enables efficient handling of long sequences.
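The upstream Longformer implementation exposes the global component through a global_attention_mask supplied at inference time; whether LongformerZhForMaksedLM accepts the same argument should be verified against the repository, but the general pattern looks like the hedged sketch below.

# Hedged sketch of marking positions for global attention (the argument name
# follows the upstream Longformer convention; verify it against this repo).
import torch

input_ids = torch.randint(0, 21128, (1, 1024))        # dummy batch; vocab size is an assumption
global_attention_mask = torch.zeros_like(input_ids)   # 0 = local windowed attention
global_attention_mask[:, 0] = 1                       # 1 = global attention, e.g. the [CLS] token
# outputs = model(input_ids, global_attention_mask=global_attention_mask)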
Training
The model was pre-trained on a mixed corpus from brightmart/nlp_chinese_corpus and is built on Roberta_zh_mid. The training scripts were adapted from the original Longformer scripts, and Whole-Word-Masking (WWM) was introduced to better suit the Chinese language. Pre-training ran on 4 Titan RTX GPUs for 3,000 steps, taking approximately 4 days. Mixed precision training via NVIDIA Apex was used to improve efficiency. Data preprocessing relied on the Jieba Chinese tokenizer and JioNLP cleaning tools.
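As an illustration of how Whole-Word-Masking interacts with Jieba segmentation (a simplified sketch, not the actual pre-training code), characters belonging to the same Jieba word are masked together, so the model must recover whole Chinese words rather than isolated characters:

# Simplified Whole-Word-Masking sketch: mask all characters of a Jieba word together.
import random
import jieba

def wwm_mask(sentence, mask_token="[MASK]", prob=0.15):
    pieces = []
    for word in jieba.cut(sentence):
        if random.random() < prob:
            pieces.append(mask_token * len(word))  # mask every character of the word
        else:
            pieces.append(word)
    return "".join(pieces)

print(wwm_mask("长文本预训练模型适合处理长文档。"))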
Guide: Running Locally
- Download the Model:
  - Google Drive: Download Link
  - Baidu Yun: Download Link, Extraction Code: y601
- Load the Model (see the usage sketch after this list):
  from Longformer_zh import LongformerZhForMaksedLM
  model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
- Cloud GPU Suggestion:
  - If local hardware is limited, consider cloud GPU services such as AWS EC2 with NVIDIA GPUs or Google Cloud Platform.
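A fuller usage sketch is shown below. The tokenizer class and the output format are assumptions (a BERT-style tokenizer and Hugging-Face-style masked-LM outputs are assumed because the model is built on Roberta_zh); verify both against the original repository.

# Hedged end-to-end sketch: fill a [MASK] in a Chinese sentence.
# Assumptions: BertTokenizer works with this checkpoint and the model
# returns Hugging-Face-style outputs with a `logits` field.
import torch
from transformers import BertTokenizer
from Longformer_zh import LongformerZhForMaksedLM

tokenizer = BertTokenizer.from_pretrained('ValkyriaLenneth/longformer_zh')
model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
model.eval()

inputs = tokenizer("今天天气很[MASK]。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                      # (1, seq_len, vocab_size)

mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))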
License
The model and accompanying resources have been open-sourced to facilitate research in Chinese language processing. Specific license details should be checked in the original repository or the documentation associated with the model.