Introduction

The Longformer_ZH model is a Chinese pre-trained Longformer designed to handle long document sequences efficiently. Whereas standard Transformer self-attention scales quadratically (O(n^2)) with sequence length, Longformer scales linearly, making it practical for sequences of up to 4,096 tokens. The model combines local windowed attention with task-specific global attention.
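The cost difference between the two attention patterns can be made concrete with a small sketch. This is an illustration of the sliding-window-plus-global pattern, not the model's actual implementation; the function name and the window/global choices below are for demonstration only.

```python
# Illustrative sketch: enumerate the (query, key) pairs a Longformer-style
# layer attends to, and compare with full self-attention's n^2 pairs.

def attention_pattern(seq_len, window, global_positions=()):
    """Each token attends to a local window of +/- window//2 neighbours;
    tokens in `global_positions` attend to, and are attended by, all tokens."""
    half = window // 2
    pairs = set()
    for q in range(seq_len):
        for k in range(max(0, q - half), min(seq_len, q + half + 1)):
            pairs.add((q, k))           # local windowed attention
    for g in global_positions:          # task-specific global attention
        for t in range(seq_len):
            pairs.add((g, t))
            pairs.add((t, g))
    return pairs

n, w = 512, 32
sparse_pairs = len(attention_pattern(n, w, global_positions=(0,)))
full_pairs = n * n  # standard self-attention compares every pair
```

Because the local window is fixed, `sparse_pairs` grows linearly in `n`, while `full_pairs` grows quadratically.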

Architecture

The Longformer_ZH model is based on the Roberta_zh architecture, which itself follows the BERT architecture (transformers.BertModel). It replaces the standard full self-attention with a combination of local windowed and global attention, enabling efficient handling of long sequences.

Training

The model was pre-trained on a mixed corpus from brightmart/nlp_chinese_corpus, initialized from Roberta_zh_mid. The training scripts were adapted from the original Longformer scripts. Whole-Word Masking (WWM) was used to better suit Chinese, where words typically span multiple characters. Pre-training ran for 3,000 steps on 4 Titan RTX GPUs, taking approximately 4 days, with mixed-precision training via NVIDIA Apex. Data preprocessing used the Jieba Chinese tokenizer and JioNLP cleaning tools.
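Whole-Word Masking differs from character-level masking in that all characters of a segmented word are masked together. The following is a simplified sketch of that idea, assuming the text has already been segmented into words (in the actual pipeline Jieba provides the segmentation, and real WWM also mixes in random-replace and keep-original strategies); the function name and parameters are illustrative only.

```python
import random

def whole_word_mask(words, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Mask whole words: every character of a selected word is replaced,
    rather than masking isolated characters as in character-level MLM."""
    rng = random.Random(seed)
    out = []
    for word in words:
        if rng.random() < mask_rate:
            out.extend(mask_token for _ in word)  # one mask per character
        else:
            out.extend(word)                      # keep characters as-is
    return out

# Hypothetical pre-segmented sentence; Jieba would normally produce this list.
tokens = whole_word_mask(["自然", "语言", "处理"], mask_rate=0.5)
```

Note that a masked word never contributes a mix of masked and unmasked characters, which is the point of WWM.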

Guide: Running Locally

  1. Download the Model: obtain the weights from the ValkyriaLenneth/longformer_zh repository on the Hugging Face Hub (the from_pretrained call in the next step can also download them automatically).

  2. Load the Model:

    from Longformer_zh import LongformerZhForMaksedLM
    model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
    
  3. Cloud GPU Suggestion:

    • If local GPU resources are insufficient, consider cloud GPU services such as AWS EC2 instances with NVIDIA GPUs or Google Cloud Platform.
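Since the model accepts sequences of up to 4,096 tokens, longer documents must be split before inference. A minimal chunking sketch with overlap (the chunk size and overlap values here are illustrative, not prescribed by the repository):

```python
def chunk_text(text, chunk_size=4096, overlap=256):
    """Split a long string into overlapping chunks so context near chunk
    boundaries is not lost. Each chunk is at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop a trailing chunk that is fully contained in the previous one.
    return [c for i, c in enumerate(chunks) if i == 0 or len(c) > overlap]
```

Each chunk can then be tokenized and fed to the model independently, with the overlap preserving cross-boundary context.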

License

The model and accompanying resources have been open-sourced to facilitate research in Chinese language processing. Specific license details should be checked in the original repository or the documentation associated with the model.
