bart-large-chinese

fnlp

Introduction

The Chinese BART-Large model is a Transformer-based sequence-to-sequence model for Chinese text-to-text generation, supporting both language understanding and generation. It is derived from the BART architecture and adapted for Chinese-language applications.

Architecture

The Chinese BART-Large model is based on the seq2seq (sequence-to-sequence) architecture of BART, adapted for Chinese text. The updated version features an expanded vocabulary and increased position embeddings. Specifically, it uses a vocabulary size of 51,271, accommodating over 6,800 additional Chinese characters, and supports a maximum sequence length of 1,024 tokens.
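These figures can be illustrated with a locally constructed `BartConfig`; this is a sketch using the stated values, not the checkpoint's actual configuration file, which ships with the model:

```python
from transformers import BartConfig

# Hypothetical config mirroring the stated Chinese BART-Large settings;
# the real values are loaded automatically with the fnlp/bart-large-chinese checkpoint.
config = BartConfig(
    vocab_size=51271,              # expanded vocabulary (6,800+ added Chinese characters)
    max_position_embeddings=1024,  # maximum supported sequence length
)
print(config.vocab_size, config.max_position_embeddings)
```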

Training

The model underwent an additional 50,000 training steps with a batch size of 2,048, a maximum sequence length of 1,024, a peak learning rate of 2e-5, and a warmup ratio of 0.1. Weights were initialized from the earlier checkpoint; newly added parameters were randomly initialized, and existing token embeddings were aligned to the expanded vocabulary. The updated model performs comparably to earlier versions, with some variation attributable to the adjusted training setup and hyperparameters.
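The warmup ratio determines how many steps the learning rate takes to ramp up to its peak. A minimal sketch of the implied schedule, assuming the common linear-warmup-then-linear-decay shape (the decay behavior is an assumption, not stated in the source):

```python
# Stated hyperparameters: 50,000 steps, peak LR 2e-5, warmup ratio 0.1,
# so warmup ends at step 5,000.
TOTAL_STEPS = 50_000
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1
WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_RATIO)  # 5,000

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 (assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
```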

Guide: Running Locally

To run the Chinese BART-Large model locally:

  1. Install Dependencies: Ensure you have the transformers library and a PyTorch backend installed.

    pip install transformers torch
    
  2. Load Model and Tokenizer:

    from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline
    
    tokenizer = BertTokenizer.from_pretrained("fnlp/bart-large-chinese")
    model = BartForConditionalGeneration.from_pretrained("fnlp/bart-large-chinese")
    text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
    
  3. Generate Text:

    text2text_generator("北京是[MASK]的首都", max_length=50, do_sample=False)
    
  4. Cloud GPUs: For enhanced performance, especially with large datasets or batch sizes, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.

License

This model and its code are released under the license specified in the repository. Refer to the GitHub repository for licensing details and the updated model files.
