fnlp/bart-base-chinese
Introduction
The Chinese BART-Base model is a pre-trained sequence-to-sequence model for Chinese language understanding and generation. It updates earlier versions by enlarging the vocabulary and extending the position embeddings, improving performance across a range of tasks.
Architecture
Chinese BART-Base uses a seq2seq (encoder-decoder) architecture with modifications over the original BART setup. Its vocabulary is enlarged to 51,271 tokens to cover additional Chinese and English characters, and its position embeddings are extended from 512 to 1024. These updates allow it to handle longer and more complex sequences.
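These figures can be checked directly against the model's published configuration. Below is a minimal sketch using the standard transformers config loader; the printed values are expected to match the numbers above, but verify against the actual checkpoint.

```python
from transformers import AutoConfig

# Load the configuration shipped with the fnlp/bart-base-chinese checkpoint
config = AutoConfig.from_pretrained("fnlp/bart-base-chinese")

# Enlarged vocabulary and extended position embeddings described above
print("vocab_size:", config.vocab_size)                            # expected: 51271
print("max_position_embeddings:", config.max_position_embeddings)  # expected: 1024
```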
Training
The updated model was initialized from existing checkpoints, with the token embeddings aligned to the new vocabulary. Training ran for 50,000 steps with a batch size of 2,048 and a maximum sequence length of 1,024. The peak learning rate was 2e-5 with a warmup ratio of 0.1. Performance varies slightly on some tasks compared with previous versions, but the model generally maintains strong results.
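For readers who want to approximate this setup with the Hugging Face trainer, the reported hyperparameters map roughly onto a Seq2SeqTrainingArguments object as sketched below. This is only an illustration: the authors' actual pre-training code and data pipeline are not described here, and the output directory, device count, and per-device batch split are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Hedged sketch of the reported settings; values marked "assumption" are not
# taken from the original training run.
training_args = Seq2SeqTrainingArguments(
    output_dir="bart-base-chinese-continued",  # hypothetical output path (assumption)
    max_steps=50_000,                          # 50,000 training steps
    learning_rate=2e-5,                        # peak learning rate
    warmup_ratio=0.1,                          # warm up over 10% of steps
    per_device_train_batch_size=64,            # assumption: 64 x 4 accumulation x 8 GPUs = 2048
    gradient_accumulation_steps=4,             # assumption, see above
)
# The 1,024-token maximum sequence length is enforced when tokenizing the data,
# e.g. tokenizer(..., max_length=1024, truncation=True), not in these arguments.
```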
Guide: Running Locally
- Install Dependencies: ensure the transformers library is installed.

  ```bash
  pip install transformers
  ```

- Load the Model:

  ```python
  from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

  tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
  model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")
  text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
  ```

- Generate Text:

  ```python
  text2text_generator("北京是[MASK]的首都", max_length=50, do_sample=False)
  ```
- Cloud GPUs: for faster inference, consider cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure; a sketch for GPU execution follows this list.
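If a GPU is available, whether local or on one of the providers above, the same pipeline can be placed on it. A minimal sketch, assuming CUDA device 0 when present:

```python
import torch
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

# Use GPU 0 if CUDA is available; -1 keeps the pipeline on the CPU
device = 0 if torch.cuda.is_available() else -1

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")
text2text_generator = Text2TextGenerationPipeline(model, tokenizer, device=device)

print(text2text_generator("北京是[MASK]的首都", max_length=50, do_sample=False))
```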
License
The Chinese BART-Base model is available under the Apache 2.0 License. This allows for both commercial and non-commercial use, provided that proper attribution is given to the authors.