Chinese MacBERT Large (hfl/chinese-macbert-large)

Introduction
MacBERT is an enhanced BERT model designed for Chinese Natural Language Processing. It introduces a novel pre-training task, MLM as correction (Mac), which aims to reduce the discrepancy between the pre-training and fine-tuning phases. Instead of masking with the artificial [MASK] token, MacBERT replaces selected words with similar words chosen via the Synonyms toolkit based on word2vec similarity. The model also incorporates Whole Word Masking (WWM), N-gram masking, and Sentence-Order Prediction (SOP).
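To make the masking strategy concrete, here is a toy sketch of how a sentence might be corrupted under MLM as correction. It is illustrative only, not the actual pre-training code: the SIMILAR_WORDS dictionary and the mask_prob value are stand-ins for the Synonyms-toolkit lookup and the masking ratio used in the real pipeline.

```python
# Toy illustration of MLM as correction: similar words replace selected tokens
# instead of [MASK], and the model is trained to recover the original tokens.
import random

# Hypothetical stand-in for the Synonyms toolkit lookup (illustrative only).
SIMILAR_WORDS = {"喜欢": "喜爱", "学习": "进修", "天气": "气候"}

def corrupt_for_mac(tokens, mask_prob=0.15):
    """Replace a fraction of tokens with similar words; labels are the originals."""
    corrupted, labels = [], []
    for tok in tokens:
        if tok in SIMILAR_WORDS and random.random() < mask_prob:
            corrupted.append(SIMILAR_WORDS[tok])  # similar word instead of [MASK]
            labels.append(tok)                    # model must predict the original
        else:
            corrupted.append(tok)
            labels.append(None)                   # position not trained on
    return corrupted, labels

print(corrupt_for_mac(["我", "喜欢", "学习", "自然", "语言", "处理"]))
```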
Architecture
MacBERT maintains the same neural architecture as the original BERT, allowing it to be used as a direct substitute. The enhancements focus on the pre-training tasks rather than altering the foundational architecture.
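As a quick sanity check of the drop-in claim, the sketch below loads the checkpoint's configuration through the standard BERT classes. The expected values follow the usual BERT-large setup (24 layers, hidden size 1024, 16 attention heads); this is an assumption about the checkpoint rather than something stated above.

```python
# Sanity check: MacBERT loads through the standard BERT classes unchanged.
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("hfl/chinese-macbert-large")
# Expected to follow the usual BERT-large setup (assumption):
# 24 layers, hidden size 1024, 16 attention heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)

# The weights load into a vanilla BertModel, so existing BERT code works as-is.
model = BertModel.from_pretrained("hfl/chinese-macbert-large")
print(model.config.model_type)  # "bert"
```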
Training
During pre-training, MacBERT substitutes similar words for the tokens that would otherwise be replaced by [MASK], which is intended to make the transition to fine-tuning more seamless. It additionally applies Whole Word Masking, N-gram masking, and Sentence-Order Prediction to improve its understanding of Chinese text.
Guide: Running Locally
- Setup Environment:
- Ensure you have Python and PyTorch or TensorFlow installed.
- Install the Hugging Face Transformers library:
pip install transformers
- Load Model:
- Use BERT-related functions to load MacBERT from the Hugging Face model hub:
```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-large")
model = BertModel.from_pretrained("hfl/chinese-macbert-large")
```
- Inference:
- Tokenize the input text and run it through the loaded model; see the sketch after this list.
- Cloud GPUs:
- For large-scale tasks or training, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure for better performance.
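A minimal inference sketch for the step above, extracting contextual embeddings with the loaded tokenizer and model; the example sentence is only illustrative.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-large")
model = BertModel.from_pretrained("hfl/chinese-macbert-large")
model.eval()

# Illustrative input sentence; replace with your own text.
inputs = tokenizer("使用MacBERT处理中文文本。", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token-level contextual embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```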
License
MacBERT is released under the Apache-2.0 License, allowing for wide usage and modification with proper attribution.