Deberta Chinese Large
WENGSYX
Introduction
The Deberta-Chinese-Large project adapts Microsoft's open-source DeBERTa model to the Chinese language. Its goal is to give the community one more pre-trained Chinese language model to choose from.
Architecture
This model uses the DeBERTa architecture and is pre-trained with methods such as Whole Word Masking (WWM) and n-gram Masked Language Modeling (n-gram MLM). The pre-training data is WuDaoCorpora, a large-scale, high-quality dataset from the Beijing Academy of Artificial Intelligence that supports research for the "Wu Dao" large model project.
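To make the idea of Whole Word Masking concrete, here is a minimal sketch in plain Python. It is not the project's training code. It assumes the text has already been segmented into words by some external tool, that each Chinese character is one token, and that roughly 15% of tokens are masked; the function name `whole_word_mask`, the `mask_ratio` default, and the example sentence are all illustrative assumptions. The usual refinements of masked language modeling (random or unchanged replacements, n-gram spans) are left out.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_ratio=0.15, seed=0):
    """Mask whole words: all characters of a chosen word are masked together.

    `words` is a list of pre-segmented words (assumption: segmentation is
    done elsewhere). Returns (masked_tokens, labels), where labels hold the
    original character at masked positions and None everywhere else.
    """
    rng = random.Random(seed)

    # Character-level tokens, one token per character (assumption).
    tokens = [ch for word in words for ch in word]
    # Target number of tokens to mask, at least one.
    budget = max(1, round(len(tokens) * mask_ratio))

    # Token span [start, end) covered by each word.
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)

    # Visit words in random order and mask them until the budget is used.
    order = list(range(len(words)))
    rng.shuffle(order)

    masked = list(tokens)
    labels = [None] * len(tokens)
    used = 0
    for idx in order:
        if used >= budget:
            break
        s, e = spans[idx]
        # Skip a word that would overshoot the budget, unless nothing has
        # been masked yet (so at least one word is always masked).
        if used > 0 and used + (e - s) > budget:
            continue
        for i in range(s, e):
            labels[i] = tokens[i]
            masked[i] = MASK
        used += e - s
    return masked, labels


if __name__ == "__main__":
    example = ["今天", "天气", "很", "好", "我们", "去", "公园", "散步"]
    masked_tokens, target_labels = whole_word_mask(example)
    print(masked_tokens)
    print(target_labels)
```

The point of the sketch is the unit of masking: a character is never masked on its own while the rest of its word stays visible, so the model has to predict the whole word from context.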
Training
- Learning Rate: 1e-5
- Batch Size: 512
- Hardware: 2 NVIDIA RTX 3090 GPUs
- Dataset Size: 200GB
- Training Duration: 14 days
- Optimizer: AdamW
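The list above gives a batch size of 512 on 2 GPUs but does not say how that batch was assembled. One common way to reach a large batch on a small number of GPUs is gradient accumulation; whether this project used it is an assumption, not something the source states. The sketch below only shows the arithmetic. The helper `accumulation_steps` and the per-GPU micro-batch of 8 are hypothetical values chosen for illustration.

```python
def accumulation_steps(global_batch, num_gpus, per_gpu_batch):
    """Return how many micro-batches must be accumulated per optimizer step.

    global_batch  : samples per optimizer update (512 in the list above)
    num_gpus      : number of GPUs (2 in the list above)
    per_gpu_batch : samples per GPU per forward pass (hypothetical)
    """
    per_step = num_gpus * per_gpu_batch
    if global_batch % per_step != 0:
        raise ValueError(
            "global_batch must be a multiple of num_gpus * per_gpu_batch"
        )
    return global_batch // per_step


if __name__ == "__main__":
    # Hypothetical micro-batch of 8 samples per GPU.
    print(accumulation_steps(global_batch=512, num_gpus=2, per_gpu_batch=8))
```

With these illustrative numbers, each optimizer update would combine 32 forward and backward passes per GPU.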
Guide: Running Locally
To run the model locally, use the Hugging Face transformers library:
from transformers import BertTokenizer, AutoModel
tokenizer = BertTokenizer.from_pretrained("WENGSYX/Deberta-Chinese-Large")
model = AutoModel.from_pretrained("WENGSYX/Deberta-Chinese-Large")
Note: Use BertTokenizer to load the Chinese vocabulary.
Suggested Cloud GPUs
For efficient training and deployment, consider cloud instances with hardware accelerators, such as AWS EC2 instances with NVIDIA GPUs, Google Cloud GPU or TPU instances, or Azure N-series virtual machines.
License
The model is open-source, intended to provide a pre-trained language model option for Chinese. For specific licensing details, refer to the model's repository on Hugging Face.