Deberta Chinese Large

WENGSYX

Introduction

The Deberta-Chinese-Large project adapts Microsoft's open-source DeBERTa model to the Chinese language. Its goal is to give the community an additional pre-trained Chinese language model to choose from.

Architecture

This model uses the DeBERTa architecture and is pre-trained with Whole Word Masking (WWM) and n-gram masked language modeling (n-gram MLM). It is trained on WuDaoCorpora, a large-scale, high-quality dataset released by the Beijing Academy of Artificial Intelligence in support of the "Wu Dao" large-model research project.
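
As a rough illustration of how Whole Word Masking differs from masking individual characters, the sketch below masks every character of a selected word together. The word segmentation, masking probability, and helper function are assumptions made for the example, not the project's actual preprocessing pipeline.

import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    # words: a pre-segmented Chinese sentence (e.g. produced by a word segmenter).
    # Whole Word Masking replaces every character of a chosen word at once,
    # instead of masking single characters independently.
    masked = []
    for word in words:
        if random.random() < mask_prob:
            masked.extend(mask_token for _ in word)  # mask all characters of the word
        else:
            masked.extend(word)                      # keep the characters unchanged
    return masked

# Example: "北京 是 中国 的 首都" segmented into words
print(whole_word_mask(["北京", "是", "中国", "的", "首都"], mask_prob=0.5))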

Training

  • Learning Rate: 1e-5
  • Batch Size: 512
  • Hardware: 2 NVIDIA RTX 3090 GPUs
  • Dataset Size: 200GB
  • Training Duration: 14 days
  • Optimizer: AdamW
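
The snippet below is a minimal sketch of how the hyperparameters listed above could be expressed with the Hugging Face Trainer API, which uses AdamW by default. The output directory, dataset, collator settings, and the per-device batch size split (2 GPUs x 32 x 8 = 512 effective) are assumptions for illustration, not the project's released training script.

from transformers import (AutoModelForMaskedLM, BertTokenizer,
                          TrainingArguments, Trainer,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizer.from_pretrained("WENGSYX/Deberta-Chinese-Large")
model = AutoModelForMaskedLM.from_pretrained("WENGSYX/Deberta-Chinese-Large")

# Hyperparameters mirror the values listed above; the effective batch size of 512
# is reached via gradient accumulation (assumed split across 2 GPUs).
args = TrainingArguments(
    output_dir="deberta-chinese-large-ckpt",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                            mlm_probability=0.15)

# train_dataset is assumed to be a tokenized corpus prepared separately.
# trainer = Trainer(model=model, args=args, data_collator=collator,
#                   train_dataset=train_dataset)
# trainer.train()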

Guide: Running Locally

To run the model locally, use the Hugging Face transformers library:

from transformers import BertTokenizer, AutoModel

# Load the Chinese vocabulary with BertTokenizer and the model weights with AutoModel
tokenizer = BertTokenizer.from_pretrained("WENGSYX/Deberta-Chinese-Large")
model = AutoModel.from_pretrained("WENGSYX/Deberta-Chinese-Large")

Note: Use BertTokenizer to load the Chinese vocabulary.
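
As a quick check that the model loads correctly, the following continues the snippet above by encoding a short Chinese sentence and inspecting the hidden states; the example sentence is arbitrary.

import torch

inputs = tokenizer("北京是中国的首都。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)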

Suggested Cloud GPUs

For efficient training and deployment, consider cloud instances with NVIDIA GPUs, such as AWS EC2 GPU instances, Google Cloud GPU instances, or Azure N-series instances.

License

The model is released as open source. For specific licensing details, refer to the model's repository on Hugging Face.
