Deberta Chinese Large

WENGSYX

Introduction

The Deberta-Chinese-Large project adapts Microsoft's open-source DeBERTa model to the Chinese language. Its goal is to give the community an additional pre-trained Chinese language model to choose from.

Architecture

This model uses the DeBERTa architecture and is pre-trained with Whole Word Masking (WWM) and n-gram masked language modeling (n-gram MLM). It is trained on WuDaoCorpora, a large-scale, high-quality dataset released by the Beijing Academy of Artificial Intelligence in support of the "Wu Dao" large-model research project.
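
As a rough illustration of how Whole Word Masking differs from masking individual characters, the sketch below masks every character of a selected word together. The word segmentation, masking probability, and helper function are assumptions made for the example, not the project's actual preprocessing pipeline.

import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    # words: a pre-segmented Chinese sentence (e.g. produced by a word segmenter).
    # Whole Word Masking replaces every character of a chosen word at once,
    # instead of masking single characters independently.
    masked = []
    for word in words:
        if random.random() < mask_prob:
            masked.extend(mask_token for _ in word)  # mask all characters of the word
        else:
            masked.extend(word)                      # keep the characters unchanged
    return masked

# Example: "北京 是 中国 的 首都" segmented into words
print(whole_word_mask(["北京", "是", "中国", "的", "首都"], mask_prob=0.5))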

Training

  • Learning Rate: 1e-5
  • Batch Size: 512
  • Hardware: 2 NVIDIA RTX 3090 GPUs
  • Dataset Size: 200GB
  • Training Duration: 14 days
  • Optimizer: AdamW
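
The snippet below is a minimal sketch of how the hyperparameters listed above could be expressed with the Hugging Face Trainer API, which uses AdamW by default. The output directory, dataset, collator settings, and the per-device batch size split (2 GPUs x 32 x 8 = 512 effective) are assumptions for illustration, not the project's released training script.

from transformers import (AutoModelForMaskedLM, BertTokenizer,
                          TrainingArguments, Trainer,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizer.from_pretrained("WENGSYX/Deberta-Chinese-Large")
model = AutoModelForMaskedLM.from_pretrained("WENGSYX/Deberta-Chinese-Large")

# Hyperparameters mirror the values listed above; the effective batch size of 512
# is reached via gradient accumulation (assumed split across 2 GPUs).
args = TrainingArguments(
    output_dir="deberta-chinese-large-ckpt",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                            mlm_probability=0.15)

# train_dataset is assumed to be a tokenized corpus prepared separately.
# trainer = Trainer(model=model, args=args, data_collator=collator,
#                   train_dataset=train_dataset)
# trainer.train()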

Guide: Running Locally

To run the model locally, use the Hugging Face transformers library:

from transformers import BertTokenizer, AutoModel

# Load the Chinese vocabulary with BertTokenizer and the model weights with AutoModel
tokenizer = BertTokenizer.from_pretrained("WENGSYX/Deberta-Chinese-Large")
model = AutoModel.from_pretrained("WENGSYX/Deberta-Chinese-Large")

Note: Use BertTokenizer to load the Chinese vocabulary.
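
As a quick check that the model loads correctly, the following continues the snippet above by encoding a short Chinese sentence and inspecting the hidden states; the example sentence is arbitrary.

import torch

inputs = tokenizer("北京是中国的首都。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)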

Suggested Cloud GPUs

For efficient training and deployment, consider cloud instances with NVIDIA GPUs, such as AWS EC2 GPU instances, Google Cloud GPU instances, or Azure N-series instances.

License

The model is released as open source. For specific licensing details, refer to the model's repository on Hugging Face.
