IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese

Introduction
The Randeng-DELLA-CVAE-226M-NER-Chinese model is a deep Conditional Variational Autoencoder (CVAE) developed for Named Entity Recognition (NER) in Chinese. It leverages a GPT-2 architecture for both encoding and decoding, allowing it to generate sentences that include specified named entities and their types. The model is pretrained on the Wudao dataset and finetuned on NER tasks.
Architecture
The model employs a variational transformer framework with layer-wise latent variable inference for text generation. It follows the layer-wise recurrent latent variable structure but, unlike previous approaches, integrates the latent vectors into the decoder hidden states through a simple linear transformation, which stabilizes pretraining on large open-domain corpora.
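The snippet below is a minimal PyTorch sketch of that fusion step, not the actual Fengshenbang-LM implementation: it assumes one latent vector per decoder layer and combines the linearly projected latent with the hidden states additively (the class and parameter names are illustrative; see fengshen/models/deepVAE for the real code).

```python
import torch
import torch.nn as nn

class LayerwiseLatentFusion(nn.Module):
    """Illustrative sketch: fuse a per-layer latent vector into decoder
    hidden states via a simple linear transformation."""

    def __init__(self, hidden_size: int, latent_size: int):
        super().__init__()
        # Plain linear map from the latent space into the decoder width
        self.latent_proj = nn.Linear(latent_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); z: (batch, latent_size)
        # Project z and broadcast it across every token position.
        return hidden_states + self.latent_proj(z).unsqueeze(1)

# Usage: one fusion module (and one latent vector) per decoder layer
fusion = LayerwiseLatentFusion(hidden_size=768, latent_size=32)
h = torch.randn(2, 10, 768)   # dummy decoder hidden states
z = torch.randn(2, 32)        # dummy layer-wise latent vector
print(fusion(h, z).shape)     # torch.Size([2, 10, 768])
```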
Training
The model was first pretrained on the large-scale Wudao corpus and then finetuned on an NER-specific dataset. This two-stage training enables it to generate contextually relevant sentences containing the specified named entities.
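Although the exact layer-wise objective and KL weighting are defined in the Fengshenbang-LM code, CVAEs of this kind are trained by maximizing the standard conditional evidence lower bound (ELBO), shown here in its single-latent form for reference:

$$\log p_\theta(x \mid c) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\right)$$

where $x$ is the target sentence and $c$ is the conditioning information (the specified entities and their types).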
Guide: Running Locally
To run the model locally:
- Clone the Fengshenbang-LM repository.
- Install the required libraries, primarily PyTorch and the Transformers library.
- Use the following script to load and run the model:
```python
import torch
from torch.nn.utils.rnn import pad_sequence
from fengshen.models.deepVAE.deep_vae import Della
from transformers.models.bert.tokenization_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese")
vae_model = Della.from_pretrained("IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese")

# Register the special tokens used for entity conditioning
special_tokens_dict = {
    'bos_token': '<BOS>',
    'eos_token': '<EOS>',
    'additional_special_tokens': ['<ENT>', '<ENS>'],
}
tokenizer.add_special_tokens(special_tokens_dict)

device = 0  # CUDA device index; use "cpu" to run without a GPU
model = vae_model.model.to(device)

# Continue with the provided script for entity processing and text generation...
```
- For optimal performance, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
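Continuing from the loading script above, the sketch below shows one plausible way the `<ENT>`/`<ENS>` tokens could mark (entity, type) pairs and how `pad_sequence` batches the resulting ids. It is purely illustrative: the actual prompt format is defined in the repository's full example script, so treat the pair layout and padding choice here as assumptions.

```python
# Hypothetical entity-conditioning illustration (the real format is in the
# repository's example script; this pair layout is an assumption).
entities = [("李白", "人名"), ("长安", "地名")]  # (entity, type) pairs

prompt = "<BOS>" + "".join(f"<ENT>{ent}<ENS>{typ}" for ent, typ in entities)
ids = tokenizer.encode(prompt, add_special_tokens=False)
input_ids = torch.tensor(ids, dtype=torch.long)

# pad_sequence (imported above) batches variable-length prompts
batch = pad_sequence([input_ids], batch_first=True,
                     padding_value=tokenizer.pad_token_id)
print(batch.shape)
```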
License
The model and its associated resources require proper citation when used in academic or commercial work. For citation details, refer to the following papers and resources:
- Fengshenbang 1.0: arXiv:2209.02970
- Fengshenbang-LM GitHub: https://github.com/IDEA-CCNL/Fengshenbang-LM