IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese

Introduction
The Randeng-DELLA-CVAE-226M-NER-Chinese model is a deep Conditional Variational Autoencoder (CVAE) developed for Named Entity Recognition (NER) in Chinese. It leverages a GPT-2 architecture for both encoding and decoding, allowing it to generate sentences that include specified named entities and their types. The model is pretrained on the Wudao dataset and finetuned on NER tasks.
Architecture
The model employs a variational transformer framework with layer-wise latent variable inference for text generation. It follows the layer-wise recurrent latent variable structure but, unlike previous approaches, integrates the latent vectors into the decoder hidden states through a simple linear transformation, which stabilizes pretraining on large open-domain corpora.
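The snippet below is a minimal PyTorch sketch of that fusion step, not the actual Fengshenbang-LM implementation: it assumes one latent vector per decoder layer and combines the linearly projected latent with the hidden states additively (the class and parameter names are illustrative; see fengshen/models/deepVAE for the real code).

```python
import torch
import torch.nn as nn

class LayerwiseLatentFusion(nn.Module):
    """Illustrative sketch: fuse a per-layer latent vector into decoder
    hidden states via a simple linear transformation."""

    def __init__(self, hidden_size: int, latent_size: int):
        super().__init__()
        # Plain linear map from the latent space into the decoder width
        self.latent_proj = nn.Linear(latent_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); z: (batch, latent_size)
        # Project z and broadcast it across every token position.
        return hidden_states + self.latent_proj(z).unsqueeze(1)

# Usage: one fusion module (and one latent vector) per decoder layer
fusion = LayerwiseLatentFusion(hidden_size=768, latent_size=32)
h = torch.randn(2, 10, 768)   # dummy decoder hidden states
z = torch.randn(2, 32)        # dummy layer-wise latent vector
print(fusion(h, z).shape)     # torch.Size([2, 10, 768])
```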
Training
The model was first pretrained on the large-scale Wudao corpus and then finetuned on an NER-specific dataset. This two-stage training enables it to generate contextually relevant sentences containing the specified named entities.
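Although the exact layer-wise objective and KL weighting are defined in the Fengshenbang-LM code, CVAEs of this kind are trained by maximizing the standard conditional evidence lower bound (ELBO), shown here in its single-latent form for reference:

$$\log p_\theta(x \mid c) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\right)$$

where $x$ is the target sentence and $c$ is the conditioning information (the specified entities and their types).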
Guide: Running Locally
To run the model locally:
- Clone the Fengshenbang-LM repository.
- Install the required libraries, primarily PyTorch and the Transformers library.
- Use the following script to load and run the model:
```python
import torch
from torch.nn.utils.rnn import pad_sequence
from fengshen.models.deepVAE.deep_vae import Della
from transformers.models.bert.tokenization_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese")
vae_model = Della.from_pretrained("IDEA-CCNL/Randeng-DELLA-CVAE-226M-NER-Chinese")

# Register the special tokens used for entity conditioning
special_tokens_dict = {
    'bos_token': '<BOS>',
    'eos_token': '<EOS>',
    'additional_special_tokens': ['<ENT>', '<ENS>'],
}
tokenizer.add_special_tokens(special_tokens_dict)

device = 0  # CUDA device index; use "cpu" to run without a GPU
model = vae_model.model.to(device)

# Continue with the provided script for entity processing and text generation...
```
- For optimal performance, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
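Continuing from the loading script above, the sketch below shows one plausible way the `<ENT>`/`<ENS>` tokens could mark (entity, type) pairs and how `pad_sequence` batches the resulting ids. It is purely illustrative: the actual prompt format is defined in the repository's full example script, so treat the pair layout and padding choice here as assumptions.

```python
# Hypothetical entity-conditioning illustration (the real format is in the
# repository's example script; this pair layout is an assumption).
entities = [("李白", "人名"), ("长安", "地名")]  # (entity, type) pairs

prompt = "<BOS>" + "".join(f"<ENT>{ent}<ENS>{typ}" for ent, typ in entities)
ids = tokenizer.encode(prompt, add_special_tokens=False)
input_ids = torch.tensor(ids, dtype=torch.long)

# pad_sequence (imported above) batches variable-length prompts
batch = pad_sequence([input_ids], batch_first=True,
                     padding_value=tokenizer.pad_token_id)
print(batch.shape)
```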
License
The model and its associated resources require proper citation when used in academic or commercial work. For citation details, refer to the following papers and resources:
- Fengshenbang 1.0: arXiv:2209.02970
- Fengshenbang-LM GitHub: https://github.com/IDEA-CCNL/Fengshenbang-LM