roberta-base-chinese-extractive-qa
Introduction
The roberta-base-chinese-extractive-qa model is designed for extractive question answering in Chinese. It is fine-tuned with the UER-py framework and can also be fine-tuned with TencentPretrain, which supports models with over one billion parameters and extends the framework to multimodal pre-training.
Architecture
This model is a Chinese variant of the RoBERTa-base architecture (12 transformer layers, hidden size 768), adapted for extractive question answering in Chinese: given a question and a passage, it predicts the answer span within the passage.
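As a quick sanity check of the backbone dimensions, the published configuration can be inspected directly; this is an illustrative sketch, not part of the original card:

from transformers import AutoConfig

# Downloads only config.json from the Hub; no model weights are fetched.
config = AutoConfig.from_pretrained('uer/roberta-base-chinese-extractive-qa')

# A base-sized model is expected to report 12 layers and a hidden size of 768.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)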
Training
The model is fine-tuned on the training sets of three datasets: CMRC2018, WebQA, and Laisi. Fine-tuning runs for three epochs with a sequence length of 512, starting from the pre-trained chinese_roberta_L-12_H-768 checkpoint. Training is performed with UER-py on Tencent Cloud, and at the end of each epoch the model is saved whenever it achieves its best performance so far on the development set.
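The card does not reproduce the exact training command here, but a UER-py fine-tuning run of this kind would look roughly like the sketch below. The script name and file paths are assumptions based on UER-py's fine-tuning conventions; only the epoch count and sequence length follow the values stated above.

python3 finetune/run_cmrc.py \
    --pretrained_model_path models/chinese_roberta_L-12_H-768_model.bin \
    --vocab_path models/google_zh_vocab.txt \
    --train_path datasets/extractive_qa/train.json \
    --dev_path datasets/cmrc2018/dev.json \
    --output_model_path models/extractive_qa_model.bin \
    --epochs_num 3 --seq_length 512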
Guide: Running Locally
To use the model locally, follow these steps:
- Install Hugging Face Transformers:
  pip install transformers
- Load the model and tokenizer:
  from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
  model = AutoModelForQuestionAnswering.from_pretrained('uer/roberta-base-chinese-extractive-qa')
  tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-chinese-extractive-qa')
- Create a question-answering pipeline:
  QA = pipeline('question-answering', model=model, tokenizer=tokenizer)
- Run a sample query (a complete end-to-end script follows this list):
  QA_input = {
      'question': "著名诗歌《假如生活欺骗了你》的作者是",
      'context': "普希金从那里学习人民的语言,吸取了许多有益的养料,这一切对普希金后来的创作产生了很大的影响。..."
  }
  QA(QA_input)
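Putting the steps together, a complete script looks like the following; the answer span and score shown in the trailing comment are illustrative, not reproduced from a real run:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = 'uer/roberta-base-chinese-extractive-qa'
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the extractive QA pipeline on top of the loaded model and tokenizer.
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)

QA_input = {
    'question': "著名诗歌《假如生活欺骗了你》的作者是",
    'context': "普希金从那里学习人民的语言,吸取了许多有益的养料,这一切对普希金后来的创作产生了很大的影响。...",
}

result = QA(QA_input)
# result is a dict with 'score', 'start', 'end', and 'answer' keys, e.g.:
# {'score': 0.98, 'start': 0, 'end': 3, 'answer': '普希金'}
print(result)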
For enhanced performance, especially with large models, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
License
Refer to the Hugging Face Model Hub or the UER-py repository for specific licensing information related to the use and distribution of the model and related codebases.