roberta-base-chinese-extractive-qa
Introduction
The roberta-base-chinese-extractive-qa model is designed for extractive question answering in Chinese. It is fine-tuned with the UER-py framework and can also be fine-tuned with TencentPretrain, which supports models with over one billion parameters and extends the framework to multimodal pre-training.
Architecture
This model is a Chinese variant of the RoBERTa-base architecture (12 transformer layers, hidden size 768), adapted for extractive question answering in Chinese: given a question and a passage, it predicts the answer span within the passage.
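As a quick sanity check of the backbone dimensions, the published configuration can be inspected directly; this is an illustrative sketch, not part of the original card:

from transformers import AutoConfig

# Downloads only config.json from the Hub; no model weights are fetched.
config = AutoConfig.from_pretrained('uer/roberta-base-chinese-extractive-qa')

# A base-sized model is expected to report 12 layers and a hidden size of 768.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)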
Training
The model is fine-tuned on the training sets of three datasets: CMRC2018, WebQA, and Laisi. Fine-tuning runs for three epochs with a sequence length of 512, starting from the pre-trained chinese_roberta_L-12_H-768 checkpoint. Training is performed with UER-py on Tencent Cloud, and at the end of each epoch the model is saved whenever it achieves its best performance so far on the development set.
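The card does not reproduce the exact training command here, but a UER-py fine-tuning run of this kind would look roughly like the sketch below. The script name and file paths are assumptions based on UER-py's fine-tuning conventions; only the epoch count and sequence length follow the values stated above.

python3 finetune/run_cmrc.py \
    --pretrained_model_path models/chinese_roberta_L-12_H-768_model.bin \
    --vocab_path models/google_zh_vocab.txt \
    --train_path datasets/extractive_qa/train.json \
    --dev_path datasets/cmrc2018/dev.json \
    --output_model_path models/extractive_qa_model.bin \
    --epochs_num 3 --seq_length 512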
Guide: Running Locally
To use the model locally, follow these steps:
- Install Hugging Face Transformers:
  pip install transformers
- Load the model and tokenizer:
  from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
  model = AutoModelForQuestionAnswering.from_pretrained('uer/roberta-base-chinese-extractive-qa')
  tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-chinese-extractive-qa')
- Create a question-answering pipeline:
  QA = pipeline('question-answering', model=model, tokenizer=tokenizer)
- Run a sample query (a complete end-to-end script follows this list):
  QA_input = {
      'question': "著名诗歌《假如生活欺骗了你》的作者是",
      'context': "普希金从那里学习人民的语言,吸取了许多有益的养料,这一切对普希金后来的创作产生了很大的影响。..."
  }
  QA(QA_input)
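Putting the steps together, a complete script looks like the following; the answer span and score shown in the trailing comment are illustrative, not reproduced from a real run:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = 'uer/roberta-base-chinese-extractive-qa'
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the extractive QA pipeline on top of the loaded model and tokenizer.
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)

QA_input = {
    'question': "著名诗歌《假如生活欺骗了你》的作者是",
    'context': "普希金从那里学习人民的语言,吸取了许多有益的养料,这一切对普希金后来的创作产生了很大的影响。...",
}

result = QA(QA_input)
# result is a dict with 'score', 'start', 'end', and 'answer' keys, e.g.:
# {'score': 0.98, 'start': 0, 'end': 3, 'answer': '普希金'}
print(result)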
For enhanced performance, especially with large models, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
License
Refer to the Hugging Face Model Hub or the UER-py repository for specific licensing information related to the use and distribution of the model and related codebases.