GERMAN-QG-T5-QUAD
Introduction
GERMAN-QG-T5-QUAD is a model designed for generating questions in German. It is a fine-tuned version of the valhalla/t5-base-qg-hl
model, tailored specifically for the GermanQUAD dataset. The model requires the expected answer to be highlighted using a <hl>
token.
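To illustrate the highlighting requirement, here is a minimal sketch of how an input could be prepared. The helper name and the convention of wrapping the answer span on both sides are assumptions borrowed from the upstream valhalla/t5-base-qg-hl format, not part of this model's codebase (the example later in this card shows a single trailing <hl>):

```python
def highlight_answer(context: str, answer: str) -> str:
    """Wrap the answer span in <hl> markers and prepend the task prefix.

    Illustrative helper (hypothetical, not from the model's repository);
    assumes the answer is wrapped on both sides, as in the
    valhalla/t5-base-qg-hl input format.
    """
    if answer not in context:
        raise ValueError("answer must occur verbatim in the context")
    # Highlight only the first occurrence of the answer span.
    highlighted = context.replace(answer, f"<hl> {answer} <hl>", 1)
    return f"generate question: {highlighted}"

text = highlight_answer("Die Hauptstadt von Deutschland ist Berlin.", "Berlin")
print(text)
# -> generate question: Die Hauptstadt von Deutschland ist <hl> Berlin <hl>.
```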
Architecture
The model is based on the T5 architecture and is optimized for question generation tasks. It utilizes PyTorch as the underlying framework and was fine-tuned using the GermanQUAD dataset.
Training
The model was trained with the following hyperparameters:
- Learning Rate: 0.0001
- Train Batch Size: 2
- Eval Batch Size: 2
- Seed: 100
- Gradient Accumulation Steps: 8
- Total Train Batch Size: 16
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Learning Rate Scheduler Type: Linear
- Number of Epochs: 10
The model achieved a BLEU-4 score of 11.30 on the GermanQUAD test set (n=2204). The training script is available on GitHub.
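The hyperparameters above can be collected in one place to show how the total train batch size of 16 follows from the per-device batch size and gradient accumulation. This is a hedged sketch using Hugging Face Trainer-style parameter names (an assumption; the author's actual training script may differ):

```python
# Sketch of the reported hyperparameters, using Trainer-style names
# (illustrative only; not taken from the original training script).
hparams = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 100,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}

# The "total train batch size" of 16 is the per-device batch size times
# the gradient accumulation steps (times the device count, here 1).
effective_batch = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
)
print(effective_batch)  # -> 16
```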
Guide: Running Locally
To run the GERMAN-QG-T5-QUAD model locally, follow these steps:
- Install Dependencies: Ensure you have the necessary libraries installed.

  pip install transformers==4.13.0.dev0 torch==1.10.0+cu102 datasets==1.16.1 tokenizers==0.10.3
- Load the Model: Use the Transformers library to load the model.

  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("dehio/german-qg-t5-quad")
  model = T5ForConditionalGeneration.from_pretrained("dehio/german-qg-t5-quad")
- Run Inference: Prepare your input and generate questions.

  input_text = "generate question: Obwohl die Vereinigten Staaten wie auch viele Staaten des Commonwealth Erben des britischen Common Laws sind, setzt sich das amerikanische Recht bedeutend davon ab. <hl>"
  input_ids = tokenizer(input_text, return_tensors="pt").input_ids
  output = model.generate(input_ids)
  print(tokenizer.decode(output[0], skip_special_tokens=True))
For optimal performance, consider using a cloud GPU service like AWS, Google Cloud, or Azure.
License
The GERMAN-QG-T5-QUAD model is licensed under the MIT License.