GERMAN-QG-T5-DRINK600 Model
Introduction
The GERMAN-QG-T5-DRINK600 model is fine-tuned for generating questions in German. It requires the expected answer to be highlighted with <hl> tokens. The model builds on german-qg-t5-quad and is further trained on questions about drink recipes from Mixology.
Architecture
The model is based on german-qg-t5-quad, which was initially pre-trained on the GermanQUAD dataset. The additional training data consists of drink-related questions; this dataset is not open-source due to copyright restrictions.
Training
Training details are as follows:
- Training Script: Available here.
- Evaluation: Achieves a BLEU-4 score of 29.80 on the drink600 test set and 11.30 on the GermanQUAD test set.
- Comparison: The base model, german-qg-t5-quad, scores 10.76 on the drink600 test set, indicating effective fine-tuning.
- Hyperparameters (see the sketch after this list):
  - learning_rate: 0.0001
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 100
  - gradient_accumulation_steps: 8
  - total_train_batch_size: 16
  - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 10
- Framework Versions:
- Transformers 4.13.0.dev0
- PyTorch 1.10.0+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3
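For reference, here is a minimal sketch of how the hyperparameters listed above could be expressed with the Transformers Seq2SeqTrainingArguments API. This is an illustration only, not the original training script; the output directory name is a placeholder.

```python
# Illustrative mapping of the listed hyperparameters onto Seq2SeqTrainingArguments
# (argument names as in Transformers 4.13); not the original training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="german-qg-t5-drink600",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=100,
    gradient_accumulation_steps=8,       # 2 x 8 = effective batch size 16
    num_train_epochs=10,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer in Transformers.
)
```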
Guide: Running Locally
To run the model locally, follow these steps:
- Clone the Repository: Obtain the codebase from the repository.
- Install Dependencies: Ensure the correct versions of PyTorch, Transformers, Datasets, and Tokenizers are installed.
- Download the Model: Acquire the pre-trained model weights from Hugging Face.
- Inference: Run the model on input text, making sure the expected answer is highlighted with <hl> tokens (see the sketch after this list).
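As a rough illustration of the inference step, the sketch below loads the checkpoint from the Hugging Face Hub and generates a question. The model identifier "dehio/german-qg-t5-drink600" and the example sentence are assumptions, and the exact input format (for instance, any task prefix) may differ from the original training script.

```python
# Hedged inference sketch: assumes the checkpoint is published as
# "dehio/german-qg-t5-drink600"; the example sentence is invented.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "dehio/german-qg-t5-drink600"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Highlight the expected answer with <hl> tokens inside the context.
text = "Der Gin Basil Smash wird mit <hl> frischem Basilikum <hl> zubereitet."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```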
For enhanced performance, consider using cloud GPUs such as those provided by AWS, Azure, or Google Cloud.
License
The model is licensed under the MIT License, allowing for free use, modification, and distribution.