GERMAN-QG-T5-DRINK600 Model
Introduction
The GERMAN-QG-T5-DRINK600 model is fine-tuned for generating questions in German. It requires the expected answer to be highlighted with <hl> tokens. The model builds on german-qg-t5-quad and is further trained on questions about drink recipes from Mixology.
Architecture
The model is based on german-qg-t5-quad, which was initially pre-trained on the GermanQUAD dataset. The additional training data consists of drink-related questions; this dataset is not open-source due to copyright restrictions.
Training
Training details are as follows:
- Training Script: Available here.
- Evaluation: Achieves a BLEU-4 score of 29.80 on the drink600 test set and 11.30 on the GermanQUAD test set.
- Comparison: The base model, german-qg-t5-quad, scores 10.76 on the drink600 test set, indicating effective fine-tuning.
- Hyperparameters (see the sketch after this list):
  - learning_rate: 0.0001
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 100
  - gradient_accumulation_steps: 8
  - total_train_batch_size: 16
  - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 10
- Framework Versions:
- Transformers 4.13.0.dev0
- PyTorch 1.10.0+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3
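For reference, here is a minimal sketch of how the hyperparameters listed above could be expressed with the Transformers Seq2SeqTrainingArguments API. This is an illustration only, not the original training script; the output directory name is a placeholder.

```python
# Illustrative mapping of the listed hyperparameters onto Seq2SeqTrainingArguments
# (argument names as in Transformers 4.13); not the original training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="german-qg-t5-drink600",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=100,
    gradient_accumulation_steps=8,       # 2 x 8 = effective batch size 16
    num_train_epochs=10,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer in Transformers.
)
```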
Guide: Running Locally
To run the model locally, follow these steps:
- Clone the Repository: Obtain the codebase from the repository.
- Install Dependencies: Ensure the correct versions of PyTorch, Transformers, Datasets, and Tokenizers are installed.
- Download the Model: Acquire the pre-trained model weights from Hugging Face.
- Inference: Run the model on input text, making sure the expected answer is highlighted with <hl> tokens (see the sketch after this list).
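As a rough illustration of the inference step, the sketch below loads the checkpoint from the Hugging Face Hub and generates a question. The model identifier "dehio/german-qg-t5-drink600" and the example sentence are assumptions, and the exact input format (for instance, any task prefix) may differ from the original training script.

```python
# Hedged inference sketch: assumes the checkpoint is published as
# "dehio/german-qg-t5-drink600"; the example sentence is invented.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "dehio/german-qg-t5-drink600"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Highlight the expected answer with <hl> tokens inside the context.
text = "Der Gin Basil Smash wird mit <hl> frischem Basilikum <hl> zubereitet."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```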
For enhanced performance, consider using cloud GPUs such as those provided by AWS, Azure, or Google Cloud.
License
The model is licensed under the MIT License, allowing for free use, modification, and distribution.