t5-base-finetuned-common_gen (mrm8488)
Introduction
The T5-BASE-FINETUNED-COMMON_GEN model is a fine-tuned version of Google's T5, adapted for generative commonsense reasoning using the CommonGen dataset. Given a set of everyday concepts, the model must generate a coherent sentence that plausibly combines them, which tests its capacity for generative commonsense reasoning.
Architecture
This model is based on T5, a unified text-to-text transformer. T5 was introduced to explore the limits of transfer learning by casting every language problem into a text-to-text format, an approach that lets a single model perform well across a variety of NLP tasks, such as summarization, question answering, and text classification.
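As a quick illustration of this text-to-text framing, the sketch below uses the generic t5-base checkpoint (not this fine-tuned model); the task is selected purely by a textual prefix, following the conventions from the original T5 paper:

```python
# Minimal sketch of T5's text-to-text framing with the generic "t5-base"
# checkpoint. Each task is chosen only by the textual prefix on the input.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

for prompt in [
    "summarize: T5 casts every NLP problem as mapping input text to output text.",
    "translate English to German: The house is wonderful.",
]:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_length=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```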
Training
The model was fine-tuned on the CommonGen dataset, which consists of 30k concept-sets and 50k sentences and challenges models to perform relational reasoning and compositional generalization. The training script is a modified version of one by Suraj Patil, and the model improves on previous implementations, reaching a ROUGE-2 score of 17.10 and a ROUGE-L score of 39.47.
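The evaluation script itself is not reproduced here, but ROUGE scores of this kind can be computed with the Hugging Face evaluate library, as in the hedged sketch below; the prediction and reference sentences are hypothetical placeholders, not actual model outputs:

```python
# Illustrative ROUGE computation (requires: pip install evaluate rouge_score).
# The sentences below are hypothetical placeholders, not real model outputs.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["A man digs a hole in the ground to plant a tree."]
references = ["The man dug a hole and planted a tree in the ground."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge2"], scores["rougeL"])
```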
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed, then install the transformers library using pip:

  ```bash
  pip install transformers
  ```
- Load the Model: Use the following Python code to load the model and generate a sentence from a space-separated string of concepts:

  ```python
  from transformers import AutoModelWithLMHead, AutoTokenizer

  # Note: AutoModelWithLMHead is deprecated in recent transformers releases;
  # AutoModelForSeq2SeqLM is the drop-in replacement for T5 models.
  tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-common_gen")
  model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-common_gen")

  def gen_sentence(words, max_length=32):
      # The concepts are passed as a plain space-separated string.
      input_text = words
      features = tokenizer([input_text], return_tensors="pt")
      output = model.generate(
          input_ids=features["input_ids"],
          attention_mask=features["attention_mask"],
          max_length=max_length,
      )
      return tokenizer.decode(output[0], skip_special_tokens=True)

  words = "tree plant ground hole dig"
  print(gen_sentence(words))
  ```
- Cloud GPUs: For faster inference, consider cloud services such as AWS, Google Cloud, or Azure to access GPU resources; see the sketch after this list for running generation on a GPU.
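As a minimal sketch of the GPU setup mentioned above, the snippet below moves the model and inputs onto a CUDA device and decodes with beam search; the generation settings are illustrative choices, not an official decoding configuration for this model:

```python
# Hedged sketch: GPU inference with beam search; falls back to CPU if no GPU.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-common_gen")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-common_gen").to(device)

features = tokenizer("tree plant ground hole dig", return_tensors="pt").to(device)
output = model.generate(
    **features,
    max_length=32,
    num_beams=4,          # beam search often helps constrained generation; illustrative value
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```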
License
The model and associated files are shared under the Apache 2.0 license, which allows for both personal and commercial use, distribution, and modification, provided that proper attribution is given to the original authors.