t5-base-finetuned-common_gen (mrm8488)
Introduction
The T5-BASE-FINETUNED-COMMON_GEN model is a fine-tuned version of Google's T5, adapted for generative commonsense reasoning using the CommonGen dataset. Given a set of everyday concepts, the model must generate a coherent sentence that plausibly combines them, which tests its capacity for generative commonsense reasoning.
Architecture
This model is based on T5, a unified text-to-text transformer. T5 was introduced to explore the limits of transfer learning by casting every language problem into a text-to-text format, an approach that lets a single model perform well across a variety of NLP tasks, such as summarization, question answering, and text classification.
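As a quick illustration of this text-to-text framing, the sketch below uses the generic t5-base checkpoint (not this fine-tuned model); the task is selected purely by a textual prefix, following the conventions from the original T5 paper:

```python
# Minimal sketch of T5's text-to-text framing with the generic "t5-base"
# checkpoint. Each task is chosen only by the textual prefix on the input.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

for prompt in [
    "summarize: T5 casts every NLP problem as mapping input text to output text.",
    "translate English to German: The house is wonderful.",
]:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_length=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```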
Training
The model was fine-tuned on the CommonGen dataset, which consists of 30k concept-sets and 50k sentences and challenges models to perform relational reasoning and compositional generalization. The training script is a modified version of one by Suraj Patil, and the model improves on previous implementations, reaching a ROUGE-2 score of 17.10 and a ROUGE-L score of 39.47.
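The evaluation script itself is not reproduced here, but ROUGE scores of this kind can be computed with the Hugging Face evaluate library, as in the hedged sketch below; the prediction and reference sentences are hypothetical placeholders, not actual model outputs:

```python
# Illustrative ROUGE computation (requires: pip install evaluate rouge_score).
# The sentences below are hypothetical placeholders, not real model outputs.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["A man digs a hole in the ground to plant a tree."]
references = ["The man dug a hole and planted a tree in the ground."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge2"], scores["rougeL"])
```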
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed, then install the transformers library using pip:

  ```bash
  pip install transformers
  ```
- Load the Model: Use the following Python code to load the model and generate a sentence from a space-separated string of concepts:

  ```python
  from transformers import AutoModelWithLMHead, AutoTokenizer

  # Note: AutoModelWithLMHead is deprecated in recent transformers releases;
  # AutoModelForSeq2SeqLM is the drop-in replacement for T5 models.
  tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-common_gen")
  model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-common_gen")

  def gen_sentence(words, max_length=32):
      # The concepts are passed as a plain space-separated string.
      input_text = words
      features = tokenizer([input_text], return_tensors="pt")
      output = model.generate(
          input_ids=features["input_ids"],
          attention_mask=features["attention_mask"],
          max_length=max_length,
      )
      return tokenizer.decode(output[0], skip_special_tokens=True)

  words = "tree plant ground hole dig"
  print(gen_sentence(words))
  ```
- Cloud GPUs: For faster inference, consider cloud services such as AWS, Google Cloud, or Azure to access GPU resources; see the sketch after this list for running generation on a GPU.
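As a minimal sketch of the GPU setup mentioned above, the snippet below moves the model and inputs onto a CUDA device and decodes with beam search; the generation settings are illustrative choices, not an official decoding configuration for this model:

```python
# Hedged sketch: GPU inference with beam search; falls back to CPU if no GPU.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-common_gen")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-common_gen").to(device)

features = tokenizer("tree plant ground hole dig", return_tensors="pt").to(device)
output = model.generate(
    **features,
    max_length=32,
    num_beams=4,          # beam search often helps constrained generation; illustrative value
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```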
License
The model and associated files are shared under the Apache 2.0 license, which allows for both personal and commercial use, distribution, and modification, provided that proper attribution is given to the original authors.