t5-base-finetuned-wikiSQL

mrm8488

Introduction

This project fine-tunes Google's T5 (base) model on the WikiSQL dataset to translate English questions into SQL queries. T5 is known for strong transfer learning performance and has been applied to a wide variety of natural language processing tasks.

Architecture

The T5 model, introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer," casts every NLP task as text-to-text: the input is a string, usually starting with a task prefix, and the output is another string. It is pre-trained on the large C4 text corpus before being fine-tuned on specific datasets, here WikiSQL for English-to-SQL translation.
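The text-to-text framing can be sketched as a small data-formatting step: every supervised example becomes a pair of strings, and only the task prefix tells the model which task it is solving. The sketch below assumes the prefix `translate English to SQL: ` (the one used in the inference code of this card); the other two prefixes come from the T5 paper, and the helper `to_text_pair` and the sample SQL target are hypothetical illustrations.

```python
from typing import Dict, Tuple

# Illustrative task prefixes: the first is the one this model card uses,
# the other two are prefixes the T5 paper uses for other tasks.
TASK_PREFIXES: Dict[str, str] = {
    "sql": "translate English to SQL: ",
    "summarize": "summarize: ",
    "en-de": "translate English to German: ",
}


def to_text_pair(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one supervised example as a (model input, model output) pair of strings."""
    # The task is selected only by the plain-text prefix; model and loss stay the same.
    return TASK_PREFIXES[task] + source.strip(), target.strip()


if __name__ == "__main__":
    # Hypothetical WikiSQL-style pair; the target imitates WikiSQL's human-readable SQL.
    src, tgt = to_text_pair(
        "sql",
        "How many models were finetuned using BERT as base model?",
        "SELECT COUNT Model FROM table WHERE Base model = BERT",
    )
    print(src)
    print(tgt)
```

Because the task lives in the input string, the same encoder-decoder and the same cross-entropy objective serve SQL generation, summarization, and translation alike.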

Training

The model is fine-tuned with a script adapted from a Colab notebook by Suraj Patil. The WikiSQL dataset provides 56,355 training samples and 14,436 validation samples, used respectively to train the model and to validate its SQL translation.
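The card does not reproduce the training script, so the following is not Suraj Patil's notebook. It is a minimal sketch, assuming PyTorch and the Hugging Face `transformers` library, of the generic teacher-forced update that seq2seq fine-tuning of T5 performs: the question ids go to the encoder, the SQL ids are passed as `labels`, and padded target positions are masked with `-100`. The helper name `training_step`, the tiny model sizes, the token ids, and the learning rate are all hypothetical; a tiny randomly initialised model is used so the sketch runs without downloading `t5-base`.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration


def training_step(model, optimizer, input_ids, attention_mask, labels, pad_token_id=0):
    """One teacher-forced update on a batch of (question, SQL) token ids."""
    model.train()
    labels = labels.clone()
    # Padded target positions become -100 so the cross-entropy loss skips them
    labels[labels == pad_token_id] = -100
    # Given `labels`, T5 builds decoder inputs by shifting the targets right
    # and returns the token-level cross-entropy as `loss`
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Tiny randomly initialised T5 (hypothetical sizes) so the sketch runs offline;
    # real fine-tuning would load the t5-base checkpoint instead.
    config = T5Config(vocab_size=64, d_model=32, d_kv=8, d_ff=64, num_layers=2,
                      num_heads=4, dropout_rate=0.0, pad_token_id=0,
                      eos_token_id=1, decoder_start_token_id=0)
    model = T5ForConditionalGeneration(config)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # hypothetical learning rate
    input_ids = torch.tensor([[5, 6, 7, 8, 1]])   # made-up "question" ids ending in </s> (id 1)
    attention_mask = torch.ones_like(input_ids)
    labels = torch.tensor([[10, 11, 12, 1, 0]])   # made-up "SQL" ids, </s>, then padding
    for step in range(5):
        print(step, training_step(model, optimizer, input_ids, attention_mask, labels))
```

In a real run the batches would come from the 56,355 tokenized training pairs, and the validation split would be scored with the same loss (or with generated-SQL accuracy) between epochs.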

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the transformers library, together with PyTorch and SentencePiece (used by the T5 tokenizer):

    pip install transformers torch sentencepiece
    
  2. Load the model and tokenizer:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    # AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM loads the same T5 checkpoint
    tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
    model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
    
  3. Define a function to generate SQL from English queries:

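    def get_sql(query):
        # Task prefix + question; the tokenizer appends the </s> end token itself
        input_text = "translate English to SQL: %s" % query
        features = tokenizer([input_text], return_tensors="pt")
        output = model.generate(input_ids=features["input_ids"],
                                attention_mask=features["attention_mask"],
                                max_length=64)  # the default length limit can cut off longer SQL
        # skip_special_tokens drops <pad> and </s> from the decoded string
        return tokenizer.decode(output[0], skip_special_tokens=True)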
    
  4. Example usage:

    query = "How many models were finetuned using BERT as base model?"
    print(get_sql(query))
    

The model runs on a CPU, but a GPU makes generation considerably faster, especially for batches of queries. If you do not have one locally, cloud GPU instances from providers such as AWS EC2, Google Cloud, or Azure are an option.
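Using a GPU mostly means putting the model and the tokenized inputs on the same device before calling `generate`. The sketch below assumes PyTorch and the Hugging Face `transformers` API; the helper names `pick_device` and `generate_on_device`, the `max_length` value, and the tiny randomly initialised demo model are hypothetical, chosen so the code runs without downloading the real checkpoint.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration


def pick_device() -> torch.device:
    # Use the first CUDA GPU when one is visible, otherwise fall back to the CPU
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def generate_on_device(model, features, device=None, max_length=64):
    """Run generation with the model and its inputs on the same device.

    `features` is a dict of tensors such as a Hugging Face tokenizer returns
    with return_tensors="pt" (keys "input_ids" and "attention_mask").
    """
    device = device or pick_device()
    model = model.to(device).eval()
    inputs = {name: tensor.to(device) for name, tensor in features.items()}
    with torch.no_grad():  # inference only, so no gradients are tracked
        output = model.generate(**inputs, max_length=max_length)
    return output.cpu()  # bring token ids back to the CPU for decoding


if __name__ == "__main__":
    # Tiny randomly initialised T5 (hypothetical sizes) stands in for the real checkpoint
    config = T5Config(vocab_size=64, d_model=32, d_kv=8, d_ff=64, num_layers=2,
                      num_heads=4, pad_token_id=0, eos_token_id=1,
                      decoder_start_token_id=0)
    model = T5ForConditionalGeneration(config)
    # Shaped like padded tokenizer output for two inputs of different lengths
    features = {"input_ids": torch.tensor([[5, 6, 7, 1], [8, 9, 1, 0]]),
                "attention_mask": torch.tensor([[1, 1, 1, 1], [1, 1, 1, 0]])}
    print(pick_device(), generate_on_device(model, features, max_length=10).shape)
```

With the real model and tokenizer from the guide, a batch of questions could be tokenized with `tokenizer(list_of_texts, return_tensors="pt", padding=True)`, passed through `generate_on_device`, and decoded with `tokenizer.batch_decode(..., skip_special_tokens=True)`. Batching is where a GPU helps most.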

License

This project is licensed under the Apache-2.0 License.
