t5-base-finetuned-summarize-news (mrm8488)
Introduction
The T5-Base-Finetuned-Summarize-News model is a fine-tuned version of Google's T5 model, specifically adapted for summarizing news articles. The model is based on the T5 architecture, which utilizes a text-to-text framework for various natural language processing tasks. It has been fine-tuned on the "News Summary" dataset to generate concise summaries of news articles.
Architecture
The T5 model, introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer," explores transfer learning techniques in NLP. This model transforms every language problem into a text-to-text format, allowing a unified approach to tasks like summarization, question answering, and text classification. By leveraging the "Colossal Clean Crawled Corpus" for pre-training, the T5 model achieves state-of-the-art results across various benchmarks.
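The unified text-to-text format can be illustrated with a small sketch: every task is expressed as plain text by prepending a task prefix to the input. The helper function and example strings below are illustrative only, not part of the model's API; the prefixes shown follow the conventions of the original T5 paper.

```python
def to_t5_input(task_prefix: str, text: str) -> str:
    """Build a T5-style input string: '<prefix>: <text>'."""
    return f"{task_prefix}: {text}"

# A made-up placeholder article; any task uses the same pattern,
# only the prefix changes.
article = "The city council approved the new transit budget on Tuesday."
print(to_t5_input("summarize", article))
print(to_t5_input("translate English to German", "Good morning"))
```

Because input and output are both plain text, the same encoder-decoder model and training objective serve summarization, translation, and classification alike.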
Training
The T5-Base-Finetuned-Summarize-News model was trained on a dataset compiled from various news sources, including Inshorts, Hindu, Indian Times, and Guardian. The dataset contains 4515 examples with metadata such as author name, headlines, article URLs, short texts, and complete articles. The model was trained for six epochs using a modified version of a training script provided by Abhishek Kumar Mishra.
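A minimal sketch of inspecting a dataset with the metadata fields described above. The column names (`author`, `headlines`, `read_more`, `text`, `ctext`) follow the Kaggle "News Summary" dataset and are an assumption here; the row is a made-up stand-in for a real article.

```python
import csv
import io

# Stand-in CSV with the assumed schema: author name, headline,
# article URL, short text, and complete article.
csv_data = (
    "author,headlines,read_more,text,ctext\n"
    '"A. Writer","Example headline","https://example.com/a",'
    '"Short summary.","Full article body."\n'
)

# Parse rows into dicts keyed by column name
rows = list(csv.DictReader(io.StringIO(csv_data)))
print(list(rows[0].keys()))   # the metadata fields
print(len(rows), "example(s)")
```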
Guide: Running Locally
To run the T5-Base-Finetuned-Summarize-News model locally, follow these steps:
- Install Dependencies: Ensure you have Python and the necessary libraries installed. You can install the `transformers` library using pip:

  ```bash
  pip install transformers
  ```
- Load the Model and Tokenizer: Use the `transformers` library to load the model and tokenizer. Note that `AutoModelWithLMHead`, used in the original model card, is deprecated in recent versions of `transformers`; `AutoModelForSeq2SeqLM` is the current equivalent for this encoder-decoder model:

  ```python
  from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
  model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
  ```
- Summarize Text: Define a function to generate summaries:

  ```python
  def summarize(text, max_length=150):
      # Tokenize the article into input IDs
      input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
      # Generate a summary with beam search, penalizing repetition
      generated_ids = model.generate(
          input_ids=input_ids,
          num_beams=2,
          max_length=max_length,
          repetition_penalty=2.5,
          length_penalty=1.0,
          early_stopping=True,
      )
      # Decode the generated token IDs back into text
      preds = [
          tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
          for g in generated_ids
      ]
      return preds[0]
  ```
- Use a Cloud GPU: For efficient processing, especially with large datasets or longer texts, consider using a cloud GPU service such as Google Colab or AWS.
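One practical caveat for the steps above: T5-base encoders typically accept at most 512 input tokens, so very long articles may be silently truncated. A minimal sketch of splitting an article into word-based chunks that can each be summarized separately; the word-count limit is a rough stand-in for the true token limit, and `chunk_words` is a hypothetical helper, not part of `transformers`:

```python
def chunk_words(text: str, max_words: int = 400) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    A rough workaround for the encoder's input-length limit; real token
    counts depend on the tokenizer, so max_words is kept conservative.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

article = ("word " * 1000).strip()  # stand-in for a long news article
chunks = chunk_words(article)
print(len(chunks))  # 3 chunks of up to 400 words each
```

Each chunk could then be passed to the `summarize` function from the previous step, with the per-chunk summaries concatenated or summarized again.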
License
The model and associated code are distributed under a license that permits use and redistribution with attribution to the original creator, Manuel Romero (mrm8488), and the contributors. Always review the specific licensing terms provided in the repository or documentation.