mt5-base-wikinewssum-portuguese

airKlizz

Introduction

The mt5-base-wikinewssum-portuguese model is a fine-tuned version of Google's mT5-base for summarization in Portuguese. It generates concise summaries from input text, and its summarization quality is reported as ROUGE scores in the Training Results section.

Architecture

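The model is based on the mT5 architecture, a multilingual variant of T5 (Text-to-Text Transfer Transformer). T5 frames every task as text-to-text: an encoder-decoder Transformer reads input text and generates output text, so summarization amounts to mapping an article to its summary. mT5 keeps this design but is pretrained on the multilingual mC4 corpus covering 101 languages, which makes it a natural starting point for non-English tasks such as Portuguese summarization.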

Training

Training Procedure

The model was trained with the following hyperparameters:

  • Learning Rate: 5.6e-05
  • Training Batch Size: 4
  • Evaluation Batch Size: 4
  • Seed: 42
  • Gradient Accumulation Steps: 2
  • Total Train Batch Size: 8 (training batch size 4 × 2 gradient accumulation steps)
  • Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
  • LR Scheduler Type: Linear
  • Number of Epochs: 8
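
The batch-size and scheduler settings above can be illustrated with a short pure-Python sketch. It assumes a single device, no warmup steps (none are listed), and a hypothetical training set of 1,000 examples, since the dataset size is not given here; NUM_EXAMPLES and linear_lr are illustrative names, not part of any library. The step count divides batches per epoch by the accumulation steps, which is, as far as we know, how the Hugging Face Trainer counts optimizer updates.

```python
import math

# Values taken from the hyperparameter list above
BASE_LR = 5.6e-05
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 8

# Hypothetical dataset size: the model card does not state it
NUM_EXAMPLES = 1000

# Assuming one device, each optimizer step sees 4 * 2 = 8 examples
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Batches per epoch, then optimizer updates per epoch (rounded down, at least 1)
batches_per_epoch = math.ceil(NUM_EXAMPLES / TRAIN_BATCH_SIZE)
steps_per_epoch = max(batches_per_epoch // GRAD_ACCUM_STEPS, 1)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, total: int = total_steps, base_lr: float = BASE_LR) -> float:
    """Learning rate at `step` for a linear decay to 0 with no warmup (assumed)."""
    return base_lr * max(0.0, (total - step) / total)


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    print("total optimizer steps:", total_steps)
    for step in (0, total_steps // 2, total_steps):
        print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under these assumptions there are 125 optimizer steps per epoch and 1,000 in total, and the learning rate falls from 5.6e-05 at step 0 to 2.8e-05 at step 500 and reaches 0 at step 1,000.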

Training Results

The final results on the evaluation set, after 8 epochs of training, were:

  • Loss: 2.0428
  • ROUGE-1: 9.4966
  • ROUGE-2: 4.2224
  • ROUGE-L: 7.9845
  • ROUGE-Lsum: 8.8641
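
ROUGE measures word overlap between a generated summary and a reference summary: ROUGE-1 counts shared unigrams, ROUGE-2 shared bigrams, and ROUGE-L is based on the longest common subsequence. The sketch below computes a simplified ROUGE-1 F1 score to show the idea. It assumes lowercased whitespace tokenization, whereas standard implementations also handle punctuation and may apply stemming, so its numbers are not directly comparable to the reported scores; rouge1_f1 is an illustrative name and the Portuguese sentences are invented examples.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 between a generated summary and a reference.

    Assumption: lowercased whitespace tokenization only (no punctuation
    handling or stemming), unlike standard ROUGE implementations.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # A shared word counts at most as many times as it appears in both texts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of generated words that match
    recall = overlap / sum(ref.values())      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "o governo anunciou novas medidas"
    reference = "o governo anunciou hoje novas medidas económicas"
    print(round(rouge1_f1(generated, reference), 4))
```

Here all five generated words appear in the seven-word reference, so precision is 1.0, recall is 5/7, and F1 is about 0.833.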

Guide: Running Locally

  1. Set Up the Environment: The model was trained with the following framework versions; installing the same versions is the safest way to reproduce its behavior:

    • Transformers 4.13.0
    • PyTorch 1.10.1
    • Datasets 1.16.1
    • Tokenizers 0.10.3
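  2. Install Dependencies: Install the pinned libraries with pip. MT5Tokenizer is SentencePiece-based, so the sentencepiece package is needed as well. These older pins may require an older Python interpreter (for example 3.8 or 3.9):

    pip install transformers==4.13.0 torch==1.10.1 datasets==1.16.1 tokenizers==0.10.3 sentencepiece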
    
  3. Load the Model: Use the Hugging Face Transformers library to load the model:

    from transformers import MT5ForConditionalGeneration, MT5Tokenizer
    
    # Download (or load from the local cache) the fine-tuned weights and the matching tokenizer
    model = MT5ForConditionalGeneration.from_pretrained("airKlizz/mt5-base-wikinewssum-portuguese")
    tokenizer = MT5Tokenizer.from_pretrained("airKlizz/mt5-base-wikinewssum-portuguese")
    
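  4. Run Inference: Encode the input text, generate a summary, and decode it (the length and beam settings below are illustrative, not tuned values):

    input_text = "Your text to summarize here"
    # Truncate long articles; 512 tokens is an assumed budget, not a documented limit
    input_ids = tokenizer.encode(input_text, return_tensors="pt", truncation=True, max_length=512)
    # Set an explicit length: generate() may otherwise stop at its 20-token default
    summary_ids = model.generate(input_ids, max_length=128, num_beams=4)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print(summary)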
    
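  5. Use a GPU (optional): Generation is slow on CPU, especially for long articles. A local GPU or a cloud GPU service such as AWS EC2, Google Cloud's AI Platform, or Azure Machine Learning speeds it up; move the model and the inputs to the same device, e.g. model.to("cuda") and input_ids = input_ids.to("cuda").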

License

The model is released under the Apache-2.0 license, which permits commercial and private use, modification, and redistribution, provided the license conditions are met (for example, retaining the copyright and license notices).
