mT5_m2o_chinese_simplified_crossSum

Maintained by csebuetnlp

Introduction

The mT5_m2o_chinese_simplified_crossSum model is a many-to-one (m2o) multilingual T5 (mT5) checkpoint finetuned on cross-lingual pairs from the CrossSum dataset. It summarizes text written in any of the supported source languages into Simplified Chinese.

Architecture

The model is based on the mT5 architecture, which is a multilingual variant of the T5 model designed for text-to-text tasks. It can handle text inputs in 43 different languages, summarizing them into Simplified Chinese.
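
The key architectural details can be read directly from the checkpoint's configuration. A minimal sketch, assuming the transformers library is installed (only the configuration file is downloaded, not the model weights):

    from transformers import AutoConfig

    # Inspect the configuration published with the checkpoint
    config = AutoConfig.from_pretrained("csebuetnlp/mT5_m2o_chinese_simplified_crossSum")

    print(config.model_type)  # expected to report "mt5"
    print(config.vocab_size)  # shared multilingual SentencePiece vocabulary size
    print(config.d_model)     # hidden size of the encoder/decoder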

Training

The model was finetuned using the CrossSum dataset, which includes cross-lingual pairs with target summaries in Simplified Chinese. Detailed training scripts and methodologies can be found in the associated research paper and the official repository linked in the documentation.
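
To get a feel for the article-summary pairs used during finetuning, the CrossSum dataset can be inspected with the datasets library. A minimal sketch, assuming the dataset is hosted on the Hub as csebuetnlp/CrossSum with per-pair configurations such as english-chinese_simplified and XL-Sum-style text/summary fields; these names are assumptions and should be checked against the official repository:

    from datasets import load_dataset

    # Assumed repository id and pair-configuration name; verify against the official repo
    pair = load_dataset("csebuetnlp/CrossSum", "english-chinese_simplified", split="train")

    sample = pair[0]
    print(sample["text"])     # source-language article (field name assumed)
    print(sample["summary"])  # Simplified Chinese reference summary (field name assumed)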

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers Library: Ensure you have the transformers library installed; the example code was tested with version 4.11.0.dev0.

    pip install transformers
    
  2. Import Required Modules:

    import re
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
  3. Define a Whitespace Handler:

    # Collapse newlines and runs of whitespace into single spaces before tokenization
    WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))
    
  4. Prepare Text for Summarization:

    article_text = """Your text here"""
    
  5. Load the Model and Tokenizer:

    model_name = "csebuetnlp/mT5_m2o_chinese_simplified_crossSum"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    
  6. Tokenize and Generate Summary:

    # Tokenize the cleaned article, truncating/padding it to 512 tokens
    input_ids = tokenizer(
        [WHITESPACE_HANDLER(article_text)],
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=512
    )["input_ids"]
    
    # Generate the summary with beam search (4 beams), blocking repeated bigrams
    output_ids = model.generate(
        input_ids=input_ids,
        max_length=84,
        no_repeat_ngram_size=2,
        num_beams=4
    )[0]
    
    summary = tokenizer.decode(
        output_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )
    
    print(summary)
    

For faster inference, consider running the model on a GPU, for example a cloud GPU instance from AWS, GCP, or Azure.
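
The steps above can be combined into a single helper and moved onto a GPU when one is available. A minimal sketch, assuming PyTorch is installed; the function name summarize_to_chinese is illustrative and not part of the original model card:

    import re
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Collapse newlines and whitespace runs, as in the steps above
    WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))

    model_name = "csebuetnlp/mT5_m2o_chinese_simplified_crossSum"
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Use a GPU if available, otherwise fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

    def summarize_to_chinese(text):
        """Summarize text in any supported source language into Simplified Chinese."""
        input_ids = tokenizer(
            [WHITESPACE_HANDLER(text)],
            return_tensors="pt",
            padding="max_length",
            truncation=True,
            max_length=512
        )["input_ids"].to(device)

        output_ids = model.generate(
            input_ids=input_ids,
            max_length=84,
            no_repeat_ngram_size=2,
            num_beams=4
        )[0]

        return tokenizer.decode(
            output_ids,
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False
        )

    print(summarize_to_chinese("Your text here"))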

License

This model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0).
