mT5_multilingual_XLSum
csebuetnlp
Introduction
The mT5_multilingual_XLSum model is a multilingual summarization model based on the mT5 architecture. It has been fine-tuned on the XL-Sum dataset, which covers 45 languages, to perform text summarization.
Architecture
The model is a variant of mT5, the multilingual version of Google's T5 (Text-to-Text Transfer Transformer) model. mT5 treats every task as a text-to-text transformation and is pretrained on text in many languages, which makes it well suited to tasks such as summarization in diverse linguistic contexts.
Training
The model was fine-tuned on the XL-Sum dataset. Detailed training scripts and methodology are documented in the associated research paper and the official GitHub repository. Performance is evaluated separately for each language using standard metrics such as ROUGE.
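ROUGE scores a generated summary by its n-gram overlap with a reference summary. As a rough illustration of what the metric measures, here is a minimal ROUGE-1 F1 sketch (unigram overlap). It is not the scorer used in the paper's evaluation: the function name rouge1_f1 is hypothetical, and lowercasing plus whitespace tokenization are simplifying assumptions that do not suit every language in XL-Sum.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Hypothetical minimal ROUGE-1 F1 between two strings.

    Assumptions: lowercasing and whitespace tokenization only,
    no stemming and no language-specific tokenization.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most min(count in cand, count in ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 5 of 6 unigrams overlap in each direction, so P = R = F1 = 5/6.
    print(round(rouge1_f1("the cat sat on the mat", "the cat is on the mat"), 3))
```

Real evaluations normally rely on an established ROUGE implementation and also report ROUGE-2 and ROUGE-L; the sketch only shows the overlap arithmetic behind the score.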
Guide: Running Locally
To run the model locally, follow these steps:
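- Install Transformers Library: Ensure you have the transformers library installed. The examples below also use PyTorch (return_tensors="pt"), and the mT5 tokenizer relies on the sentencepiece package, so you will likely need both as well.
pip install transformers sentencepiece torch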
- Load the Model and Tokenizer: Use the transformers library to load the model and tokenizer.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Weights and tokenizer files are downloaded from the Hugging Face Hub and cached on first use.
model_name = "csebuetnlp/mT5_multilingual_XLSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
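- Prepare Input Text: Normalize the whitespace in your input text.
import re

# Strip the text, then collapse newlines and any run of whitespace into a single space.
# Raw strings (r'...') avoid the invalid-escape warning that '\s' triggers in newer Python versions.
WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))

article_text = "Your text here..."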
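- Generate Summary: Tokenize the input and generate a summary.
inputs = tokenizer(
    [WHITESPACE_HANDLER(article_text)],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512,  # articles longer than 512 tokens are truncated
)

output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # ignore the padding positions
    max_length=84,  # upper bound on summary length, in tokens
    no_repeat_ngram_size=2,  # never repeat the same bigram in the output
    num_beams=4,  # beam search with 4 beams
)[0]

summary = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(summary)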
- Cloud GPUs: To speed up inference, especially when summarizing large datasets, consider running the model on a GPU, for example on a cloud platform such as AWS EC2, Google Cloud Platform, or Azure.
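The no_repeat_ngram_size=2 argument passed to model.generate in the Generate Summary step prevents the summary from repeating any bigram. The sketch below illustrates the idea for a single hypothesis: before choosing the next token, ban every token that would complete an n-gram already present in the output. It is a simplified illustration, not the transformers implementation; the helper names banned_next_tokens and pick_next and the toy scores are hypothetical.

```python
from typing import Dict, Sequence, Set


def banned_next_tokens(tokens: Sequence[int], n: int) -> Set[int]:
    """Hypothetical helper: tokens whose addition would repeat an existing n-gram."""
    if n < 1 or len(tokens) + 1 < n:
        return set()  # too few tokens to complete any n-gram yet
    # The last n-1 tokens are the prefix of the n-gram the next token would complete.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # Wherever the same prefix occurred before, the token that followed it is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def pick_next(tokens: Sequence[int], scores: Dict[int, float], n: int) -> int:
    """Hypothetical greedy step: highest-scoring token that is not banned.

    Assumes at least one candidate token remains allowed.
    """
    banned = banned_next_tokens(tokens, n)
    allowed = {tok: s for tok, s in scores.items() if tok not in banned}
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    # Output so far: 5 7 5. Emitting 7 would repeat the bigram (5, 7), so 8 wins.
    print(pick_next([5, 7, 5], {7: 0.9, 8: 0.5}, n=2))
```

During beam search the same check is applied to every beam at every step, so a banned continuation is simply never selected, whatever its score.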
License
The model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license allows sharing and adapting the model for non-commercial purposes, provided appropriate credit is given and adaptations are distributed under the same license.