mbart_ru_sum_gazeta
Introduction
mbart_ru_sum_gazeta is a model tailored for summarizing Russian news articles, particularly those from Gazeta.ru. It is built on the mBART architecture and performs text-to-text generation.
Architecture
The model is a ported version of a fairseq model specifically designed for summarization tasks. It utilizes the mBART architecture, which is well-suited for multilingual text processing.
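To see which mBART hyperparameters the checkpoint ships with, its configuration can be inspected directly. Below is a minimal sketch, assuming only that the checkpoint exposes the standard MBartConfig fields:

```python
# Inspect the checkpoint's configuration; encoder_layers, decoder_layers,
# and d_model are standard MBartConfig fields.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("IlyaGusev/mbart_ru_sum_gazeta")
print(config.model_type)      # expected: "mbart"
print(config.encoder_layers)  # number of encoder layers
print(config.decoder_layers)  # number of decoder layers
print(config.d_model)         # hidden size
```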
Training
Training Data
- Dataset: The model was trained on the Gazeta dataset, a collection of Russian news articles (see the loading sketch below).
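For a quick look at the training data, the Gazeta dataset can be loaded with the `datasets` library. This is a sketch only: the Hub dataset ID `IlyaGusev/gazeta` and the `text`/`summary` field names are assumptions, not stated above.

```python
# Load and inspect the Gazeta dataset; the dataset ID and the
# "text"/"summary" field names are assumptions, not confirmed above.
from datasets import load_dataset

dataset = load_dataset("IlyaGusev/gazeta", split="train")
print(dataset)                # row count and column names
example = dataset[0]
print(example["text"][:200])  # start of a source article
print(example["summary"])     # its reference summary
```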
Training Procedure
- Script: Training used the fairseq training script available in the original repository.
- Porting: The fairseq checkpoint was ported to Transformers with a Colab notebook, making the model easy to experiment with and deploy.
Evaluation
The model was evaluated with ROUGE (R-1-f, R-2-f, R-L-f), chrF, METEOR, and BLEU, showing competitive performance, particularly on Gazeta.ru articles.
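As one way to reproduce this kind of scoring, ROUGE F-measures can be computed over generated and reference summaries with the Hugging Face `evaluate` library. This is a sketch of the scoring step only, not the original evaluation pipeline; the default English-oriented tokenization may need adjusting for Russian text.

```python
# Score model summaries against references with ROUGE F-measures
# (rouge1/rouge2/rougeL correspond to R-1-f/R-2-f/R-L-f above).
import evaluate

rouge = evaluate.load("rouge")
predictions = ["сгенерированное краткое содержание"]  # model outputs
references = ["эталонное краткое содержание"]         # gold summaries
print(rouge.compute(predictions=predictions, references=references))
```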
Guide: Running Locally
- Setup:
  - Install the `transformers` library from Hugging Face.
  - Install PyTorch if not already done.
- Code Example:
```python
from transformers import MBartTokenizer, MBartForConditionalGeneration

model_name = "IlyaGusev/mbart_ru_sum_gazeta"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

article_text = "..."  # the Russian article to summarize

# Tokenize the article, truncating to the model's 600-token input budget.
input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Generate the summary, blocking repeated 4-grams.
output_ids = model.generate(
    input_ids=input_ids,
    no_repeat_ngram_size=4,
)[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
- Hardware Recommendation:
  - For efficient processing, cloud GPUs such as those from AWS, GCP, or Azure are recommended; see the device-placement sketch below.
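As a minimal sketch of GPU usage, the `model` and `input_ids` from the code example above can be moved onto a CUDA device before generation; the snippet assumes a CUDA-capable GPU is present and falls back to CPU otherwise.

```python
# Move the model and inputs from the earlier example onto a GPU
# (falls back to CPU if CUDA is unavailable).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_ids = input_ids.to(device)

output_ids = model.generate(input_ids=input_ids, no_repeat_ngram_size=4)[0]
summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```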
License
The mbart_ru_sum_gazeta model is released under the Apache-2.0 license, allowing for both personal and commercial use.