rut5-base-multitask

by cointegrated

Introduction

The rut5-base-multitask model is a smaller version of Google's mt5-base, with its vocabulary and token embeddings reduced to cover Russian and English. It has been fine-tuned for a variety of text-based tasks, including translation, paraphrasing, and dialogue response generation.

Architecture

This model is based on the T5 encoder-decoder architecture, which frames every task as text-to-text generation. It is implemented in PyTorch, with weights also available in the JAX and Safetensors formats, and it supports both Russian and English.

Training

The model has been fine-tuned for several specific tasks:

  • Translation (ru-en, en-ru)
  • Paraphrasing
  • Text gap filling
  • Text assembly from unordered words
  • Text simplification
  • Dialogue response generation
  • Open-book question answering
  • Question generation about a text
  • News headline generation

Each task is selected by prefixing the input text with the task name and the | separator, e.g. translate ru-en | Каждый охотник желает знать, где сидит фазан.
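The prefix convention above can be sketched as a small helper. Note that only the translate ru-en prefix is confirmed by the example later in this guide; treating the other task names in the list above as literal prefixes is an assumption.

```python
def make_prompt(task: str, text: str) -> str:
    """Build a prompt in the "<task> | <text>" format the model expects."""
    return f"{task} | {text}"

# Translation prompt, matching the worked example later in this guide
prompt = make_prompt("translate ru-en", "Каждый охотник желает знать, где сидит фазан.")
print(prompt)
# translate ru-en | Каждый охотник желает знать, где сидит фазан.
```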

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install dependencies:

    pip install transformers sentencepiece
    
  2. Load the model:

    import torch
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    
    tokenizer = T5Tokenizer.from_pretrained("cointegrated/rut5-base-multitask")
    model = T5ForConditionalGeneration.from_pretrained("cointegrated/rut5-base-multitask")
    
  3. Define a generation function:

    def generate(text, **kwargs):
        # Tokenize the prompt and move the tensors to the model's device
        inputs = tokenizer(text, return_tensors='pt').to(model.device)
        with torch.no_grad():
            # Beam search (num_beams=5) tends to give more fluent outputs than greedy decoding
            hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
        return tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    
  4. Run the model on a task:

    print(generate('translate ru-en | Каждый охотник желает знать, где сидит фазан.'))
    # Output: Each hunter wants to know, where he is.
    

For enhanced performance, consider using cloud GPUs such as those offered by Google Cloud, AWS, or Azure.

License

The model is licensed under the MIT License, allowing for flexible use and modification.
