ruT5-base-detox

s-nlp

Introduction

ruT5-base-detox is a detoxification model that rewrites toxic Russian messages as non-toxic ones. It is part of the RUSSE 2022 competition on Russian text detoxification with parallel corpora, and it is based on the ruT5 architecture.

Architecture

ruT5-base-detox is built on ruT5-base, a transformer model for Russian. It performs text-to-text generation: given a toxic Russian input, it generates a more neutral rewrite.

Training

This model was trained using datasets containing Russian toxic messages sourced from platforms such as Odnoklassniki, Pikabu, and Twitter. The training data comes from the "RUSSE 2022: Russian Text Detoxification Based on Parallel Corpora" competition.
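
A parallel corpus pairs each toxic sentence with one or more neutral rewrites. The sketch below shows one way such entries could be expanded into (source, target) pairs for text-to-text training. The `ParallelEntry` class, its field names, and the placeholder strings are hypothetical and are not the competition's actual data schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParallelEntry:
    """One toxic sentence with its neutral rewrites (hypothetical layout)."""
    toxic: str
    neutral: List[str]


def to_training_pairs(entries: List[ParallelEntry]) -> List[Tuple[str, str]]:
    """Expand every entry into (source, target) pairs, one per neutral rewrite."""
    return [(entry.toxic, rewrite) for entry in entries for rewrite in entry.neutral]


# Placeholder strings stand in for real corpus rows.
entries = [
    ParallelEntry(
        toxic="<toxic sentence>",
        neutral=["<neutral rewrite A>", "<neutral rewrite B>"],
    )
]
pairs = to_training_pairs(entries)
```

Each resulting pair has the shape a sequence-to-sequence model expects: the toxic sentence as input and one neutral rewrite as the target.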

Guide: Running Locally

  1. Install Dependencies: Ensure that you have PyTorch and the Transformers library installed.
  2. Load the Model and Tokenizer:
    from transformers import T5ForConditionalGeneration, AutoTokenizer
    
    base_model_name = 'ai-forever/ruT5-base'
    model_name = 's-nlp/ruT5-base-detox'
    
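    # The tokenizer comes from the base ruT5 checkpoint; the fine-tuned weights come from the detox checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)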
    
  3. Prepare Input and Generate Output:
    input_ids = tokenizer.encode('Это полная хуйня!', return_tensors='pt')
    output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(output_text)  # Output: Это полный бред!
    
  4. Cloud GPUs: For enhanced performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
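
Steps 2 and 3 above handle one sentence at a time on the CPU. The following is a minimal sketch of how the same calls might be extended to process several messages in batches and to use a GPU when one is available. The helper names `chunked` and `detoxify`, the default `batch_size`, and the decision to truncate inputs to `max_length` tokens are assumptions made for illustration, not part of the model card.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def detoxify(texts: Sequence[str], model, tokenizer,
             batch_size: int = 8, max_length: int = 50) -> List[str]:
    """Rewrite each text with the detox model, one batch at a time."""
    import torch  # imported lazily so `chunked` works without PyTorch installed

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()

    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate overly long inputs (assumption).
        encoded = tokenizer(batch, return_tensors="pt", padding=True,
                            truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            output_ids = model.generate(**encoded, max_length=max_length,
                                        num_return_sequences=1)
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results
```

With `model` and `tokenizer` loaded as in step 2, `detoxify(list_of_messages, model, tokenizer)` would return one rewritten string per input message, in the same order.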

License

This model is distributed under the OpenRAIL++ License, which promotes the development of technology that benefits the public.
