rubert base cased sentiment

blanchefort

Introduction

RuBERT-Base-Cased-Sentiment is a sentiment classifier for short Russian texts. It is based on the DeepPavlov/rubert-base-cased-conversational model and was trained on a corpus of 351,797 texts. The model assigns each text to one of three classes: Neutral, Positive, or Negative.

Architecture

The model uses the BERT architecture and is implemented in PyTorch; it is also compatible with TensorFlow and JAX. It is loaded through the transformers library as a text-classification model and performs sentiment analysis on Russian text.
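
Because the model is served through the transformers library, it can also be wrapped in that library's generic text-classification pipeline. The following is a minimal sketch, not the author's documented usage: build_classifier and count_labels are hypothetical helper names, and the label strings returned by the pipeline come from the checkpoint's configuration, so they should be checked rather than assumed.

```python
from collections import Counter
from typing import Dict, List

MODEL_ID = "blanchefort/rubert-base-cased-sentiment"


def build_classifier(model_id: str = MODEL_ID):
    """Return a transformers text-classification pipeline for the checkpoint."""
    # Imported lazily so count_labels stays usable without transformers installed.
    from transformers import pipeline

    # The first call downloads the weights and caches them locally.
    return pipeline("text-classification", model=model_id, tokenizer=model_id)


def count_labels(results: List[Dict[str, object]]) -> Counter:
    """Tally predicted labels from a list of {'label': ..., 'score': ...} dicts."""
    return Counter(str(item["label"]) for item in results)
```

Calling the object returned by build_classifier() on a list of strings yields one dictionary per text with a label and a score; count_labels then summarizes how often each label was predicted.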

Training

The model was trained using several datasets:

  • RuTweetCorp: A corpus built for training sentiment classifiers from microblog posts.
  • RuReviews: An annotated sentiment analysis dataset for Russian product reviews.
  • RuSentiment: A dataset for sentiment analysis in Russian social media.
  • Reviews of medical institutions (Отзывы о медучреждениях): A dataset of reviews of medical institutions collected from prodoctorov.ru.

Guide: Running Locally

To use the model locally, follow these steps:

  1. Install PyTorch and Transformers: Make sure the torch and transformers packages are installed (for example, with pip install torch transformers).
  2. Load the Model and Tokenizer:
    import torch
    from transformers import AutoModelForSequenceClassification, BertTokenizerFast
    
    # Download (or load from the local cache) the tokenizer and the classifier.
    model_name = 'blanchefort/rubert-base-cased-sentiment'
    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=True)
    
  3. Define the Prediction Function:
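    @torch.no_grad()  # inference only, so gradient tracking is disabled
    def predict(text):
        # Accepts one string or a list of strings; inputs over 512 tokens are truncated.
        inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
        outputs = model(**inputs)
        # Turn logits into class probabilities, then pick the most likely class per text.
        predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
        predicted = torch.argmax(predicted, dim=1).numpy()
        return predicted  # NumPy array of class indices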
    
  4. Use Cloud GPUs: For larger datasets or faster processing, consider using cloud-based GPUs like those offered by AWS, GCP, or Azure.
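
The predict function from step 3 returns class indices rather than label names, and the tokenizer it calls accepts a list of strings as well as a single string. A minimal sketch of mapping those indices to names and processing a larger list in batches could look like the following. The helper names batches and predict_labels are hypothetical, and the index order in ASSUMED_LABELS is an assumption taken from the order of the categories in the Introduction; model.config.id2label should be checked and passed in instead when it is available.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

# ASSUMPTION: index order follows the category order given in the Introduction.
# Verify against model.config.id2label before relying on it.
ASSUMED_LABELS: Dict[int, str] = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_labels(
    texts: Iterable[str],
    predict: Callable[[List[str]], Iterable[int]],
    batch_size: int = 32,
    id2label: Dict[int, str] = ASSUMED_LABELS,
) -> List[str]:
    """Run `predict` batch by batch and translate class indices to label names."""
    labels: List[str] = []
    for chunk in batches(list(texts), batch_size):
        # `predict` is expected to return one class index per input text.
        for class_id in predict(list(chunk)):
            labels.append(id2label[int(class_id)])
    return labels
```

With the objects from steps 2 and 3 in scope, predict_labels(texts, predict, id2label=model.config.id2label) returns one label name per input text while keeping each forward pass limited to batch_size texts.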

License

The model and datasets used are subject to their respective licenses. Ensure compliance with all licensing terms when using or distributing the model and datasets.
