rubert-tiny-sentiment-balanced

cointegrated

Introduction

rubert-tiny-sentiment-balanced is a fine-tuned version of the rubert-tiny model, designed for sentiment analysis of short Russian texts. The task is framed as a three-class classification problem: each text is labeled negative, neutral, or positive.

Architecture

The model is based on the BERT architecture and is implemented in PyTorch. It is fine-tuned for text classification in Russian. It is loaded through the transformers library, and its weights are available in the safetensors format.

Training

The model was trained on datasets collected by Smetanin, converted into a three-class format. The training data was balanced by upsampling and downsampling so that sources and classes are equally represented. The training process is documented in a publicly available Colab notebook. The model was evaluated on a balanced test set; macro F1 varies by dataset, for example 0.83 on SentiRuEval2016_banks and 0.98 on mokoron.
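
The balancing step described above can be illustrated with a minimal sketch. It is not the code from the Colab notebook: the `balance` function, the `per_group` parameter, and the toy source names are hypothetical, and it assumes each example is a dict with `source` and `label` keys. Groups larger than the target are downsampled without replacement; smaller groups are upsampled by drawing extra copies with replacement.

```python
import random
from collections import defaultdict


def balance(examples, per_group, seed=0):
    """Resample so every (source, label) group has exactly `per_group` items.

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["source"], ex["label"])].append(ex)

    balanced = []
    for key in sorted(groups):
        items = groups[key]
        if len(items) >= per_group:
            # Downsample: draw without replacement.
            balanced.extend(rng.sample(items, per_group))
        else:
            # Upsample: keep every item, then draw extras with replacement.
            balanced.extend(items)
            balanced.extend(rng.choices(items, k=per_group - len(items)))
    rng.shuffle(balanced)
    return balanced


# Toy data: made-up source names and counts, for illustration only.
toy = (
    [{"source": "reviews", "label": "positive", "text": f"r{i}"} for i in range(6)]
    + [{"source": "reviews", "label": "negative", "text": f"n{i}"} for i in range(2)]
    + [{"source": "tweets", "label": "neutral", "text": f"t{i}"} for i in range(3)]
)
balanced = balance(toy, per_group=4)

counts = defaultdict(int)
for ex in balanced:
    counts[(ex["source"], ex["label"])] += 1
print(dict(counts))  # every group that was present now has 4 examples
```

Note that this sketch only equalizes groups that occur in the input; a (source, label) combination with no examples stays empty.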

Guide: Running Locally

  1. Install Dependencies:
    Install the transformers and sentencepiece libraries (the code below also requires torch):

    pip install transformers sentencepiece --quiet
    
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    model_checkpoint = 'cointegrated/rubert-tiny-sentiment-balanced'
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)
    if torch.cuda.is_available():
        model.cuda()
    
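  3. Inference: Define a get_sentiment helper that returns a label, a score, or the raw per-class values:

    def get_sentiment(text, return_type='label'):
        """Return the sentiment of `text`; `return_type` is 'label', 'score', or anything else for raw per-class values."""
        with torch.no_grad():
            # Tokenize and move the tensors to the same device as the model
            inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True).to(model.device)
            # Sigmoid is applied per class, so the three values need not sum to 1
            proba = torch.sigmoid(model(**inputs).logits).cpu().numpy()[0]
        if return_type == 'label':
            # Name of the highest-scoring class
            return model.config.id2label[proba.argmax()]
        elif return_type == 'score':
            # Collapse to one number by weighting the three classes by -1, 0, +1
            return proba.dot([-1, 0, 1])
        return proba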
    
    text = 'Какая гадость эта ваша заливная рыба!'
    print(get_sentiment(text, 'label'))  # Example output: 'negative'
    
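  4. Suggested Cloud GPUs:
    A GPU is optional: the loading code in step 2 moves the model to CUDA only when it is available and otherwise runs on CPU. For high-volume workloads, consider cloud GPU services such as AWS EC2, Google Cloud, or Azure.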
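
To make the 'label' and 'score' return types of get_sentiment in step 3 concrete, the following sketch reproduces the same arithmetic in plain Python on made-up values, without loading the model. The `summarize` function and the input vectors are hypothetical, and the class order (negative, neutral, positive) is an assumption inferred from the [-1, 0, 1] weights; check `model.config.id2label` for the actual mapping.

```python
# Assumed class order, inferred from the [-1, 0, 1] weights in get_sentiment.
LABELS = ["negative", "neutral", "positive"]
WEIGHTS = [-1, 0, 1]


def summarize(proba):
    """Return (label, score) for one vector of per-class values in [0, 1]."""
    # 'label': the class with the largest value (argmax).
    best = max(range(len(proba)), key=proba.__getitem__)
    # 'score': weighted sum, negative below zero and positive above zero.
    score = sum(w * p for w, p in zip(WEIGHTS, proba))
    return LABELS[best], score


# Made-up values for illustration; they need not sum to 1 because
# get_sentiment applies a sigmoid to each class independently.
label, score = summarize([0.75, 0.25, 0.125])
print(label, score)  # negative -0.625

label2, score2 = summarize([0.125, 0.25, 0.75])
print(label2, score2)  # positive 0.625
```

The score therefore lies between -1 and 1, which can be convenient when a single sentiment value is needed instead of a class name.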

License

Check the official Hugging Face model page or repository for the license that covers the model and its associated code, and comply with its terms when using or distributing the model.
