multilingual sentiment analysis

tabularisai

Introduction

The multilingual sentiment analysis model by TabularisAI is a fine-tuned version of distilbert/distilbert-base-multilingual-cased. It performs text classification, specifically sentiment analysis, across 22 languages. Typical applications include social media analysis, customer feedback analysis, and product review classification.

Architecture

  • Base Model: distilbert-base-multilingual-cased
  • Languages Supported: English, Chinese, Spanish, Hindi, Arabic, Bengali, Portuguese, Russian, Japanese, German, Malay, Telugu, Vietnamese, Korean, French, Turkish, Italian, Polish, Ukrainian, Tagalog, Dutch, and Swiss German.
  • Classes: Five sentiment categories (Very Negative, Negative, Neutral, Positive, Very Positive)
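
The five classes form an ordered scale, so a model output can be read either as a single label or as a position on that scale. The sketch below is a minimal, dependency-free illustration. It assumes the index order used by `sentiment_map` in the guide (0 = Very Negative through 4 = Very Positive). The helper names `softmax`, `top_label`, and `expected_score` are hypothetical and not part of the model's API, and the logits are made-up values.

```python
import math

# Label order is assumed to match sentiment_map in the guide (index 0 to 4).
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

def expected_score(logits):
    """Probability-weighted class index on the 0 to 4 scale."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))

# Hypothetical logits for one text; real values would come from the model.
label, confidence = top_label([-1.2, -0.3, 0.1, 2.4, 0.9])
print(label)  # Positive
```

The expected score is one possible way to keep the ordinal information that a plain argmax discards. It is an illustration, not something the model card prescribes.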

Training

The model was fine-tuned for three epochs on synthetic multilingual data and is reported to achieve high accuracy. The training data is intended to cover a broad range of sentiment expressions across languages and cultures.

Guide: Running Locally

  1. Install Dependencies:

    pip install transformers torch
    
  2. Load the Model:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    model_name = "tabularisai/multilingual-sentiment-analysis"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    
  3. Run Sentiment Analysis:

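    def predict_sentiment(texts):
        """Return one sentiment label per input text (a string or a list of strings)."""
        # Tokenize the batch, padding to a common length and truncating at 512 tokens
        inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
        # Inference only, so gradient tracking is not needed
        with torch.no_grad():
            outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        # Class index to label, in the order the model outputs them
        sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
        # Pick the most probable class for each text
        return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]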
    
  4. Consider Cloud GPUs:

    • For large workloads, cloud platforms such as AWS, Google Cloud, or Azure provide GPU instances that can speed up inference.
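
For larger workloads, two small helpers make the steps above more practical: one selects a GPU when PyTorch can see one, and the other splits the input into fixed-size batches so memory use stays bounded. This is a minimal sketch. `pick_device` and `batched` are hypothetical helper names, and the default batch size of 32 is an arbitrary assumption, not a recommended value.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch is not installed, so fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

def batched(texts, batch_size=32):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

device = pick_device()
# Example inputs are placeholders; any list of strings works.
chunks = list(batched(["great", "terrible", "fine", "okay", "awful"], batch_size=2))
print(device, [len(c) for c in chunks])
```

With the model and `predict_sentiment` from steps 2 and 3, one would call `model.to(device)` once, move each tokenized batch to the same device with `inputs = inputs.to(device)` inside `predict_sentiment`, and then loop over `batched(texts)` to collect the labels.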

License

The model is licensed under CC BY-NC 4.0, allowing for non-commercial use with appropriate attribution.
