multilingual sentiment analysis
tabularisai
Introduction
The multilingual sentiment analysis model by TabularisAI is a fine-tuned version of distilbert/distilbert-base-multilingual-cased. It is designed to perform text classification, specifically sentiment analysis, across 22 languages. The model caters to various applications, including social media analysis, customer feedback, and product reviews.
Architecture
- Base Model: distilbert-base-multilingual-cased
- Languages Supported: English, Chinese, Spanish, Hindi, Arabic, Bengali, Portuguese, Russian, Japanese, German, Malay, Telugu, Vietnamese, Korean, French, Turkish, Italian, Polish, Ukrainian, Tagalog, Dutch, and Swiss German.
- Classes: Five sentiment categories (Very Negative, Negative, Neutral, Positive, Very Positive)
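As a rough illustration of how a five-way classification output maps onto these categories, the sketch below turns a vector of five raw scores (logits) into a label using only the standard library. The index order (0 = Very Negative through 4 = Very Positive) is taken from the mapping in the Running Locally guide; the helper name `label_from_logits` and the sample logits are hypothetical.

```python
import math

# Index order assumed to match the sentiment_map in the "Running Locally" guide.
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_from_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits, for illustration only.
label, prob = label_from_logits([-1.2, -0.3, 0.1, 2.4, 1.0])
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions, which can matter when neighbouring classes such as Positive and Very Positive score closely.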
Training
The model was fine-tuned for three epochs on synthetic multilingual data and is reported to achieve high accuracy. The training data was built to cover sentiment expressions across the supported languages and cultures.
Guide: Running Locally
- Install Dependencies:
    pip install transformers torch
- Load the Model:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "tabularisai/multilingual-sentiment-analysis"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
- Run Sentiment Analysis:
    import torch

    def predict_sentiment(texts):
        # Tokenize a list of strings into padded, truncated tensors.
        inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
        with torch.no_grad():  # inference only, so gradients are not needed
            outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
        # Map the highest-probability class index of each text to its label.
        return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]
- Use Cloud GPUs (Optional): Consider cloud platforms such as AWS, Google Cloud, or Azure for GPU resources when processing large volumes of text.
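For larger workloads, whether on a local machine or a cloud GPU, texts are usually sent through the model in fixed-size batches rather than all at once. The sketch below shows one way to do that. The names `chunked`, `classify_in_batches`, and `summarize`, and the default `batch_size=32`, are hypothetical choices; `predict_fn` stands for any function that maps a list of strings to a list of labels, such as the `predict_sentiment` function from the guide.

```python
from collections import Counter

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def classify_in_batches(texts, predict_fn, batch_size=32):
    """Run predict_fn batch by batch and return labels in input order."""
    labels = []
    for batch in chunked(list(texts), batch_size):
        labels.extend(predict_fn(batch))
    return labels

def summarize(labels):
    """Count how often each label occurs, e.g. for a feedback report."""
    return Counter(labels)
```

With the model loaded as in the guide, `classify_in_batches(reviews, predict_sentiment)` would return one label per review, and `summarize` would give the label distribution. Moving the model and inputs onto a GPU is a separate step that this sketch does not cover.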
License
The model is licensed under CC BY-NC 4.0, allowing for non-commercial use with appropriate attribution.