cardiffnlp/twitter-roberta-base-sentiment

Introduction
The Twitter-RoBERTa-Base model is designed for sentiment analysis, leveraging the RoBERTa-base architecture. It was trained on approximately 58 million tweets and fine-tuned using the TweetEval benchmark. The model is suitable for analyzing sentiment in English-language tweets; for multilingual sentiment analysis, a similar model, XLM-T, is available.
Architecture
The model is based on the RoBERTa-base architecture, which is a robustly optimized BERT pretraining approach. It has been fine-tuned specifically for sentiment analysis tasks. The model classifies text into three sentiment labels: Negative, Neutral, and Positive.
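For a quick check of this three-way classification, the model can also be driven through the transformers pipeline API. A minimal sketch follows; note that, depending on the hosted config, the pipeline may return generic LABEL_0/LABEL_1/LABEL_2 names, which correspond to negative/neutral/positive in that order.

```python
from transformers import pipeline

# Quick three-class check via the pipeline API. Depending on the hosted config,
# labels may come back as LABEL_0/LABEL_1/LABEL_2, corresponding to
# negative/neutral/positive respectively.
classifier = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
print(classifier("Good night 😊"))
```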
Training
The base model was pretrained on roughly 58 million tweets and then fine-tuned for sentiment analysis on the TweetEval benchmark, a unified framework for evaluating Twitter-specific language models. The reference paper detailing this framework is "TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification" (Barbieri et al., Findings of EMNLP 2020).
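For reference, the sentiment portion of TweetEval is available through the Hugging Face datasets library. A minimal sketch, assuming the datasets package is installed:

```python
from datasets import load_dataset

# The sentiment task of TweetEval; labels are 0 = negative, 1 = neutral, 2 = positive.
dataset = load_dataset("tweet_eval", "sentiment")
print(dataset["train"][0])
```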
Guide: Running Locally
- Install Libraries: Ensure you have the transformers library installed. The example code below also uses numpy and scipy.

  ```bash
  pip install transformers numpy scipy
  ```
- Load Model and Tokenizer:

  ```python
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
  import numpy as np
  from scipy.special import softmax

  MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
  tokenizer = AutoTokenizer.from_pretrained(MODEL)
  model = AutoModelForSequenceClassification.from_pretrained(MODEL)
  ```
- Preprocess Text: Convert usernames and URLs in the text to placeholders.

  ```python
  def preprocess(text):
      new_text = []
      for t in text.split(" "):
          t = '@user' if t.startswith('@') and len(t) > 1 else t
          t = 'http' if t.startswith('http') else t
          new_text.append(t)
      return " ".join(new_text)
  ```
- Perform Sentiment Analysis:

  ```python
  text = "Good night 😊"
  text = preprocess(text)
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
  scores = output[0][0].detach().numpy()
  scores = softmax(scores)  # convert raw logits to probabilities
  ```
- Output Results: Interpret the sentiment scores, printing the labels from most to least likely.

  ```python
  labels = ['negative', 'neutral', 'positive']
  ranking = np.argsort(scores)[::-1]  # indices sorted by descending score
  for i in range(scores.shape[0]):
      l = labels[ranking[i]]
      s = scores[ranking[i]]
      print(f"{i+1}) {l} {np.round(float(s), 4)}")
  ```
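Taken together, the steps above can be wrapped in a small helper for repeated use. A minimal sketch, reusing the tokenizer, model, preprocess, softmax, and np names defined above; the predict_sentiment name is illustrative and not part of the model card:

```python
def predict_sentiment(text):
    """Return (label, probability) pairs sorted from most to least likely."""
    encoded_input = tokenizer(preprocess(text), return_tensors='pt')
    scores = softmax(model(**encoded_input)[0][0].detach().numpy())
    labels = ['negative', 'neutral', 'positive']
    ranking = np.argsort(scores)[::-1]
    return [(labels[i], float(scores[i])) for i in ranking]

print(predict_sentiment("Good night 😊"))
```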
Cloud GPUs: For faster inference, consider using cloud GPU services like AWS EC2, Google Cloud, or Azure.
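On a GPU-backed instance, the guide's code runs unchanged apart from device placement. A minimal sketch, assuming PyTorch with CUDA available and reusing the objects defined in the guide above:

```python
import torch

# Move the model and the encoded inputs to a CUDA device when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

encoded_input = tokenizer(preprocess("Good night 😊"), return_tensors='pt').to(device)
with torch.no_grad():
    output = model(**encoded_input)
scores = softmax(output.logits[0].cpu().numpy())
```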
License
The model and its associated code are subject to the license terms published on the model's Hugging Face page. Review and ensure compliance with those terms before using the model in your applications.