Twitter-RoBERTa-Base Sentiment

cardiffnlp

Introduction

The Twitter-RoBERTa-base model is designed for sentiment analysis and builds on the RoBERTa-base architecture. It was trained on approximately 58 million tweets and fine-tuned for sentiment analysis using the TweetEval benchmark. The model targets English-language tweets; for multilingual sentiment analysis, a similar model, XLM-T, is available.

Architecture

The model is based on RoBERTa-base, a robustly optimized BERT pretraining approach, and has been fine-tuned specifically for sentiment analysis. It classifies text into three sentiment labels: 0) Negative, 1) Neutral, 2) Positive.
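
As a small sketch of that label convention (the index order is taken from the model card; confirm it against the model's `config.json` `id2label` field before relying on it):

```python
# Class-index -> label mapping for the three-way sentiment head.
# NOTE: the order (0=negative, 1=neutral, 2=positive) follows the model card;
# verify it against the model's config.json (id2label) for your checkpoint.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def label_for(index: int) -> str:
    """Return the human-readable label for a predicted class index."""
    return ID2LABEL[index]

print(label_for(2))  # -> positive
```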

Training

The model was trained using the TweetEval benchmark, a comprehensive framework for evaluating Twitter-related language models. The reference paper detailing this framework is "TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification."

Guide: Running Locally

  1. Install Libraries: Ensure the transformers library is installed, along with a PyTorch backend and the numpy and scipy packages used in the steps below.

    pip install transformers torch numpy scipy
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    import numpy as np
    from scipy.special import softmax
    
    MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)
    
  3. Preprocess Text: Replace usernames with the @user placeholder and URLs with http, mirroring the preprocessing applied to the training data.

    def preprocess(text):
        new_text = []
        for t in text.split(" "):
            t = '@user' if t.startswith('@') and len(t) > 1 else t
            t = 'http' if t.startswith('http') else t
            new_text.append(t)
        return " ".join(new_text)
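
    For example, applying this helper to a tweet with a mention and a link (the function is repeated here so the sketch is self-contained):

```python
def preprocess(text):
    # Replace @mentions and URLs with placeholders, as in step 3.
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

print(preprocess("@alice check https://example.com tonight"))
# -> @user check http tonight
```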
    
  4. Perform Sentiment Analysis:

    text = "Good night 😊"
    text = preprocess(text)
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    scores = output.logits[0].detach().numpy()
    scores = softmax(scores)
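
    The raw model outputs are unnormalized logits; softmax converts them into probabilities that sum to 1. A minimal stdlib-only sketch with made-up logits (not real model output):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [negative, neutral, positive]
probs = softmax([-1.2, 0.3, 2.5])
print([round(p, 3) for p in probs])
```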
    
  5. Output Results: Interpret the sentiment scores.

    labels = ['negative', 'neutral', 'positive']
    ranking = np.argsort(scores)[::-1]
    for i in range(scores.shape[0]):
        l = labels[ranking[i]]
        s = scores[ranking[i]]
        print(f"{i+1}) {l} {np.round(float(s), 4)}")
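
    The ranking loop above can be exercised with dummy probabilities (not real model output) to see the shape of the printout, here using a pure-Python analogue of np.argsort reversed:

```python
labels = ['negative', 'neutral', 'positive']
scores = [0.05, 0.15, 0.80]  # hypothetical softmax output

# Sort indices by score, highest first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}) {labels[idx]} {scores[idx]:.4f}")
# 1) positive 0.8000
# 2) neutral 0.1500
# 3) negative 0.0500
```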
    

Cloud GPUs: For faster inference, consider using cloud GPU services like AWS EC2, Google Cloud, or Azure.

License

The model and its associated code are subject to the terms and conditions of the Hugging Face Model License. Ensure compliance with the license before using the model in your applications.
