twitter-xlm-roberta-base-sentiment

cardiffnlp

Introduction

twitter-xlm-roberta-base-sentiment is a multilingual model based on XLM-RoBERTa, designed for sentiment analysis on tweets. The base model was trained on approximately 198 million tweets and then fine-tuned for sentiment analysis in eight languages: Arabic, English, French, German, Hindi, Italian, Spanish, and Portuguese. The model is part of the XLM-T toolkit and is integrated with the TweetNLP library. Further details can be found in the paper "XLM-T: A Multilingual Language Model Toolkit for Twitter."

Architecture

The model uses the multilingual XLM-RoBERTa-base architecture, a transformer encoder pretrained on text in about 100 languages. This multilingual pretraining lets the model handle the informal, often code-mixed language found on social media and transfer sentiment signals across languages and dialects.
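At inference time, the encoder's final hidden state for the sequence-start token is passed through a small classification head: a linear layer producing one logit per sentiment class, followed by a softmax. The sketch below illustrates that head in pure Python; the 4-dimensional hidden state and the weights are invented for brevity (the real model uses a 768-dimensional hidden state), and the three-way negative/neutral/positive label scheme is assumed from the model's task.

```python
import math

def classify(hidden, weights, bias):
    """Toy classification head: linear layer + softmax over 3 sentiment classes."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, bias)]
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return dict(zip(["negative", "neutral", "positive"], probs))

# Invented 4-dim example; real hidden states are 768-dim.
hidden = [0.2, -1.0, 0.5, 0.3]
weights = [[0.1, 0.4, -0.2, 0.0],
           [0.0, 0.1, 0.1, 0.2],
           [-0.1, -0.3, 0.5, 0.1]]
bias = [0.0, 0.1, -0.1]
scores = classify(hidden, weights, bias)
```

The resulting dictionary sums to 1.0 across the three classes, mirroring the label/score pairs the `pipeline` API returns.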

Training

The training involved extensive data collection from Twitter, amounting to roughly 198 million tweets. The model was specifically fine-tuned for sentiment analysis, leveraging the diverse linguistic data to enhance its performance on multilingual sentiment classification tasks.
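Fine-tuning for sentiment classification typically minimizes cross-entropy between the model's predicted class distribution and the gold label for each tweet. A minimal sketch of that loss computation, with an invented predicted distribution for illustration:

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class."""
    return -math.log(probs[gold_index])

# Invented predicted distribution over (negative, neutral, positive)
# for a tweet whose gold label is "positive" (index 2).
probs = [0.1, 0.2, 0.7]
loss = cross_entropy(probs, 2)  # ~= 0.357
```

During training this loss is averaged over a batch and backpropagated through the classification head and encoder.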

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers library along with a backend such as PyTorch:

    pip install transformers torch
    
  2. Load the model and tokenizer:

    from transformers import pipeline
    model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    sentiment_task = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
    
  3. Perform sentiment analysis:

    sentiment_task("T'estimo!")  # Catalan for "I love you!"
    
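Cardiff NLP's Twitter models are trained on tweets in which user mentions and URLs are replaced by generic placeholders, so it is common to apply the same normalization before inference. A minimal sketch of that preprocessing step (apply it to each tweet before passing it to the pipeline):

```python
def preprocess(text):
    """Replace user mentions and URLs with generic placeholders."""
    tokens = []
    for t in text.split(" "):
        if t.startswith("@") and len(t) > 1:
            t = "@user"
        elif t.startswith("http"):
            t = "http"
        tokens.append(t)
    return " ".join(tokens)

print(preprocess("@cardiffnlp this model rocks! https://example.com"))
# prints: @user this model rocks! http
```

Usage with the pipeline from the steps above would then be `sentiment_task(preprocess(tweet))`.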

Cloud GPUs

For optimal performance, especially with large datasets, consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Azure.

License

The model is hosted on Hugging Face's Model Hub and is subject to the licensing terms listed there. Be sure to review those terms before using the model in your own applications.
