twitter-xlm-roberta-base-sentiment
cardiffnlp

Introduction
TWITTER-XLM-ROBERTA-BASE is a multilingual model based on XLM-RoBERTa, designed for sentiment analysis on tweets. It has been trained on approximately 198 million tweets and fine-tuned for sentiment analysis across eight languages: Arabic, English, French, German, Hindi, Italian, Spanish, and Portuguese. The model is part of the XLM-T toolkit and is integrated with the TweetNLP library. Further details can be found in the paper titled "XLM-T: A Multilingual Language Model Toolkit for Twitter."
Architecture
The model utilizes a multilingual XLM-RoBERTa-base architecture, which is known for its robust performance across various languages. This architecture allows the model to perform sentiment analysis effectively on social media data, accommodating the nuances of different languages and dialects.
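To make the architecture concrete, the classifier can also be called directly instead of through the pipeline: the XLM-RoBERTa encoder produces logits over the sentiment classes, which a softmax turns into probabilities. The sketch below assumes the standard `AutoTokenizer`/`AutoModelForSequenceClassification` loading path and reads the label names from `model.config.id2label` rather than hard-coding them; calling `classify` downloads the model weights.

```python
import numpy as np

def softmax(logits):
    """Convert raw classifier logits to probabilities (numerically stable)."""
    logits = np.asarray(logits, dtype=float)
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

def classify(text):
    """Score `text` with the fine-tuned checkpoint.

    Requires `pip install transformers torch` and downloads the
    model weights on first use.
    """
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)

    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits[0].detach().numpy()
    probs = softmax(logits)
    # Map each class index to its label name from the model config.
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}
```

This mirrors what the `pipeline` helper does internally, but exposes the per-class probabilities directly.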
Training
The training involved extensive data collection from Twitter, amounting to roughly 198 million tweets. The model was specifically fine-tuned for sentiment analysis, leveraging the diverse linguistic data to enhance its performance on multilingual sentiment classification tasks.
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Transformers library:

  ```bash
  pip install transformers
  ```
- Load the model and tokenizer:

  ```python
  from transformers import pipeline

  model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
  sentiment_task = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
  ```
- Perform sentiment analysis:

  ```python
  sentiment_task("T'estimo!")  # Catalan for "I love you!"
  ```
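The pipeline returns its result as a list of `{"label": ..., "score": ...}` dicts, and passing `top_k=None` makes it return a score for every sentiment class rather than only the best one. The sketch below shows that usage across several of the supported languages; `top_label` is a small helper introduced here for illustration, and calling `demo` downloads the model weights.

```python
def top_label(scores):
    """Pick the highest-scoring entry from a pipeline result.

    `scores` is a list of {"label": str, "score": float} dicts, the
    shape each input gets when the pipeline is built with top_k=None.
    """
    return max(scores, key=lambda s: s["score"])["label"]

def demo():
    # Requires `pip install transformers torch` and downloads model weights.
    from transformers import pipeline

    model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    sentiment_task = pipeline("sentiment-analysis", model=model_path,
                              tokenizer=model_path, top_k=None)
    for text in ["T'estimo!", "I hate rainy Mondays", "Das ist okay"]:
        scores = sentiment_task(text)[0]  # all class scores for this input
        print(text, "->", top_label(scores))
```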
Cloud GPUs
For optimal performance, especially with large datasets, consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Azure.
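On such a machine, the pipeline can be placed on the GPU via its `device` parameter (a CUDA device index, or -1 for CPU). The sketch below assumes PyTorch as the backend; `pick_device` and `gpu_pipeline` are helper names introduced here, not part of the Transformers API.

```python
def pick_device():
    """Return a device index for transformers.pipeline:
    0 for the first CUDA GPU if one is available, else -1 (CPU)."""
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1

def gpu_pipeline():
    # Builds the sentiment pipeline on GPU when present, CPU otherwise.
    from transformers import pipeline

    model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    return pipeline("sentiment-analysis", model=model_path,
                    tokenizer=model_path, device=pick_device())
```

Batching inputs (passing a list of texts to the pipeline) is what actually lets a GPU pay off on large datasets.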
License
The model is hosted on Hugging Face's Model Hub and is subject to its licensing terms. Review those terms before using the model in your application.