distilbert-political-tweets
by m-newhauser
Introduction
DISTILBERT-POLITICAL-TWEETS is a text classification model that identifies Democratic and Republican sentiment in tweets. It is a fine-tuned version of distilbert-base-uncased, trained on tweets written by US senators in 2021. The model achieves an accuracy of 90.76% and an F1 score of 91.17% on the evaluation set.
Architecture
The model is based on the DistilBERT architecture, specifically the distilbert-base-uncased variant. It is configured to classify short texts, such as tweets, by political sentiment, distinguishing between Democratic and Republican views.
Training
The model was fine-tuned using the following hyperparameters:
- Optimizer: Adam
- Training Precision: float32
- Learning Rate: 5e-5
- Number of Epochs: 5
The training was conducted on 99,693 tweets, with a roughly balanced distribution between Democratic and Republican content.
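The listed hyperparameters translate into roughly the following Keras setup. This is a minimal sketch of the stated configuration, not the author's original training script; the `model` and `train_dataset` objects are assumed to be prepared elsewhere:

```python
import tensorflow as tf

# Sketch of the stated fine-tuning setup: Adam optimizer,
# learning rate 5e-5, float32 precision, 5 epochs.
tf.keras.mixed_precision.set_global_policy("float32")
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Assumed objects (defined elsewhere):
# model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(train_dataset, epochs=5)
```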
Framework Versions
- Transformers: 4.16.2
- TensorFlow: 2.8.0
- Datasets: 1.18.3
- Tokenizers: 0.11.6
Guide: Running Locally
- Clone the Repository: Clone the model repository from Hugging Face.
- Install Dependencies: Ensure the transformers, tensorflow, datasets, and tokenizers Python packages are installed.
- Load the Model: Use the transformers library to load the model and tokenizer.
- Prepare Data: Format your input data similarly to the tweets used for training.
- Inference: Run the model on your data to classify political sentiment.
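The steps above can be sketched with the Transformers pipeline API. The Hub repo id below is assumed from the model card's title and author; point it at your local clone if you downloaded the weights instead:

```python
from transformers import pipeline

# Assumed Hub repo id; replace with a local path if you cloned the repo.
classifier = pipeline(
    "text-classification",
    model="m-newhauser/distilbert-political-tweets",
)

result = classifier("The wealthy should pay their fair share in taxes.")
print(result)
```

Each prediction is a list of dicts with a `label` and a `score` between 0 and 1.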
For optimal performance, consider running on a GPU instance from a cloud provider such as AWS EC2, Google Cloud Platform, or Azure.
License
The model is licensed under the LGPL-3.0 license, allowing for free usage and modification with certain conditions.