Amazon Review Sentiment Analysis
LiYuan
Introduction
This project provides a sentiment analysis model fine-tuned from a pre-trained BERT-based model. It analyzes Amazon product reviews and predicts sentiment as a star rating from 1 to 5. The model supports six languages: English, Dutch, German, French, Spanish, and Italian.
Architecture
The model is based on distilbert-base-uncased, a distilled, lighter-weight version of BERT. It is fine-tuned specifically for sentiment analysis on Amazon reviews. The architecture adds a classification head on top of the pre-trained BERT model to predict sentiment ratings.
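As a rough illustration of what a classification head does, the sketch below maps one pooled encoder vector to five scores, one per star rating. It is a simplified stand-in written in plain Python, not the model's actual code: the function name `linear_head`, the random initialization, and the placeholder input vector are assumptions for illustration, and a real head typically also applies dropout and may include an extra projection layer.

```python
import random

HIDDEN_SIZE = 768   # hidden width of distilbert-base-uncased
NUM_LABELS = 5      # one class per star rating, 1 to 5


def linear_head(pooled, weights, bias):
    """Map one pooled encoder vector to one unnormalized score (logit) per class."""
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


# Randomly initialized parameters stand in for a freshly replaced, untrained head.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)] for _ in range(NUM_LABELS)]
bias = [0.0] * NUM_LABELS

# A placeholder vector stands in for the encoder output of one review.
pooled = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
logits = linear_head(pooled, weights, bias)
print(len(logits))
```

Fine-tuning adjusts the head's weights (and usually the encoder's) so that the largest logit corresponds to the correct rating.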
Training
The model was fine-tuned using a dataset of Amazon US Customer Reviews. The training process involved replacing the model's head and training on 17,280 samples with a validation set of 4,320 samples. It was tested on a separate set of 2,400 samples. Key hyperparameters included a learning rate of 2e-05, batch sizes of 16, and a training duration of 2 epochs. The optimizer used was Adam, with a linear learning rate scheduler.
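The figures above imply a 72/18/10 split of 24,000 samples and a fixed number of optimizer steps. The sketch below works through that arithmetic and shows how a linear schedule would decay the learning rate. It assumes a single device, no gradient accumulation, no warmup, and decay to zero; none of these are stated in the source, and the helper `linear_lr` is a hypothetical name.

```python
import math

TRAIN, VAL, TEST = 17_280, 4_320, 2_400
BATCH_SIZE = 16
EPOCHS = 2
BASE_LR = 2e-5

total = TRAIN + VAL + TEST                       # 24,000 samples overall
steps_per_epoch = math.ceil(TRAIN / BATCH_SIZE)  # optimizer updates per epoch
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step, base_lr=BASE_LR, total_steps=total_steps):
    """Learning rate after `step` updates under linear decay to zero (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


print(total, steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions, training runs for 1,080 steps per epoch and 2,160 steps in total, with the learning rate halved at the end of the first epoch.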
Guide: Running Locally
To run the sentiment analysis model locally, follow these steps:
- Install Dependencies: Ensure you have Python and PyTorch installed.
- Install Transformers: Run pip install transformers.
- Download Model: Use the following code snippet:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained("LiYuan/amazon-review-sentiment-analysis")
    model = AutoModelForSequenceClassification.from_pretrained("LiYuan/amazon-review-sentiment-analysis")
- Prepare Data: Acquire the Amazon review dataset from Kaggle.
- Inference: Use the tokenizer and model to predict sentiments.
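The inference step ends with turning the model's five output scores into a star rating. The sketch below shows one way to do that in plain Python. It assumes class index i corresponds to i + 1 stars, which should be checked against the model's label mapping; the function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_stars(logits):
    """Return (star_rating, confidence), assuming class index i means i + 1 stars."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


# Hypothetical logits for one review; real values would come from the model.
example_logits = [-1.2, -0.4, 0.3, 1.1, 2.5]
stars, confidence = logits_to_stars(example_logits)
print(stars, round(confidence, 3))
```

With the tokenizer and model loaded as in the download step, the logits for a single review would typically come from `model(**tokenizer(text, return_tensors="pt", truncation=True)).logits[0].tolist()`, and `model.config.id2label` can be inspected to confirm how class indices map to ratings.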
For faster performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
The project is licensed under the Apache-2.0 License, which permits use, modification, and distribution.