sklearn transformers
Introduction
This repository demonstrates a proof-of-concept pipeline that combines Hugging Face Transformers embeddings with a scikit-learn classifier. It feeds Longformer embeddings into scikit-learn's Logistic Regression to perform sentiment analysis. The embeddings are produced through the language module of the whatlies library.
Architecture
The pipeline integrates the following components:
- HFTransformersLanguage: Uses embeddings from the facebook/bart-base model.
- Logistic Regression: A classifier from scikit-learn, trained on the embeddings to perform sentiment analysis.
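The two components listed above can be wired together roughly as follows. This is a minimal sketch, not necessarily the repository's exact training code: it assumes `HFTransformersLanguage` is imported from `whatlies.language` and used as a scikit-learn compatible transformer, and that the classifier keeps its default hyperparameters. The helper names `build_pipeline` and `fit_and_predict` are hypothetical.

```python
MODEL_NAME = "facebook/bart-base"  # embedding model named in the component list


def build_pipeline(model_name: str = MODEL_NAME):
    """Return an unfitted embedding + classifier pipeline.

    Imports are deferred so that defining this function does not require the
    heavy dependencies (or a model download) until it is actually called.
    """
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from whatlies.language import HFTransformersLanguage

    return make_pipeline(
        HFTransformersLanguage(model_name),  # text -> dense embedding vectors
        LogisticRegression(),                # default hyperparameters (assumption)
    )


def fit_and_predict(train_texts, train_labels, new_texts):
    """Fit the pipeline on labelled texts and predict labels for new texts."""
    pipe = build_pipeline()
    pipe.fit(train_texts, train_labels)
    return pipe.predict(new_texts)
```

Calling `fit_and_predict(["I loved this film", "What a waste of time"], [1, 0], ["An enjoyable evening"])` would download the model on first use and return one predicted label per input text; the example sentences are made up for illustration.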
Training
The model reaches an accuracy of 0.87, with similar scores on both classes. Key metrics from the classification report:
- Precision: 0.85 for class 0 and 0.89 for class 1.
- Recall: 0.89 for class 0 and 0.85 for class 1.
- F1-Score: 0.87 for both classes.
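As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the per-class precision and recall above. The sketch below uses only those reported numbers; the function name `f1_score` is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall as reported in the classification report.
reported = {
    0: {"precision": 0.85, "recall": 0.89},
    1: {"precision": 0.89, "recall": 0.85},
}

f1_by_class = {
    label: round(f1_score(m["precision"], m["recall"]), 2)
    for label, m in reported.items()
}
print(f1_by_class)  # {0: 0.87, 1: 0.87}
```

Because the two classes have mirrored precision and recall, both arrive at the same F1 of 0.87.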
Guide: Running Locally
- Setup Environment: Ensure Python and pip are installed. It's recommended to use a virtual environment.
- Install Dependencies:
pip install scikit-learn transformers whatlies
- Download and Prepare Data: Follow the tutorial notebook for dataset preparation.
- Execute Pipeline: Run the pipeline using the provided notebook or script to perform sentiment analysis.
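Before running the notebook, it can help to confirm that the dependencies from the install step are visible to the active environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the mapping assumes the usual import names for the three packages.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "scikit-learn": "sklearn",
    "transformers": "transformers",
    "whatlies": "whatlies",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

Running this inside the virtual environment checks only that the modules can be located; it does not import them or download any model weights.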
Cloud GPUs
For enhanced performance and efficiency, especially with large datasets or models, consider using cloud GPUs from providers like AWS, GCP, or Azure.
License
This project is licensed under the Apache-2.0 License.