sklearn transformers

Introduction

This repository demonstrates a proof-of-concept pipeline that combines Hugging Face Transformers with scikit-learn classifiers. It feeds Longformer embeddings into scikit-learn's Logistic Regression to perform sentiment analysis, and it uses the whatlies library to expose the transformer model as an embedding step that fits into a scikit-learn pipeline.

Architecture

The pipeline integrates the following components:

  • HFTransformersLanguage: a whatlies language backend that produces text embeddings from the facebook/bart-base model.
  • Logistic Regression: a scikit-learn classifier trained on those embeddings to predict sentiment.
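The two components chain together because whatlies language backends are meant to behave like scikit-learn transformers (they expose fit/transform). The sketch below shows that shape with scikit-learn's Pipeline. To keep it runnable offline, ToyEmbedder is a hypothetical stand-in that counts a few sentiment words instead of computing bart-base embeddings, and the six training sentences are invented for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ToyEmbedder(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for HFTransformersLanguage: maps each text to a
    fixed-size vector so the example runs without downloading a model."""

    def __init__(self, vocab=("good", "great", "bad", "awful")):
        self.vocab = vocab

    def fit(self, X, y=None):
        # Nothing to learn: like pretrained transformer embeddings, it is fixed.
        return self

    def transform(self, X):
        # One column per vocabulary word, holding its count in each text.
        return np.array(
            [[text.lower().split().count(w) for w in self.vocab] for text in X],
            dtype=float,
        )


pipe = Pipeline([
    # In the project itself this step is whatlies' HFTransformersLanguage
    # backed by facebook/bart-base.
    ("embed", ToyEmbedder()),
    ("model", LogisticRegression()),
])

texts = ["good movie", "great fun", "good and great",
         "bad plot", "awful acting", "bad and awful"]
labels = [1, 1, 1, 0, 0, 0]
pipe.fit(texts, labels)
print(pipe.predict(["great acting", "awful movie"]))  # [1 0]
```

Swapping the embedding step leaves the classifier step and the fit/predict calls unchanged, which is the point of wrapping the transformer as a pipeline component.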

Training

The model reaches an accuracy of 0.87, with similar scores for both classes. Key metrics from the classification report:

  • Precision: 0.85 for class 0 and 0.89 for class 1.
  • Recall: 0.89 for class 0 and 0.85 for class 1.
  • F1-Score: 0.87 for both classes.
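As a consistency check, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The snippet below recomputes it from the per-class precision and recall listed above; the helper f1_from is a hypothetical name for illustration (scikit-learn's own f1_score works on label arrays, not on precomputed precision and recall).

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, as listed in the classification report.
reported = {0: (0.85, 0.89), 1: (0.89, 0.85)}

for label, (p, r) in reported.items():
    print(f"class {label}: F1 = {f1_from(p, r):.2f}")  # 0.87 for both
```

Because precision and recall simply swap between the two classes, both classes end up with the same F1 of 0.87.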

Guide: Running Locally

  1. Setup Environment: Ensure Python and pip are installed. It's recommended to use a virtual environment.
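  2. Install Dependencies (transformers also needs a deep learning backend such as PyTorch to run models, if one is not already installed):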
    pip install scikit-learn transformers whatlies
    
  3. Download and Prepare Data: Follow the tutorial notebook for dataset preparation.
  4. Execute Pipeline: Run the pipeline using the provided notebook or script to perform sentiment analysis.
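Steps 3 and 4 can be sketched end to end: split the data, fit the pipeline, and print a classification report of the kind summarized under Training. This is a minimal sketch under stated assumptions: the twelve sentences are made-up placeholders for the tutorial's dataset, and TfidfVectorizer is swapped in for the transformer embedding step so the script runs without downloading a model. It will not reproduce the reported 0.87 accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Placeholder data; replace with the dataset prepared in the tutorial notebook.
texts = [
    "I loved this film", "what a great story", "wonderful acting",
    "a truly good movie", "fantastic and fun", "I enjoyed every minute",
    "I hated this film", "what a boring story", "terrible acting",
    "a truly bad movie", "dull and slow", "I regretted every minute",
]
labels = [1] * 6 + [0] * 6

# Stratify so both classes appear in the held-out half.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)

pipe = Pipeline([
    # Assumption: TF-IDF stands in for the embedding step; the project itself
    # uses whatlies' HFTransformersLanguage here.
    ("embed", TfidfVectorizer()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# Per-class precision, recall, and F1, plus overall accuracy.
report = classification_report(y_test, pipe.predict(X_test), zero_division=0)
print(report)
```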

Cloud GPUs

Computing transformer embeddings is by far the most expensive step of this pipeline, so for large datasets or larger models consider using cloud GPUs from providers such as AWS, GCP, or Azure.

License

This project is licensed under the Apache-2.0 License.
