English Sarcasm Detector

helinivan

Introduction

The English Sarcasm Detector is a text classification model designed to identify sarcasm in news article titles. It is built upon the bert-base-uncased model and fine-tuned using a dataset from Kaggle. The model distinguishes between sarcastic (label 1) and non-sarcastic (label 0) content.

Architecture

This sarcasm detector is based on the BERT architecture, specifically the bert-base-uncased variant, used as a sequence classifier. It is implemented with the Transformers library and PyTorch.

Training

The model is trained on the "News Headlines Dataset For Sarcasm Detection" available on Kaggle. The specific dataset used for training is helinivan/sarcasm_headlines_multilingual. The reported results are an F1 score of 92.38 and an accuracy of 92.42.
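To make the two reported metrics concrete, here is a minimal sketch of how accuracy and F1 could be computed from binary labels. It assumes binary F1 with the sarcastic class (label 1) as the positive class; the text above does not say which F1 variant was used, and the function name `accuracy_and_f1` and the toy labels are made up for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating label 1 as positive."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sarcastic, predicted sarcastic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not sarcastic, predicted sarcastic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sarcastic, predicted not sarcastic
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels for illustration only; these are not the model's evaluation data.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
```

Scores computed this way fall between 0 and 1, so values such as 92.38 and 92.42 would correspond to percentages under this assumption.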

Guide: Running Locally

  1. Install Dependencies: Ensure you have transformers and torch installed in your Python environment.

    pip install transformers torch
    
  2. Preprocess Data: Lowercase the text and remove punctuation to prepare it for tokenization.

  3. Load Model and Tokenizer: Use the helinivan/english-sarcasm-detector model path with AutoTokenizer and AutoModelForSequenceClassification.

    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    MODEL_PATH = "helinivan/english-sarcasm-detector"
    
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
    
  4. Tokenize Input: Tokenize your text input with appropriate padding and truncation.

  5. Make Predictions: Pass the tokenized data through the model to obtain predictions and confidence scores.
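The preprocessing described in step 2 could be sketched as follows. This is one possible reading of "lowercase the text and remove punctuation": it strips ASCII punctuation only, and the helper name `preprocess_data` is an illustrative choice.

```python
import string


def preprocess_data(text: str) -> str:
    """Lowercase the text, drop ASCII punctuation, and trim surrounding whitespace."""
    # str.maketrans with a third argument maps every listed character to None.
    without_punctuation = text.lower().translate(str.maketrans("", "", string.punctuation))
    return without_punctuation.strip()


headline = "CIA Realizes It's Been Using Black Highlighters All These Years."
cleaned = preprocess_data(headline)
```

The cleaned string is what would then be handed to the tokenizer in step 4.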

For faster inference, especially on large batches of headlines, consider running the model on a GPU, for example a cloud GPU instance from a provider such as AWS or Google Cloud.
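Steps 3 to 5, together with optional GPU placement, might look like the sketch below. The helper names (`to_result`, `load_model`, `predict`), the `MAX_LENGTH` value of 256, and the shape of the result dictionary are assumptions made for illustration. The mapping of index 1 to "sarcastic" follows the label convention stated in the introduction. Inputs are expected to be already lowercased and stripped of punctuation.

```python
from typing import Dict, List, Union

MODEL_PATH = "helinivan/english-sarcasm-detector"
MAX_LENGTH = 256  # assumed truncation limit; not stated in the text above


def to_result(probs: List[float]) -> Dict[str, Union[int, float]]:
    """Turn per-class probabilities into a predicted label and its confidence."""
    confidence = max(probs)
    return {"is_sarcastic": probs.index(confidence), "confidence": confidence}


def load_model():
    """Load tokenizer and model, placing the model on a GPU when one is available."""
    # Heavy imports stay inside the functions so to_result can be used without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH).to(device)
    model.eval()  # disable dropout for inference
    return tokenizer, model, device


def predict(texts: List[str], tokenizer, model, device) -> List[Dict[str, Union[int, float]]]:
    """Tokenize a batch of preprocessed headlines and classify each one."""
    import torch

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=MAX_LENGTH,
        return_tensors="pt",
    ).to(device)
    with torch.no_grad():  # gradients are not needed for inference
        logits = model(**batch).logits
    return [to_result(p) for p in logits.softmax(dim=-1).tolist()]
```

A typical call would be `tokenizer, model, device = load_model()` followed by `predict(["cia realizes its been using black highlighters all these years"], tokenizer, model, device)`. Loading once and reusing the objects avoids repeating the model download and initialization on every prediction.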

License

No license is specified here for the English Sarcasm Detector model. For detailed licensing terms, refer to the official repository or model card on Hugging Face.
