autonlp Gibberish Detector 492513457

madhurjindal

Introduction

The project focuses on developing a gibberish detector for the English language, particularly useful for improving the accuracy and user experience of systems relying on text inputs, such as chatbots. The goal is to classify text as gibberish or non-gibberish, enhancing interaction quality by filtering out nonsensical input.

Architecture

The gibberish detector is a DistilBERT model fine-tuned with AutoTrain for text classification. It is a multi-class classifier: each input is assigned to one of four classes according to how severe the gibberish is.

Training

The model was trained using AutoNLP (the earlier name for AutoTrain). Reported CO2 emissions for the training run total 5.53 grams. On the validation set, the model reached an accuracy of 97.36% and a weighted F1 score of 97.36%.
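For readers unfamiliar with the metric, a weighted F1 score averages the per-class F1 scores, weighting each class by its number of true examples. The following is a minimal sketch of that calculation in plain Python. The helper name `weighted_f1` and the label lists are hypothetical and for illustration only; they are not the model's validation data.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero because n >= 1
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score

# Toy labels for illustration only (not taken from the model's evaluation).
y_true = ["clean", "clean", "noise", "word salad"]
y_pred = ["clean", "noise", "noise", "word salad"]
result = weighted_f1(y_true, y_pred)
```

Because each class is weighted by its support, a frequent class influences the score more than a rare one.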

Label Description

The model categorizes text into four classes:

  1. Noise: Gibberish at the word level; even the individual words carry no meaning.
  2. Word Salad: Words are meaningful individually but nonsensical collectively.
  3. Mild Gibberish: Contains grammatical or syntactical errors.
  4. Clean: Meaningful and coherent sentences.
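
One way a chatbot might act on these four classes is sketched below. This is a hypothetical policy, not part of the model: the function name `triage`, the lower-cased label strings, and the `min_confidence` threshold are all assumptions made for illustration. The label strings the model actually returns should be checked against `model.config.id2label`.

```python
# Assumed label strings: lower-cased versions of the four class names above.
# The strings the model actually returns may differ.
REJECT = {"noise", "word salad"}

def triage(label, score, min_confidence=0.5):
    """Return 'accept', 'clarify', or 'reject' for a predicted label.

    label: predicted class name; score: its probability in [0, 1].
    min_confidence is a hypothetical threshold, not a tuned value.
    """
    name = label.strip().lower()
    if score < min_confidence:
        return "clarify"  # low confidence: ask the user to rephrase
    if name in REJECT:
        return "reject"   # nonsensical input: do not pass it downstream
    if name == "clean":
        return "accept"
    return "clarify"      # mild gibberish or an unrecognized label
```

For example, `triage("Noise", 0.9)` returns `"reject"`, while a low-confidence prediction of any class returns `"clarify"` so the user can be asked to rephrase.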

Guide: Running Locally

Basic Steps

  1. Install Dependencies: Ensure Python and PyTorch are installed. Use pip to install the Transformers library.

    pip install transformers torch
    
  2. Load Model and Tokenizer: Use the transformers library to load the model and tokenizer.

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    model = AutoModelForSequenceClassification.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
    tokenizer = AutoTokenizer.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
    
  3. Inference: Tokenize input text and perform inference.

    inputs = tokenizer("Your input text here", return_tensors="pt")
    outputs = model(**inputs)
    
  4. Prediction: Use the softmax function to determine probabilities and classify the input.

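    import torch
    import torch.nn.functional as F
    
    probs = F.softmax(outputs.logits, dim=-1)  # logits -> class probabilities
    predicted_index = torch.argmax(probs, dim=-1).item()  # assumes a single input
    predicted_label = model.config.id2label[predicted_index]  # index -> class name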
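
To make the prediction step concrete without downloading the model, the softmax and argmax logic can be sketched in plain Python. The index-to-label mapping and the logits below are made up for illustration; with the real model, the mapping would come from `model.config.id2label` and the logits from `outputs.logits[0].tolist()`.

```python
import math

# Hypothetical index-to-label mapping for illustration; in practice read it
# from model.config.id2label rather than hard-coding it.
ID2LABEL = {0: "clean", 1: "mild gibberish", 2: "noise", 3: "word salad"}

def classify(logits, id2label=ID2LABEL):
    """Return (label, probability) for one row of raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax: values sum to 1
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return id2label[best], probs[best]

# Made-up logits, not real model output.
label, prob = classify([0.2, -1.0, 3.5, 0.1])
```

Returning the probability alongside the label lets the caller apply a confidence threshold instead of trusting every prediction equally.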
    

Suggested Cloud GPUs

For large batches or latency-sensitive workloads, cloud providers such as AWS, Google Cloud Platform, or Azure offer GPU instances that can accelerate model inference.

License

The gibberish detector is released under the MIT License, which permits broad use and modification provided the copyright and license notices are retained.
