toxic-comment-model

martin-ha

Introduction

The toxic-comment-model is a fine-tuned version of DistilBERT designed to classify toxic comments in English. It is implemented with the Transformers library and the PyTorch framework.

Architecture

The model is built on DistilBERT, a smaller, faster, and lighter distillation of BERT, with a sequence-classification head fine-tuned for this task.
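
The published checkpoint's configuration can be inspected to confirm this; a minimal sketch (n_layers, n_heads, and dim are standard DistilBertConfig fields, and id2label holds the label names attached to the classifier head):

    from transformers import AutoConfig
    
    # Fetch the checkpoint's configuration from the Hugging Face Hub.
    config = AutoConfig.from_pretrained("martin-ha/toxic-comment-model")
    print(type(config).__name__)                         # DistilBertConfig
    print(config.n_layers, config.n_heads, config.dim)   # 6 layers vs. BERT-base's 12
    print(config.id2label)                               # labels used by the classifier head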

Training

The model was trained on data from the Jigsaw Unintended Bias in Toxicity Classification Kaggle competition. Only 10% of the competition's train.csv was used for training. Detailed training procedures and code are available on GitHub; the run took about 3 hours on a P100 GPU. The model was evaluated on a held-out test set of 10,000 rows, achieving 94% accuracy and an F1-score of 0.59.
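
The exact training script is in the linked GitHub repository. As a rough illustration only, a DistilBERT fine-tune with the Trainer API might look like the sketch below; the file name (train.csv), the column names (comment_text, target), the 0.5 binarization threshold, and all hyperparameters are assumptions rather than the author's actual settings:

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)
    
    # Load the competition CSV and keep a 10% sample for training (hypothetical columns).
    dataset = load_dataset("csv", data_files="train.csv")["train"]
    dataset = dataset.train_test_split(train_size=0.1, seed=42)["train"]
    
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    
    def preprocess(batch):
        enc = tokenizer(batch["comment_text"], truncation=True, max_length=256)
        # The competition's "target" is a continuous toxicity score; binarize at 0.5.
        enc["labels"] = [int(t >= 0.5) for t in batch["target"]]
        return enc
    
    dataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)
    
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)
    
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="toxic-comment-model",
                               per_device_train_batch_size=32,
                               num_train_epochs=1),
        train_dataset=dataset,
        tokenizer=tokenizer,  # enables dynamic padding via the default collator
    )
    trainer.train()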

Guide: Running Locally

To use the model locally, follow these steps:

  1. Install the Transformers library (the model runs on PyTorch, so install it as well):

    pip install transformers torch
    
  2. Use the following code to load and run the model (the output format is described after this list):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline
    
    # Download the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
    model_path = "martin-ha/toxic-comment-model"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    
    # The pipeline tokenizes the input, runs the model, and maps logits to labels.
    pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
    print(pipeline('This is a test text.'))
    
  3. Cloud GPU Suggestion: For efficient training and inference, consider a cloud GPU such as an AWS EC2 GPU instance with P100-class or better hardware.
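
Continuing the snippet from step 2, the pipeline returns one dictionary per input with a label and a confidence score, and it accepts a list of strings for batch inference. The exact label strings are defined by the checkpoint's id2label mapping, so the values referenced in the comments are assumptions:

    # Batch inference: pass several comments at once (continues the snippet above).
    texts = ["You are wonderful.", "You are an idiot."]
    for text, result in zip(texts, pipeline(texts)):
        # result looks like {"label": ..., "score": ...}; the label strings
        # (e.g. "toxic" / "non-toxic") come from the checkpoint's id2label mapping.
        print(f"{text!r} -> {result['label']} ({result['score']:.2f})")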

License

The model and its associated code are subject to the Hugging Face model license agreements. Ensure compliance with these licenses when using or distributing the model.
