BERT Toxic Comment Classification

JungleLee

Introduction

The BERT-Toxic-Comment-Classification model is a fine-tuned version of the BERT base model (bert-base-uncased) for classifying toxic comments. It is implemented in PyTorch using the Transformers library.

Architecture

The model is based on the bert-base-uncased variant of the BERT architecture, with a sequence-classification head placed on top of the pre-trained encoder and fine-tuned to identify toxic comments. The head outputs two labels, making this a binary (toxic vs. non-toxic) classification task.
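The two-label setup can be verified by inspecting the checkpoint's configuration directly. This is a minimal check; the exact label names printed are whatever the checkpoint ships with, not something asserted here.

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("JungleLee/bert-toxic-comment-classification")
    print(config.num_labels)  # expected: 2
    print(config.id2label)    # the checkpoint's own label names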

Training

The model was trained on data from the Kaggle competition "Jigsaw Unintended Bias in Toxicity Classification". Specifically, 90% of the train.csv dataset was used for training. The model achieves an AUC of 0.95 on a held-out test set of 1,500 rows.
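For reference, an AUC of this kind can be computed from the pipeline's toxic-class scores with scikit-learn. The sketch below is illustrative, not the author's evaluation script: the column names comment_text and target follow the Kaggle train.csv schema, the 0.5 binarization threshold and the sampled 1,500 rows are assumptions, and it presumes the checkpoint's positive label is named "toxic".

    import pandas as pd
    from sklearn.metrics import roc_auc_score
    from transformers import pipeline

    clf = pipeline("text-classification",
                   model="JungleLee/bert-toxic-comment-classification",
                   top_k=None)

    # Sample 1,500 held-out rows from the Jigsaw train.csv (illustrative split)
    df = pd.read_csv("train.csv").sample(1500, random_state=0)
    y_true = (df["target"] >= 0.5).astype(int)  # binarize Jigsaw's fractional toxicity labels

    # Probability the model assigns to the "toxic" label for each comment
    def toxic_score(text):
        scores = clf(text, truncation=True)  # list of {label, score} dicts, one per class
        return next(s["score"] for s in scores if s["label"] == "toxic")

    y_score = [toxic_score(t) for t in df["comment_text"]]
    print("AUC:", roc_auc_score(y_true, y_score))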

Guide: Running Locally

To use this model locally:

  1. Install the Transformers library (pip install transformers).
  2. Use the following code snippet to load and run the model:
    from transformers import BertForSequenceClassification, BertTokenizer, TextClassificationPipeline
    
    model_path = "JungleLee/bert-toxic-comment-classification"
    
    # Load the tokenizer and the fine-tuned two-label classification model
    tokenizer = BertTokenizer.from_pretrained(model_path)
    model = BertForSequenceClassification.from_pretrained(model_path, num_labels=2)
    
    # Wrap them in a pipeline that returns a label and confidence score per input
    pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
    print(pipeline("You're a fucking nerd."))
    
  3. Consider using cloud GPUs from platforms such as AWS, Google Cloud, or Azure for faster inference; a GPU-enabled variant of the snippet is sketched below.
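When a GPU is available, the pipeline can be placed on it via the standard device argument. This is a minimal sketch assuming a single CUDA device; on CPU-only machines the code falls back automatically.

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer, TextClassificationPipeline

    model_path = "JungleLee/bert-toxic-comment-classification"
    tokenizer = BertTokenizer.from_pretrained(model_path)
    model = BertForSequenceClassification.from_pretrained(model_path)

    # device=0 selects the first CUDA GPU; -1 runs on CPU when no GPU is present
    device = 0 if torch.cuda.is_available() else -1
    pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=device)

    # Batched inference amortizes per-call overhead across many comments
    comments = ["You're a fucking nerd.", "Have a great day!"]
    print(pipe(comments, batch_size=2))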

License

This model is licensed under the Academic Free License v3.0 (AFL-3.0).
