toxic-comment-model (martin-ha)
Introduction
The toxic-comment-model is a fine-tuned version of DistilBERT designed to classify toxic comments in English. It is implemented with the Transformers library and the PyTorch framework.
Architecture
The model is built on DistilBERT, a smaller, faster, and lighter distilled version of BERT, with a sequence-classification head fine-tuned here for toxicity detection.
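As a rough illustration, the checkpoint's configuration can be inspected through the Transformers API. This is a minimal sketch, assuming network access to the Hugging Face Hub:

  from transformers import AutoConfig

  # Load only the configuration of the published checkpoint
  config = AutoConfig.from_pretrained("martin-ha/toxic-comment-model")
  print(config.model_type)  # expected: "distilbert"
  print(config.num_labels)  # number of output classes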
Training
The model was trained on data from a Kaggle competition on unintended bias in toxicity classification, using only 10% of the train.csv dataset. Detailed training procedures and code are available on GitHub; training took about 3 hours on a P100 GPU. The model was evaluated on a test set of 10,000 rows, achieving 94% accuracy and an F1-score of 0.59.
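The exact preprocessing code is not reproduced here, but a 10% subsample of the kind described could be drawn with pandas along these lines (the train.csv filename comes from the competition; the seed is a hypothetical choice for reproducibility):

  import pandas as pd

  # Draw a reproducible 10% subsample of the competition training data
  df = pd.read_csv("train.csv")
  train_subset = df.sample(frac=0.10, random_state=42)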
Guide: Running Locally
To use the model locally, follow these steps:
- Install the Transformers library:

  pip install transformers
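  Because the model runs on PyTorch, you may also need a PyTorch installation if one is not already present:

  pip install torch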
- Use the following code to load and run the model:

  from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

  # Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
  model_path = "martin-ha/toxic-comment-model"
  tokenizer = AutoTokenizer.from_pretrained(model_path)
  model = AutoModelForSequenceClassification.from_pretrained(model_path)

  # Wrap model and tokenizer in a text-classification pipeline and classify a sample
  pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
  print(pipeline('This is a test text.'))
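  Continuing from the snippet above, the pipeline also accepts a list of strings and returns one dictionary with label and score fields per input. The sample comments here are illustrative only:

  # Classify several comments in one call (reuses the pipeline defined above)
  comments = ["Thanks for the helpful answer!", "You are an idiot."]
  for result in pipeline(comments):
      print(result)  # e.g. {'label': ..., 'score': ...}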
- Cloud GPU suggestion: for efficient training and inference, consider using cloud GPUs such as an AWS EC2 instance with a P100 or similar accelerator, as sketched below.
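Reusing the model and tokenizer from the snippet above, the pipeline can be placed on a GPU via its device argument; a minimal sketch, assuming PyTorch was installed with CUDA support:

  import torch

  # Use the first CUDA device if present, otherwise fall back to CPU (-1)
  device = 0 if torch.cuda.is_available() else -1
  pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=device)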
License
The model and its associated code are subject to the license terms listed on its Hugging Face model page. Ensure compliance with those terms when using or distributing the model.