distilroberta finetuned stereotype detection
Narrativa
Introduction
The DistilRoBERTa model fine-tuned for stereotype detection is a version of the distilroberta-base model fine-tuned for text classification, specifically stereotype and gender bias detection. It reaches an accuracy of 98.92% on its evaluation set.
Architecture
This model is based on the DistilRoBERTa architecture, a distilled version of RoBERTa that is optimized for efficiency while retaining much of the original model's performance. The model is implemented with the Transformers library and runs on PyTorch.
Training
The model was fine-tuned using the following hyperparameters:
- Learning Rate: 2e-05
- Train Batch Size: 16
- Evaluation Batch Size: 16
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Learning Rate Scheduler Type: Linear
- Number of Epochs: 5
Over the five training epochs, loss decreased and accuracy increased.
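The hyperparameters above can be expressed as keyword arguments for the Transformers `TrainingArguments` class, and the linear learning-rate schedule can be sketched in plain Python. This is a minimal sketch, not the authors' training script. It assumes zero warmup steps (none are listed), a single training device (so the per-device batch size equals the total batch size), and a hypothetical dataset size, because the size of the training set is not stated.

```python
import math

# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments parameter names.
HPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps on one device, no gradient accumulation (last batch may be partial)."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps, assuming no warmup."""
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps))


if __name__ == "__main__":
    # HYPOTHETICAL dataset size: the real training set size is not given.
    steps = total_training_steps(
        1000, HPARAMS["per_device_train_batch_size"], HPARAMS["num_train_epochs"]
    )
    for s in (0, steps // 2, steps):
        print(s, linear_lr(s, steps, HPARAMS["learning_rate"]))
```

With Transformers installed, `TrainingArguments(output_dir="out", **HPARAMS)` would build the corresponding training configuration; `"out"` is a placeholder directory name.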
Guide: Running Locally
- Setup Environment:
  - Install PyTorch and Transformers (pip install torch transformers).
  - For reproducibility, the reference versions are Transformers 4.10.2, PyTorch 1.9.0+cu102, Datasets 1.11.0, and Tokenizers 0.10.3.
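- Load Model:
  - Use the Transformers library to load the model and tokenizer. The checkpoint is RoBERTa-based, so load it with the Auto classes (or the Roberta* classes) rather than the DistilBert* classes, which expect a different tokenizer and architecture.

      # Auto classes pick the correct RoBERTa tokenizer and model from the checkpoint config
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      model_name = "Narrativa/distilroberta-finetuned-stereotype-detection"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForSequenceClassification.from_pretrained(model_name)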
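- Inference:
  - Prepare input text and use the tokenizer to preprocess it.
  - Pass the tokenized input to the model for inference.
  - Read the prediction from the output logits: the highest-scoring index maps to a label name through model.config.id2label.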
- Cloud GPUs:
  - For performance optimization, especially during training or batch inference, consider using cloud-based GPU services such as AWS EC2 GPU instances, Google Cloud GPUs, or Azure GPU services.
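The load and inference steps can be combined into one short script. This is a sketch under stated assumptions, not an official usage example: it uses the Auto classes, downloads the checkpoint from the Hugging Face Hub on first use, moves the model to a GPU when one is available, and reads label names from `model.config.id2label` instead of assuming them, since the labels are not listed here. The helper names `softmax`, `top_label`, and `classify` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

MODEL_NAME = "Narrativa/distilroberta-finetuned-stereotype-detection"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts: List[str], model_name: str = MODEL_NAME) -> List[Tuple[str, float]]:
    """Tokenize a batch of texts, run the classifier, and map logits to label names."""
    # Imported lazily so the pure helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Use a GPU (local or cloud) when available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return [top_label(row, model.config.id2label) for row in logits.tolist()]
```

For example, `classify(["Some sentence to check."])` returns a list with one `(label, probability)` pair per input; the label strings come from the checkpoint's configuration.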
License
The model is licensed under the Apache 2.0 License, allowing for both commercial and non-commercial use with proper attribution.