deberta v3 base prompt injection v2
protectai
Introduction
The DeBERTa-v3-base-prompt-injection-v2 model is a fine-tuned version of Microsoft's DeBERTa-v3-base, developed to detect and classify prompt injection attacks. These attacks manipulate language models to produce harmful or unintended responses. This model enhances security in language model applications by identifying such malicious interventions.
Architecture
- Fine-tuned by: Protect AI
- Model type: DeBERTa-v3-base
- Language: English
- License: Apache License 2.0
- Base model: microsoft/deberta-v3-base
Training
The model was trained on a dataset compiled from various public sources, with a focus on prompt injection attacks. More than 20 configurations, varying hyperparameters and dataset composition, were tested to optimize detection. The model is evaluated on accuracy, recall, precision, and F1 score; the resulting figures are listed under Evaluation Metrics.
Evaluation Metrics
- Training Performance:
  - Loss: 0.0036
  - Accuracy: 99.93%
  - Recall: 99.94%
  - Precision: 99.92%
  - F1: 99.93%
- Post-Training Evaluation:
  - Accuracy: 95.25%
  - Precision: 91.59%
  - Recall: 99.74%
  - F1 Score: 95.49%
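As a quick sanity check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below does that; the helper name `f1_score` is introduced here for illustration only and is not part of the model's tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
training_f1 = f1_score(99.92, 99.94)
post_training_f1 = f1_score(91.59, 99.74)

print(f"Training F1: {training_f1:.2f}%")
print(f"Post-training F1: {post_training_f1:.2f}%")
```

In the post-training evaluation, recall is much higher than precision, which suggests the model rarely misses an injection but does flag some benign prompts. Because the harmonic mean is pulled toward the lower of its two inputs, the F1 score sits closer to the precision figure.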
Guide: Running Locally
Basic Steps
- Install the Transformers library (the example below also imports PyTorch):
    pip install transformers torch
- Load and Use the Model:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection-v2")
    model = AutoModelForSequenceClassification.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection-v2")

    classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        truncation=True,
        max_length=512,
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )

    print(classifier("Your prompt injection is here"))
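The classifier call above returns a list of dictionaries, each with a `label` and a `score`. A small helper can turn that into a yes/no decision. This is a minimal sketch under two assumptions that should be checked against the model's configuration: that the label for a detected attack is the string "INJECTION", and that 0.5 is a sensible default threshold. The helper name `is_injection` and the sample results are hypothetical stand-ins, not real model output.

```python
from typing import Dict, List

# Assumption: the label string the model emits for a detected attack.
INJECTION_LABEL = "INJECTION"


def is_injection(results: List[Dict], threshold: float = 0.5) -> bool:
    """Return True when the top prediction is the injection label
    and its score is at or above the threshold."""
    if not results:
        return False
    top = results[0]
    return top["label"] == INJECTION_LABEL and top["score"] >= threshold


# Hand-written stand-ins shaped like text-classification pipeline output.
sample_flagged = [{"label": "INJECTION", "score": 0.98}]
sample_clean = [{"label": "SAFE", "score": 0.97}]

print(is_injection(sample_flagged))
print(is_injection(sample_clean))
print(is_injection(sample_flagged, threshold=0.99))
```

Raising the threshold trades recall for precision: fewer benign prompts are blocked, at the cost of letting more borderline injections through. Given the reported post-training precision of 91.59%, that trade-off may be worth tuning per application.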
Suggested Cloud GPUs
For optimal performance, consider using cloud GPUs such as AWS EC2 instances with GPU support, Google Cloud's AI Platform, or Azure's GPU-enabled virtual machines.
License
This model is licensed under the Apache License 2.0, allowing use, distribution, and modification under the specified terms.