DeBERTa-v3-base-prompt-injection-v2 Model

Introduction

The DeBERTa-v3-base-prompt-injection-v2 model is a fine-tuned version of Microsoft's DeBERTa-v3-base, developed to detect and classify prompt injection attacks. These attacks manipulate language models to produce harmful or unintended responses. This model enhances security in language model applications by identifying such malicious interventions.

Architecture

Fine-tuned by: Protect AI
Model type: DeBERTa-v3-base
Language: English
License: Apache License 2.0
Base model: microsoft/deberta-v3-base

Training

The model was trained on a dataset compiled from various public sources, focusing on prompt injection attacks. More than 20 configurations, varying hyperparameters and dataset composition, were tested to optimize detection. The model is evaluated on accuracy, recall, precision, and F1 score; the results are listed under Evaluation Metrics.

Evaluation Metrics

Training Performance:
- Loss: 0.0036
- Accuracy: 99.93%
- Recall: 99.94%
- Precision: 99.92%
- F1: 99.93%

Post-Training Evaluation:
- Accuracy: 95.25%
- Precision: 91.59%
- Recall: 99.74%
- F1 Score: 95.49%
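
As a quick consistency check, the F1 figures above can be recomputed from the listed precision and recall, since F1 is their harmonic mean. The sketch below is only an illustration: the function name `f1_score` is our own, and the inputs are the percentages from the two lists, written as fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the lists above, converted from percentages to fractions.
training = f1_score(precision=0.9992, recall=0.9994)
post_training = f1_score(precision=0.9159, recall=0.9974)

print(f"Training F1: {training:.2%}")            # Training F1: 99.93%
print(f"Post-training F1: {post_training:.2%}")  # Post-training F1: 95.49%
```

Both values match the reported F1 scores. In the post-training results, precision (91.59%) is noticeably lower than recall (99.74%), so the errors on that evaluation were mostly false positives (benign prompts flagged as injections) rather than missed attacks.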

Guide: Running Locally

Basic Steps

Install Transformers Library:
```
pip install transformers torch
```

Load and Use the Model:

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# Download (or load from the local cache) the tokenizer and the fine-tuned classifier.
tokenizer = AutoTokenizer.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection-v2")
model = AutoModelForSequenceClassification.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection-v2")

# Build a text-classification pipeline; inputs longer than 512 tokens are truncated.
classifier = pipeline(
  "text-classification",
  model=model,
  tokenizer=tokenizer,
  truncation=True,
  max_length=512,
  # Use the GPU when one is available, otherwise fall back to the CPU.
  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("Your prompt injection is here"))
```
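
For a single input string, the pipeline returns a list containing one dictionary with a `label` and a `score`. One way to turn that into an allow/block decision is sketched below. The helper `is_injection`, the default threshold, and the label name `"INJECTION"` are assumptions made for illustration and are not taken from the model card; check `model.config.id2label` for the actual label names before relying on them.

```python
from typing import Dict, List

# Assumption: the positive class is named "INJECTION".
# Verify against model.config.id2label for the loaded model.
INJECTION_LABEL = "INJECTION"


def is_injection(results: List[Dict], threshold: float = 0.5) -> bool:
    """Return True if the top prediction is the injection label with score >= threshold.

    `results` is expected to have the shape returned by a text-classification
    pipeline for a single input: [{"label": str, "score": float}].
    """
    top = results[0]
    return top["label"] == INJECTION_LABEL and top["score"] >= threshold


# Hand-written stand-ins shaped like pipeline output (not real model predictions).
flagged = [{"label": "INJECTION", "score": 0.98}]
borderline = [{"label": "INJECTION", "score": 0.60}]
benign = [{"label": "SAFE", "score": 0.99}]

print(is_injection(flagged))                    # True
print(is_injection(borderline, threshold=0.9))  # False
print(is_injection(benign))                     # False
```

With the classifier from the snippet above, this would be called as `is_injection(classifier(user_text), threshold=0.9)`. A higher threshold should reduce false positives at the cost of some recall; a suitable value depends on the application and is best chosen on your own data.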

Suggested Cloud GPUs

The example above falls back to the CPU when no GPU is available. For higher throughput, consider cloud GPUs such as AWS EC2 instances with GPU support, Google Cloud's AI Platform, or Azure's GPU-enabled virtual machines.

License

This model is licensed under the Apache License 2.0, allowing use, distribution, and modification under the specified terms.
