deberta v3 base prompt injection v2

protectai

Introduction

The DeBERTa-v3-base-prompt-injection-v2 model is a fine-tuned version of Microsoft's DeBERTa-v3-base, developed to detect and classify prompt injection attacks. These attacks use crafted input to manipulate a language model into producing harmful or unintended responses. Screening input with this classifier before it reaches the language model helps secure applications against such attacks.

Architecture

  • Fine-tuned by: Protect AI
  • Model type: DeBERTa-v3-base
  • Language: English
  • License: Apache License 2.0
  • Base model: microsoft/deberta-v3-base

Training

The model was trained on a dataset compiled from various public sources, focusing on prompt injection attacks. Over 20 configurations, differing in hyperparameters and dataset composition, were tested to optimize detection. The model is evaluated on accuracy, recall, precision, and F1 score. In the post-training evaluation, recall stays at 99.74% while precision drops to 91.59%, so the model misses very few injections but flags some benign prompts.

Evaluation Metrics

  • Training Performance:

    • Loss: 0.0036
    • Accuracy: 99.93%
    • Precision: 99.92%
    • Recall: 99.94%
    • F1 Score: 99.93%
  • Post-Training Evaluation:

    • Accuracy: 95.25%
    • Precision: 91.59%
    • Recall: 99.74%
    • F1 Score: 95.49%
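
As a quick consistency check, the F1 scores above can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` below is illustrative and not part of any library the model uses.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the metrics listed above.
training_f1 = f1_score(99.92, 99.94)
post_training_f1 = f1_score(91.59, 99.74)

print(f"{training_f1:.2f}")       # 99.93
print(f"{post_training_f1:.2f}")  # 95.49
```

Both values agree with the reported F1 figures to two decimal places.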

Guide: Running Locally

Basic Steps

  1. Install the Transformers library and PyTorch:

    pip install transformers torch
    
  2. Load and Use the Model:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
    import torch
    
    # Download the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
    model_id = "ProtectAI/deberta-v3-base-prompt-injection-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    
    # Inputs longer than 512 tokens are truncated; a GPU is used when one is available.
    classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        truncation=True,
        max_length=512,
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )
    
    # Prints a list with one dict per input, each holding a "label" and a "score".
    print(classifier("Your prompt injection is here"))
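
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A minimal sketch of turning that output into an allow/block decision follows. The label name `INJECTION`, the 0.5 threshold, and the helper `is_injection` are assumptions made for illustration; check `model.config.id2label` for the label names the model actually emits.

```python
from typing import Dict, List, Union

# Assumed label name; verify against model.config.id2label.
INJECTION_LABEL = "INJECTION"


def is_injection(
    results: List[Dict[str, Union[str, float]]],
    threshold: float = 0.5,
) -> bool:
    """Return True if the first prediction carries the injection label
    with a score at or above the (hypothetical) threshold."""
    top = results[0]
    return top["label"] == INJECTION_LABEL and float(top["score"]) >= threshold


# Made-up stand-in for classifier("...") output, so the sketch runs without the model.
example_output = [{"label": "INJECTION", "score": 0.98}]

if is_injection(example_output):
    print("Blocked: possible prompt injection")
else:
    print("Allowed")
```

With the model loaded, `is_injection(classifier(user_prompt))` could gate a prompt before it is forwarded to the language model. Raising the threshold would likely trade some recall for fewer false positives.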
    

Suggested Cloud GPUs

The model runs on CPU, but a GPU gives higher throughput. Cloud options include AWS EC2 instances with GPU support, Google Cloud's Vertex AI (the successor to AI Platform), and Azure's GPU-enabled virtual machines.

License

This model is licensed under the Apache License 2.0, allowing use, distribution, and modification under the specified terms.
