RoBERTa Large OpenAI Detector

Introduction

The RoBERTa Large OpenAI Detector is a classifier designed to detect whether a given text was generated by GPT-2. It is a fine-tuned version of the RoBERTa large model, trained on outputs of the 1.5-billion-parameter GPT-2 model. The detector is intended to support research on synthetic text generation and its detection.

Architecture

The model is a transformer-based classifier built on RoBERTa large, which has 355 million parameters. It was fine-tuned on outputs of the GPT-2 model so that it can classify text as either GPT-2-generated or not.
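
As a quick sanity check, the snippet below loads the checkpoint and prints its parameter count and label mapping (a minimal sketch; it downloads the model from the Hugging Face Hub on first run):

    from transformers import AutoModelForSequenceClassification

    # Load the fine-tuned classifier from the Hugging Face Hub.
    model = AutoModelForSequenceClassification.from_pretrained("openai-community/roberta-large-openai-detector")

    print(sum(p.numel() for p in model.parameters()))  # roughly 355 million parameters
    print(model.config.id2label)                       # mapping from class indices to labels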

Training

The model was fine-tuned on outputs of the GPT-2 model, with training data comprising WebText and GPT-2-generated text. The developers focused on creating a robust detector that classifies generated text accurately across different sampling methods.

Guide: Running Locally

To run the RoBERTa Large OpenAI Detector locally, follow these basic steps:

  1. Install Dependencies: Ensure you have Python and the Hugging Face Transformers library installed.

    pip install transformers
    
  2. Load the Model: Use the Hugging Face library to load the model.

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    model = AutoModelForSequenceClassification.from_pretrained("openai-community/roberta-large-openai-detector")
    tokenizer = AutoTokenizer.from_pretrained("openai-community/roberta-large-openai-detector")
    
  3. Perform Inference: Tokenize your input text and pass it through the model to get predictions.

    inputs = tokenizer("Your text here", return_tensors="pt")
    outputs = model(**inputs)
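    # The forward pass returns raw logits; convert them to class probabilities.
    # Note: the index-to-label mapping is an assumption here; verify it against
    # model.config.id2label for the downloaded checkpoint.
    import torch

    probs = torch.softmax(outputs.logits, dim=-1)
    print(probs)  # e.g. tensor([[p_class0, p_class1]])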
    
  4. Use Cloud GPU: For large-scale inference or training, consider cloud GPUs from providers such as AWS or Google Cloud for better performance; a minimal device-placement sketch follows this list.
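
If a GPU is available, you can move the model and inputs onto it before running inference. The snippet below is a minimal sketch continuing from the steps above; the device-selection logic is an assumption to adapt to your environment:

    import torch

    # Use a CUDA device if one is available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model.to(device)
    inputs = tokenizer("Your text here", return_tensors="pt").to(device)
    outputs = model(**inputs)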

License

The model is released under the MIT license, allowing for free use with few restrictions.
