RoBERTa Large OpenAI Detector
Introduction
The RoBERTa Large OpenAI Detector is a classifier designed to detect whether text was generated by a GPT-2 model. It is a fine-tuned version of the RoBERTa large model, trained on outputs of the 1.5-billion-parameter GPT-2 model. The detector was released to support research on detecting synthetic text.
Architecture
The model is a fine-tuned transformer-based language model, specifically RoBERTa large, with 355 million parameters. A sequence classification head on top of the RoBERTa encoder labels input text as either GPT-2-generated or human-written.
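As a quick sanity check, the model can be loaded with the Transformers library and its parameters counted; the snippet below is a minimal sketch and assumes the transformers and torch packages are installed.
from transformers import AutoModelForSequenceClassification

# Load the detector: a RoBERTa-large encoder with a two-way classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "openai-community/roberta-large-openai-detector"
)

# Count the parameters; RoBERTa large has roughly 355 million.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")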
Training
The model was fine-tuned using outputs from the 1.5-billion-parameter GPT-2 model, with training data comprising human-written WebText and GPT-2-generated text. The developers focused on creating a robust detector that classifies generated text accurately across different sampling methods.
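For context, the sketch below shows one way such a binary training set could be assembled. The file names are assumptions taken from OpenAI's gpt-2-output-dataset release (https://github.com/openai/gpt-2-output-dataset), and the label convention (0 = GPT-2-generated, 1 = human-written) mirrors OpenAI's original detector code; the actual training pipeline is not published in this card.
import json

# Read one JSONL file of texts and attach a class label to each example.
def load_examples(path, label):
    with open(path) as f:
        return [(json.loads(line)["text"], label) for line in f]

# Assumed file names from the gpt-2-output-dataset release:
# webtext = human-written, xl-1542M = samples from the 1.5B GPT-2 model.
examples = (
    load_examples("webtext.train.jsonl", 1)
    + load_examples("xl-1542M.train.jsonl", 0)
)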
Guide: Running Locally
To run the RoBERTa Large OpenAI Detector locally, follow these basic steps:
- Install Dependencies: Ensure you have Python and the Hugging Face Transformers library installed.
pip install transformers
- Load the Model: Use the Transformers library to load the model and its tokenizer.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("openai-community/roberta-large-openai-detector")
tokenizer = AutoTokenizer.from_pretrained("openai-community/roberta-large-openai-detector")
- Perform Inference: Tokenize your input text and pass it through the model to get predictions; a complete end-to-end example follows this list.
inputs = tokenizer("Your text here", return_tensors="pt")
outputs = model(**inputs)
- Use a Cloud GPU: For large-scale inference or training, consider cloud GPU providers such as AWS or Google Cloud for better performance.
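The following sketch ties these steps together into a runnable script. Note that the label mapping is an assumption: in OpenAI's original detector code, index 0 corresponds to machine-generated ("fake") text and index 1 to human-written ("real") text; verify this against the model card before relying on it.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "openai-community/roberta-large-openai-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run the classifier without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities over the two classes.
# Assumed mapping (as in OpenAI's detector code): 0 = GPT-2-generated, 1 = human-written.
fake_prob, real_prob = logits.softmax(dim=-1).squeeze().tolist()
print(f"GPT-2-generated: {fake_prob:.3f}  human-written: {real_prob:.3f}")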
License
The model is released under the MIT license, which permits use, modification, and redistribution with minimal restrictions.