AI Image Detector

umm-maybe

Introduction

This project is a proof-of-concept demonstration of using a Vision Transformer (ViT) model to predict whether an artistic image is AI-generated. Created in October 2022, it targets artistic images specifically and is not suited to detecting deepfake photos or general computer imagery. A high score can flag an image as likely AI-generated and worth closer evaluation by a human expert.

Architecture

The model is based on a Vision Transformer (ViT) architecture and performs binary classification to estimate the likelihood that an artistic image is AI-generated. Because it was trained on outputs from earlier generators, it may not reliably detect outputs from newer models such as Midjourney 5, SDXL, or DALLE-3.

Training

The model was trained using the AutoTrain tool for binary classification. Its validation metrics indicate strong performance:

  • Loss: 0.163
  • Accuracy: 0.942
  • Precision: 0.938
  • Recall: 0.978
  • AUC: 0.980
  • F1: 0.958
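The reported F1 score is consistent with the precision and recall above, since F1 is their harmonic mean. A quick check:

```python
# Verify the reported F1 from the validation precision and recall.
precision = 0.938
recall = 0.978

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # → 0.958, matching the reported value
```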

CO2 emissions for training amounted to 7.9405 grams.

Guide: Running Locally

To run the AI Image Detector model locally, follow these basic steps:

  1. Clone the repository from Hugging Face.
  2. Set up a Python environment with the necessary dependencies, including PyTorch and transformers.
  3. Download and prepare any required datasets or images for testing.
  4. Execute the model inference script to classify images.

For optimal performance, it is recommended to use cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
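The steps above can be sketched with the Transformers `pipeline` API. This is a minimal example, not the project's official inference script; the model id `umm-maybe/AI-image-detector` and the `"artificial"` label name are assumptions to verify against the repository, and `example.jpg` is a placeholder for your own test image.

```python
def ai_score(results):
    """Pull the AI-generated probability out of pipeline output.

    `results` is a list of {"label": ..., "score": ...} dicts as returned
    by an image-classification pipeline. The label name "artificial" is
    an assumption; check the model's config for the actual labels.
    """
    for r in results:
        if r["label"].lower() == "artificial":
            return r["score"]
    return 0.0


if __name__ == "__main__":
    # Deferred import: requires `pip install transformers torch pillow`.
    from transformers import pipeline

    # Model id assumed from the Hugging Face repository name.
    detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")
    results = detector("example.jpg")  # path to a local image (placeholder)
    print(f"AI-generated probability: {ai_score(results):.3f}")
```

A GPU is not strictly required for single-image inference, but batch classification benefits from the cloud GPUs mentioned above.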

License

This model is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. Distribution and use in your web page, app, or service are allowed with proper attribution. However, using the model to evade AI image detection in text-to-image systems is prohibited as it constitutes a derivative work.
