nsfw_image_detection

Falconsai

Introduction

The Fine-Tuned Vision Transformer (ViT) is designed for NSFW image classification. It is based on the transformer encoder architecture, pre-trained on the ImageNet-21k dataset, and fine-tuned to distinguish between "normal" and "nsfw" content, making it suitable for filtering explicit or inappropriate images.

Architecture

This model uses the "google/vit-base-patch16-224-in21k" checkpoint, a ViT pre-trained on a large dataset of images resized to 224x224 pixels and processed as 16x16 patches. Fine-tuning used carefully chosen hyperparameters, including a batch size of 16 and a learning rate of 5e-5, to adapt the pre-trained representations to the NSFW classification task.
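
As a quick illustration, the checkpoint's geometry (224x224 inputs, 16x16 patches, and a 768-dimensional hidden state for the ViT-base variant) can be read straight from its published configuration:

    from transformers import ViTConfig
    
    # Read the base checkpoint's published configuration.
    config = ViTConfig.from_pretrained("google/vit-base-patch16-224-in21k")
    print(config.image_size, config.patch_size, config.hidden_size)  # -> 224 16 768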

Training

The training process used a proprietary dataset of 80,000 images labeled "normal" or "nsfw". This diverse dataset helped the model learn to differentiate safe from explicit content. The reported evaluation loss is 0.0746, with an accuracy of 98.04%.
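
The exact training script is not published; what follows is a minimal sketch of how a fine-tune with the stated hyperparameters might be set up with the transformers Trainer. The dataset objects, output path, and epoch count below are placeholders and assumptions, not the proprietary setup.

    from transformers import AutoModelForImageClassification, Trainer, TrainingArguments
    
    # Start from the ImageNet-21k ViT backbone with a fresh two-class head.
    model = AutoModelForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=2,
        id2label={0: "normal", 1: "nsfw"},
        label2id={"normal": 0, "nsfw": 1},
    )
    
    args = TrainingArguments(
        output_dir="vit-nsfw",           # hypothetical output path
        per_device_train_batch_size=16,  # batch size reported above
        learning_rate=5e-5,              # learning rate reported above
        num_train_epochs=3,              # assumption; epochs are not reported
    )
    
    # train_dataset / eval_dataset are placeholders for the proprietary
    # 80,000-image dataset, which is not released.
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()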

Guide: Running Locally

To use the model for NSFW image classification, follow these steps:

  1. Install Dependencies: Ensure you have transformers and torch installed, along with Pillow (imported as PIL) for image handling, e.g.:
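      pip install transformers torch pillow
      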
  2. Load the Model:
    • Use a pipeline for a high-level approach:
      from PIL import Image
      from transformers import pipeline
      
      # Build an image-classification pipeline around the fine-tuned checkpoint.
      classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
      
      img = Image.open("<path_to_image_file>")
      # Returns a list of {"label": ..., "score": ...} dicts, one per class.
      print(classifier(img))
      
    • Load the model directly for more control:
      import torch
      from PIL import Image
      from transformers import AutoModelForImageClassification, ViTImageProcessor
      
      model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
      processor = ViTImageProcessor.from_pretrained("Falconsai/nsfw_image_detection")
      
      img = Image.open("<path_to_image_file>")
      # Preprocess (resize, normalize) and run a forward pass without gradients.
      inputs = processor(images=img, return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits
      
      # Pick the highest-scoring class index and map it to its label name.
      predicted_label = logits.argmax(-1).item()
      print(model.config.id2label[predicted_label])
      
  3. Run on Cloud GPUs: For efficient processing of large volumes of images, consider cloud GPU services such as AWS, Google Cloud, or Azure; a GPU inference sketch follows this list.
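
For the GPU route in step 3, here is a minimal sketch of the direct-load workflow on a CUDA device, assuming one is available; the softmax step, not shown in the snippets above, converts the logits into per-class probabilities:

    import torch
    from PIL import Image
    from transformers import AutoModelForImageClassification, ViTImageProcessor
    
    # Fall back to CPU when no CUDA device is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    model = AutoModelForImageClassification.from_pretrained(
        "Falconsai/nsfw_image_detection").to(device)
    processor = ViTImageProcessor.from_pretrained("Falconsai/nsfw_image_detection")
    
    img = Image.open("<path_to_image_file>")
    inputs = processor(images=img, return_tensors="pt").to(device)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(-1)
    
    # Map each class probability to its label name, e.g. {"normal": ..., "nsfw": ...}.
    print({model.config.id2label[i]: p.item() for i, p in enumerate(probs[0])})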

License

This model is licensed under the Apache 2.0 License, allowing for broad use and modification while maintaining attribution and including a copy of the license with any redistributed versions.
