ImageGPT (small)

openai

Introduction

ImageGPT (iGPT) is a GPT-style transformer decoder pre-trained on the ImageNet-21k dataset, which consists of 14 million images across 21,843 classes. The model is trained to predict the next pixel value given the previous ones, which lets it learn an internal representation of images that can be used for feature extraction or for image generation.

Architecture

ImageGPT employs a transformer decoder architecture similar to GPT-2, adapted to image data. Images are processed at a resolution of 32x32 pixels and converted into a sequence of discrete tokens through color clustering: each pixel is mapped to one of 512 possible cluster values, yielding a sequence of 1024 tokens (one per pixel).
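The color-clustering step can be illustrated with a short sketch. Note this is an assumption-laden toy example: the random `clusters` palette below stands in for the k-means palette ImageGPT actually learned from training data, and `quantize` is a hypothetical helper, not part of any library.

```python
import numpy as np

def quantize(image, clusters):
    """Assign each pixel of an (H, W, 3) image in [-1, 1] to its nearest
    color cluster, yielding a 1-D token sequence of length H*W."""
    pixels = image.reshape(-1, 3)  # flatten to (H*W, 3)
    # Squared Euclidean distance from every pixel to every cluster center.
    dists = ((pixels[:, None, :] - clusters[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (H*W,) integer tokens in [0, 512)

rng = np.random.default_rng(0)
clusters = rng.uniform(-1, 1, size=(512, 3))  # stand-in for the learned palette
image = rng.uniform(-1, 1, size=(32, 32, 3))  # stand-in for a real 32x32 image
tokens = quantize(image, clusters)
print(tokens.shape)  # one token per pixel: a 1024-long sequence
```

This is why the model's context length is 1024: a 32x32 image becomes exactly 32 * 32 = 1024 discrete tokens, each drawn from a 512-entry vocabulary of color clusters.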

Training

The model was trained on the ImageNet-21k dataset (14 million images, 21,843 classes). Images were preprocessed by resizing to 32x32 pixels and quantizing colors against the learned cluster palette. Full training details are available in the original paper by Chen et al.; section 3.4 of v2 describes the pretraining procedure.

Guide: Running Locally

To use ImageGPT for unconditional image generation in PyTorch, follow these steps:

  1. Install Dependencies: Ensure Python is installed, along with PyTorch and the transformers library.
  2. Load the Model: Instantiate the ImageGPTImageProcessor and ImageGPTForCausalImageModeling classes from the transformers library.
  3. Set Device: Check for a GPU and move the model to it if available.
  4. Generate Images: Initialize the context with the start-of-sequence token and sample pixel values autoregressively.
  5. Visualize Images: Map the generated cluster indices back to RGB images and display them with matplotlib.
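The steps above can be sketched as follows. This assumes the openai/imagegpt-small checkpoint on the Hugging Face Hub; the `clusters_to_images` helper and the sampling settings (`do_sample`, `top_k`) are illustrative choices, not the only valid ones.

```python
import numpy as np

def clusters_to_images(sequences, clusters, height=32, width=32):
    """Map generated cluster-index sequences back to uint8 RGB images.

    `clusters` is the (512, 3) color palette with values in [-1, 1];
    pixels are rescaled to [0, 255].
    """
    clusters = np.asarray(clusters)
    return [
        np.rint(127.5 * (clusters[s] + 1.0)).reshape(height, width, 3).astype(np.uint8)
        for s in sequences
    ]

def generate(batch_size=8):
    # Heavy imports are kept local so the helper above stays importable
    # without transformers or torch installed.
    import torch
    from transformers import ImageGPTImageProcessor, ImageGPTForCausalImageModeling

    processor = ImageGPTImageProcessor.from_pretrained("openai/imagegpt-small")
    model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small")

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # The start-of-sequence token is the last id in the vocabulary
    # (512 color clusters + 1 SOS token).
    context = torch.full((batch_size, 1), model.config.vocab_size - 1, device=device)
    output = model.generate(
        input_ids=context,
        max_length=model.config.n_positions + 1,  # 1 SOS token + 1024 pixels
        do_sample=True,
        top_k=40,
    )
    # Drop the SOS token, then de-quantize cluster indices to RGB.
    samples = output[:, 1:].cpu().numpy()
    return clusters_to_images(samples, processor.clusters)
```

Calling `generate()` downloads the checkpoint and samples a batch of images, which can then be shown with matplotlib, e.g. `plt.imshow(images[0])` on each returned array.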

For efficient processing, consider using cloud GPUs like AWS EC2, Google Cloud Platform, or Azure.

License

ImageGPT is licensed under the Apache-2.0 license.
