ImageGPT (small) by OpenAI

Introduction
ImageGPT (iGPT) is a GPT-style transformer decoder model pre-trained on the ImageNet-21k dataset, which consists of 14 million images across 21,843 classes. The model is trained to predict the next pixel value given the previous ones, which lets it learn an internal representation of images that can be used for feature extraction or conditional image generation.
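The next-pixel objective can be illustrated with a toy sketch: training pairs are built by shifting the token sequence by one position, so each prefix predicts the token that follows it (teacher forcing). The helper below is illustrative only, not part of the library:

```python
def next_pixel_pairs(tokens):
    """Build (input, target) pairs for autoregressive training:
    the model sees tokens[:i] and must predict tokens[i]."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Toy sequence of four pixel-cluster tokens.
pairs = next_pixel_pairs([7, 3, 3, 9])
# pairs → [([7], 3), ([7, 3], 3), ([7, 3, 3], 9)]
```

In the real model the sequence is 1024 tokens long and the prediction is a softmax over the 512 cluster values.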
Architecture
ImageGPT employs a transformer decoder architecture similar to GPT-2, adapted for image data. It processes images at a resolution of 32x32 pixels, converting them into a sequence of discrete values through color clustering: each pixel is mapped to one of 512 possible cluster values, so a 32x32 image becomes a sequence of 1024 tokens.
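The color-clustering step can be sketched without any dependencies: assign each pixel to its nearest cluster center. The 4-color palette below is purely illustrative; the real model uses 512 centers obtained by clustering pixel values:

```python
def nearest_cluster(pixel, clusters):
    """Return the index of the cluster center closest to an (r, g, b) pixel."""
    return min(
        range(len(clusters)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, clusters[i])),
    )

def tokenize_image(pixels, clusters):
    """Map each pixel to a cluster index, flattening the image into a token
    sequence (a 32x32 image would yield 1024 tokens; two pixels here)."""
    return [nearest_cluster(p, clusters) for p in pixels]

# Illustrative 4-color palette instead of iGPT's 512 learned centers.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
tokens = tokenize_image([(250, 10, 5), (3, 2, 1)], palette)
# tokens → [1, 0]
```

Quantizing to 512 clusters instead of keeping 256^3 raw RGB values is what keeps the vocabulary, and hence the softmax, tractable.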
Training
The model was trained on the ImageNet-21k dataset, which contains 14 million images and 21,843 classes. Images were preprocessed by resizing to 32x32 pixels and applying color clustering. The training procedure is described in the original paper by Chen et al.; section 3.4 of v2 of the paper covers the pre-training process in detail.
Guide: Running Locally
To use ImageGPT for unconditional image generation in PyTorch, follow these steps:
- Install Dependencies: Ensure you have Python installed, along with PyTorch and the transformers library.
- Load the Model: Use the ImageGPTImageProcessor and ImageGPTForCausalImageModeling classes from the transformers library.
- Set Device: Check for a GPU and set the model to use it if available.
- Generate Images: Initialize with a start-of-sequence token and generate pixel values.
- Visualize Images: Convert generated sequences back to images and display them using matplotlib.
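The final step above, turning a generated token sequence back into an image, can be sketched without the library: look up each token's cluster color and reshape the flat sequence into a height x width grid, which matplotlib's imshow could then display. The 2-color palette and 2x2 grid are illustrative; the real model uses its 512 cluster centers and 32x32 grids:

```python
def tokens_to_image(tokens, clusters, height, width):
    """Replace each cluster token with its (r, g, b) center and
    reshape the flat sequence into a height x width pixel grid."""
    assert len(tokens) == height * width
    pixels = [clusters[t] for t in tokens]
    return [pixels[row * width:(row + 1) * width] for row in range(height)]

# Illustrative 2-color palette; iGPT uses 512 centers and 1024-token sequences.
palette = [(0, 0, 0), (255, 255, 255)]
image = tokens_to_image([0, 1, 1, 0], palette, height=2, width=2)
# image → [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (0, 0, 0)]]
```

This is the inverse of the color-clustering step used during preprocessing, which is why generated sequences can be rendered directly as 32x32 images.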
For efficient processing, consider using cloud GPUs like AWS EC2, Google Cloud Platform, or Azure.
License
ImageGPT is licensed under the Apache-2.0 license.