Introduction

Infinity is a bitwise visual autoregressive model designed to generate high-resolution, photorealistic images. It is built on a bitwise token prediction framework that combines an infinite-vocabulary tokenizer, an infinite-vocabulary classifier, and bitwise self-correction. In theory, this lets the tokenizer vocabulary grow without bound while the transformer is scaled alongside it, which improves the model's scaling behavior. Infinity is reported to outperform leading diffusion models such as SD3-Medium and SDXL on benchmark scores while also generating images faster.

Architecture

Infinity's architecture is centered on a bitwise token prediction framework with three components: an infinite-vocabulary tokenizer, an infinite-vocabulary classifier, and bitwise self-correction. This design lets the tokenizer vocabulary size and the transformer size be scaled together for high-resolution image synthesis.
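To make the bitwise idea above more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation. It assumes a sign-based quantizer that turns d feature channels into d bits, a classifier layout of d binary predictions with two logits each, and a random bit flip as a stand-in for the perturbation that self-correction would train against. All function names are hypothetical.

```python
import random


def quantize_to_bits(features):
    """Sign-quantize each feature channel to one bit: 1 if the value is >= 0, else 0."""
    return [1 if x >= 0 else 0 for x in features]


def bits_to_index(bits):
    """Pack d bits (most significant first) into the integer index that an
    index-wise codebook of size 2**d would use for the same token."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def classifier_output_sizes(d):
    """Logits needed per token: one 2**d-way classifier versus d independent
    binary classifiers with 2 logits each (an assumed layout)."""
    return {"index_wise": 2 ** d, "bitwise": 2 * d}


def randomly_flip_bits(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob. This is an
    illustrative stand-in for the errors a self-correction scheme trains against."""
    return [1 - b if rng.random() < flip_prob else b for b in bits]


bits = quantize_to_bits([0.3, -1.2, 0.0, -0.1])
print(bits)                         # [1, 0, 1, 0]
print(bits_to_index(bits))          # 10
print(classifier_output_sizes(16))  # {'index_wise': 65536, 'bitwise': 32}
print(randomly_flip_bits(bits, 1.0, random.Random(0)))  # [0, 1, 0, 1]
```

The size comparison shows why a bitwise output can keep growing: the index-wise classifier doubles with every added bit, while the bitwise one grows linearly in d.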

Training

Infinity is trained for autoregressive text-to-image generation. It is reported to outperform existing models such as SD3-Medium on the GenEval and ImageReward benchmarks, while still generating high-quality images efficiently.

Guide: Running Locally

  1. Clone the Repository:

    git clone https://github.com/FoundationVision/Infinity
    cd Infinity
    
  2. Install Dependencies: Install the libraries and packages listed in the requirements.txt file, for example with pip install -r requirements.txt.

  3. Download Model Weights: The model weights are hosted on Hugging Face. Download them as needed for local execution.

  4. Run the Model: Execute the script to run Infinity locally, following any additional instructions provided in the repository.

  5. Use Cloud GPUs: For optimal performance, consider using cloud-based GPUs such as AWS EC2 with NVIDIA GPUs, Google Cloud's AI Platform, or Azure's Machine Learning services.
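As one possible way to carry out step 3, the sketch below fetches the weights with the huggingface_hub package's snapshot_download function. The repository id and the local folder layout are assumptions for illustration, not values confirmed by this document; check the project README for the actual weight location. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the Hugging Face repository id; verify it against the project README.
ASSUMED_REPO_ID = "FoundationVision/Infinity"


def weights_dir(root="weights"):
    """Return the local directory in which the weights would be stored."""
    return Path(root) / ASSUMED_REPO_ID.split("/")[-1]


def download_weights(root="weights", repo_id=ASSUMED_REPO_ID):
    """Download a snapshot of the model repository into weights_dir(root)
    and return the local path reported by huggingface_hub."""
    # Imported lazily so this module still loads when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    target = weights_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling download_weights() requires network access and the huggingface_hub package (pip install huggingface_hub). The returned path can then be passed to whichever script the repository provides for running the model.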

License

Infinity is released under the MIT License, which permits reuse, modification, and distribution, provided that the copyright notice and license text are included in all copies or substantial portions of the software.
