OmniGen v1

Shitao

Introduction

OmniGen is a unified image generation model that produces diverse images from multi-modal prompts. Unlike many existing image generation models, it needs no additional network modules or preprocessing steps: images are generated directly from the prompt, much as GPT generates text. It supports tasks such as text-to-image generation, identity-preserving generation, and image editing.

Architecture

OmniGen accepts prompts that mix text and images. Guided by the text prompt, it automatically identifies the relevant features in the input images, with no additional plugins or operations. This flexibility lets it perform tasks such as subject-driven generation and image-conditioned generation.
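
A multi-modal prompt of this kind might be assembled as in the sketch below. It assumes, without confirmation from this document, that images are referenced inside the prompt with placeholders of the form <img><|image_1|></img> and passed through an input_images argument; both should be checked against the repository documentation. The helper names and the image path are hypothetical.

```python
def image_placeholder(index):
    """Return the assumed placeholder token for the index-th input image (1-based)."""
    return f"<img><|image_{index}|></img>"


def build_multimodal_call(template, image_paths, **options):
    """Build keyword arguments for a pipeline call that mixes text and images.

    `template` refers to images as {image_1}, {image_2}, ... in the order of
    `image_paths`; each field is replaced by the assumed placeholder token.
    """
    fields = {f"image_{i}": image_placeholder(i) for i in range(1, len(image_paths) + 1)}
    return {"prompt": template.format(**fields), "input_images": list(image_paths), **options}


call = build_multimodal_call(
    "A man is reading a book. The man is the one on the right in {image_1}.",
    ["two_man.jpg"],  # hypothetical local image path
    height=1024, width=1024, guidance_scale=2.5, seed=0,
)
print(call["prompt"])

# With OmniGen installed, the arguments would be passed on like this
# (input_images is an assumed parameter name):
# from OmniGen import OmniGenPipeline
# pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
# pipe(**call)[0].save("example_ti2i.png")
```

Keeping the prompt construction separate from the pipeline call makes it easy to verify that every input image has a matching placeholder before any GPU time is spent.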

Training

OmniGen can be fine-tuned with the provided train.py script, which supports techniques such as LoRA fine-tuning. You can train on custom datasets and adjust the training parameters to suit a specific task. Detailed training instructions are available in the documentation.
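
For readers unfamiliar with LoRA, the sketch below illustrates the general idea in plain Python: the pretrained weight W stays frozen, and only a low-rank update B·A, scaled by alpha/r, is trained. This is a generic illustration of the technique, not the code in train.py; the function names and toy matrices are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank."""
    rank = len(lora_a)               # A has shape (r, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)   # B: (out, r) times A: (r, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 2x3 weight with a rank-1 adapter (toy values).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # shape (1, 3)
lora_b = [[0.5], [0.0]]      # shape (2, 1)
merged = lora_effective_weight(w, lora_a, lora_b, alpha=2.0)
print(merged)
```

Only lora_a and lora_b would be updated during training, which is why LoRA needs far less memory than updating every weight of a large model.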

Guide: Running Locally

To run OmniGen locally, follow these steps:

  1. Installation:

    • Clone the repository and install dependencies:
      git clone https://github.com/staoxiao/OmniGen.git
      cd OmniGen
      pip install -e .
      
  2. Environment Setup:

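    • Optionally, before installing the dependencies, create a new Conda environment to avoid conflicts, then install a PyTorch build that matches your CUDA version (the example uses CUDA 11.8):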
      conda create -n omnigen python=3.10.12
      conda activate omnigen
      pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
      
  3. Usage:

    • Import the OmniGen pipeline and use it for text-to-image generation:
      from OmniGen import OmniGenPipeline
      # Load the pretrained weights by model ID.
      pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
      images = pipe(prompt="A curly-haired man in a red shirt is drinking tea.",
                    height=1024, width=1024, guidance_scale=2.5, seed=0)
      images[0].save("example_t2i.png")  # save the first generated image
      
  4. Cloud GPUs:

    • If you lack a suitable local GPU, consider a cloud GPU service such as Google Colab, where you can set up and run OmniGen with the same commands as above.
      
  5. Additional Resources:

    • If GPU memory is limited, set offload_model=True. For other ways to reduce memory use or speed up generation, refer to the inference documentation.
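
The usage and memory tips above can be combined as in the following sketch. It assumes that offload_model is accepted as a keyword argument of the pipeline call, which this guide does not state explicitly; the helper names t2i_kwargs and generate are hypothetical.

```python
def t2i_kwargs(prompt, low_memory=False, height=1024, width=1024,
               guidance_scale=2.5, seed=0):
    """Collect the text-to-image arguments used in the usage step."""
    kwargs = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
    if low_memory:
        # Assumption: offload_model is passed to the pipeline call itself.
        kwargs["offload_model"] = True
    return kwargs


def generate(prompt, out_path, low_memory=False):
    """Generate one image and save it; requires OmniGen to be installed."""
    from OmniGen import OmniGenPipeline  # imported lazily so the helper above works without it

    pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
    images = pipe(**t2i_kwargs(prompt, low_memory=low_memory))
    images[0].save(out_path)
    return out_path


# Inspect the arguments without loading the model.
print(t2i_kwargs("A curly-haired man in a red shirt is drinking tea.", low_memory=True))
```

Calling generate(..., low_memory=True) would then be the only change needed on a machine with limited GPU memory, if the assumption about offload_model holds.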

License

OmniGen is released under the MIT License, allowing for wide use and modification. For more details, refer to the license file in the repository.
