OmniGen v1

BAAI

Introduction

OmniGen is a unified image generation model that produces a wide range of images from multi-modal prompts, that is, prompts that combine text with one or more input images. It aims to simplify image generation by removing the need for additional network modules and preprocessing steps. It is also designed to be easy to customize and fine-tune for new image-generation tasks.

Architecture

OmniGen needs no additional plugins or preprocessing operations: it automatically identifies the relevant features in the input images according to the text prompt. A single model supports several tasks, including text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation.
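To make the idea of a multi-modal prompt concrete, the sketch below builds a prompt string that refers to input images by position. The placeholder form `<img><|image_1|></img>` and the `input_images` and `img_guidance_scale` arguments are not stated in this document; they follow the upstream OmniGen README as best recalled, so treat them as assumptions and check the repository before relying on them. `image_placeholder`, `build_prompt`, and the image path are hypothetical names used only for illustration.

```python
def image_placeholder(index: int) -> str:
    """Return the placeholder for the index-th input image (1-based).

    Assumption: OmniGen marks images in a prompt as <img><|image_i|></img>.
    """
    if index < 1:
        raise ValueError("image indices start at 1")
    return f"<img><|image_{index}|></img>"


def build_prompt(template: str, num_images: int) -> str:
    """Fill {img1}, {img2}, ... in a template with image placeholders."""
    fields = {f"img{i}": image_placeholder(i) for i in range(1, num_images + 1)}
    return template.format(**fields)


prompt = build_prompt(
    "A man in a black shirt is reading a book. The man is the right man in {img1}.",
    num_images=1,
)
print(prompt)

# Assumed usage (not executed here; requires the model and a local image):
# images = pipe(prompt=prompt, input_images=["./two_man.jpg"],
#               height=1024, width=1024,
#               guidance_scale=2.5, img_guidance_scale=1.6, seed=0)
```

The order of the image paths would need to match the placeholder indices, so the first path corresponds to `{img1}`.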

Training

OmniGen can be fine-tuned using the provided training script train.py, which supports techniques such as LoRA (Low-Rank Adaptation). Users can adjust parameters like learning rate, batch size, and dropout probability to optimize the model for specific tasks. Detailed instructions for fine-tuning are available in the documentation.
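As background on why LoRA makes fine-tuning cheaper, the sketch below is a generic, pure-Python illustration of low-rank adaptation; it is not OmniGen's train.py. A frozen weight matrix W is augmented with a trainable product B·A of rank r, scaled by alpha/r, so only B and A are trained. The layer size, rank, and function names are illustrative assumptions.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters trained by LoRA for one d_out x d_in weight matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, b, a, alpha: float):
    """Return W + (alpha / rank) * B @ A without modifying the frozen W."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical layer size, chosen only to show the saving.
full = 1024 * 1024
lora = lora_trainable_params(1024, 1024, rank=8)
print(f"full fine-tuning: {full} params, LoRA (r=8): {lora} params")

# Tiny worked example: 2x2 frozen weight, rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # shape (2, 1)
a = [[0.5, 0.5]]     # shape (1, 2)
print(lora_effective_weight(w, b, a, alpha=1.0))
```

For the hypothetical 1024 x 1024 layer, rank 8 trains 16,384 parameters instead of 1,048,576, which is why the rank is one of the main knobs to adjust alongside learning rate and batch size.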

Guide: Running Locally

Basic Steps

  1. Create a Python Environment:

    • Use Conda (recommended):
      conda create -n omnigen python=3.10.12
      conda activate omnigen
      
  2. Install PyTorch:

    • Install the PyTorch build that matches your CUDA version (the command below targets CUDA 11.8):
      pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
      
  3. Clone the Repository and Install Dependencies:

    git clone https://github.com/staoxiao/OmniGen.git
    cd OmniGen
    pip install -e .
      
  4. Run Examples:

    • Import and use the OmniGen pipeline:
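      from OmniGen import OmniGenPipeline
      # Load the pretrained weights (downloaded on first use).
      pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
      # Text-to-image generation with a fixed seed.
      images = pipe(prompt="A curly-haired man in a red shirt is drinking tea.",
                    height=1024, width=1024, guidance_scale=2.5, seed=0)
      images[0].save("example_t2i.png")  # pipe returns a list of images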
      
  5. Use Cloud GPUs:

    • For resource-intensive tasks, consider using cloud GPUs on platforms like Google Colab.
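
Before running the example, it can help to confirm that the environment matches the steps above. The following is a minimal sketch, not part of the OmniGen repository: `check_environment` is a hypothetical helper that reports the Python version and whether the `torch` and `OmniGen` packages can be found, and it queries CUDA only if PyTorch is installed.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Collect basic facts about the current Python environment."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "omnigen_installed": importlib.util.find_spec("OmniGen") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False on a machine with a GPU, the installed PyTorch build probably does not match the local CUDA version; on a machine without a GPU, a cloud GPU is the more practical option.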

Additional Resources

  • For more examples and detailed instructions, refer to inference.ipynb and inference_demo.ipynb.
  • For efficient resource management, consult docs/inference.md.

License

This repository is licensed under the MIT License.
