OmniGen v1
Introduction
OmniGen is a unified image generation model designed for generating diverse images from multi-modal prompts. Unlike traditional models, OmniGen simplifies the process by eliminating the need for additional network modules or preprocessing steps, similar to the language generation approach of GPT. This model supports various tasks like text-to-image generation, identity-preserving generation, image editing, and more.
Architecture
OmniGen is built to handle input from multiple modalities, automatically identifying features in input images according to text prompts without needing additional plugins or operations. This flexibility allows it to perform tasks such as subject-driven generation and image-conditioned generation.
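To make this concrete, input images are typically referenced inline from the text prompt via placeholder tags, so the model knows which image a phrase refers to. The helper below is a hypothetical sketch: the `build_prompt` function is an illustration of the idea, and the `<img><|image_1|></img>` tag format is an assumption based on common OmniGen usage, not a verbatim API.

```python
# Hypothetical helper: interleave the text prompt with image placeholder
# tokens so the model can attend to each input image by position.
# The "<img><|image_i|></img>" tag format is an assumption here.
def build_prompt(text: str, num_images: int) -> str:
    placeholders = " ".join(
        f"<img><|image_{i}|></img>" for i in range(1, num_images + 1)
    )
    return f"{text} {placeholders}".strip()

prompt = build_prompt("The woman in this image is wearing a red dress.", 1)
# The resulting prompt would then be passed to the pipeline together with
# the corresponding image file(s); the exact parameter name for supplying
# images is not shown in this document.
```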
Training
OmniGen can be fine-tuned using the provided `train.py` script, which supports techniques such as LoRA fine-tuning. The model can be trained on custom datasets, and various parameters can be adjusted to suit specific tasks. Detailed training instructions are available in the documentation.
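For intuition on what LoRA fine-tuning does, the sketch below shows the core idea in plain Python: instead of updating a full weight matrix W, LoRA trains a low-rank correction B·A and uses W + B·A at inference, so far fewer parameters are updated. This is a conceptual illustration only, not OmniGen's actual training code.

```python
# Conceptual LoRA sketch (not OmniGen's train.py): the frozen weight W gets a
# low-rank correction B @ A, where B is (d x r) and A is (r x d) with r << d.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, r = 4, 1  # full dimension vs. LoRA rank
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen weight
B = [[0.1], [0.0], [0.0], [0.0]]   # d x r, trainable
A = [[0.0, 0.2, 0.0, 0.0]]         # r x d, trainable

W_adapted = add(W, matmul(B, A))   # effective weight used at inference
# Only d*r*2 = 8 numbers are trained here instead of d*d = 16.
```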
Guide: Running Locally
To run OmniGen locally, follow these steps:
- Installation: Clone the repository and install dependencies:

  ```shell
  git clone https://github.com/staoxiao/OmniGen.git
  cd OmniGen
  pip install -e .
  ```
- Environment Setup: Optionally, create a new Conda environment to avoid dependency conflicts:

  ```shell
  conda create -n omnigen python=3.10.12
  conda activate omnigen
  pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
  ```
- Usage: Import and use the OmniGen pipeline for text-to-image generation:

  ```python
  from OmniGen import OmniGenPipeline

  pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
  images = pipe(
      prompt="A curly-haired man in a red shirt is drinking tea.",
      height=1024,
      width=1024,
      guidance_scale=2.5,
      seed=0,
  )
  images[0].save("example_t2i.png")
  ```
- Cloud GPUs: For better performance, consider using a cloud GPU service such as Google Colab, where OmniGen can be set up and run with the same commands.
- Additional Resources: If memory is limited, set `offload_model=True`; for more efficient inference, refer to the inference documentation.
License
OmniGen is released under the MIT License, allowing for wide use and modification. For more details, refer to the license file in the repository.