superdiff-sd-v1-4
Introduction
The SuperDiffusion pipeline uses the Itô Density Estimator to superimpose different text prompts with the Stable Diffusion v1-4 model. This lets a single generated image combine multiple concepts, for example "a flamingo" and "a candy cane".
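Superimposing prompts comes down to combining the denoiser's per-prompt predictions at every sampling step. The sketch below shows only that combination step: a weighted sum of two noise predictions with a weight `kappa`. In SuperDiffusion the weight is derived from Itô density estimates at each step. Here it is simply passed in. The function name `superpose_noise` and the fixed weight are illustrative assumptions, not the pipeline's code.

```python
import numpy as np


def superpose_noise(eps_1, eps_2, kappa):
    """Weighted combination of two per-prompt noise predictions.

    Hypothetical helper: kappa weights prompt 1 and (1 - kappa) weights prompt 2.
    SuperDiffusion computes kappa per step; here the caller supplies it.
    """
    eps_1 = np.asarray(eps_1, dtype=np.float64)
    eps_2 = np.asarray(eps_2, dtype=np.float64)
    if eps_1.shape != eps_2.shape:
        raise ValueError("noise predictions must have the same shape")
    return kappa * eps_1 + (1.0 - kappa) * eps_2


# Random stand-ins for the UNet's noise predictions on a 4x64x64 latent
# (the latent size of a 512x512 Stable Diffusion v1-4 image).
rng = np.random.default_rng(0)
eps_flamingo = rng.standard_normal((4, 64, 64))
eps_candy_cane = rng.standard_normal((4, 64, 64))
combined = superpose_noise(eps_flamingo, eps_candy_cane, kappa=0.5)
print(combined.shape)  # (4, 64, 64)
```

At a fixed noise level the score is proportional to the negative noise prediction, so a weighted sum of noise predictions corresponds to the same weighted sum of scores.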
Architecture
The pipeline is built on the base model CompVis/stable-diffusion-v1-4 and uses a custom Diffusers pipeline for text-to-image generation. The superposition of diffusion models is implemented with PyTorch, Diffusers, Accelerate, and Transformers.
Training
This approach is based on the paper "The Superposition of Diffusion Models Using the Itô Density Estimator," which describes how the Itô Density Estimator can be used within a diffusion framework to merge the outputs of a model conditioned on different text prompts.
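Roughly speaking, the Itô Density Estimator lets the sampler track an estimate of each prompt's log-density along the sampling trajectory. The combination weight can then be chosen to keep the two log-densities in a desired relation. The sketch below is a simplified illustration, not the paper's exact update rule. It assumes that each prompt's log-density increment over one step is affine in the weight (`a_i + b_i * kappa`). It then solves for the `kappa` at which prompt 1's increment exceeds prompt 2's by a fixed offset `lift`, loosely mirroring the pipeline's `lift` option. The coefficients and the function name `solve_kappa` are hypothetical.

```python
def solve_kappa(a_1, b_1, a_2, b_2, lift=0.0, fallback=0.5):
    """Solve a_1 + b_1*kappa == a_2 + b_2*kappa + lift for kappa.

    Hypothetical simplification: each prompt's one-step log-density increment
    is modeled as affine in kappa. Returns `fallback` when both increments have
    the same slope, since there is then no unique solution.
    """
    slope = b_1 - b_2
    if abs(slope) < 1e-12:
        return fallback
    return (a_2 - a_1 + lift) / slope


# Equal increments (lift=0): 0 + 2*k == 1 + 0*k  ->  k = 0.5
print(solve_kappa(a_1=0.0, b_1=2.0, a_2=1.0, b_2=0.0))  # 0.5
# Ask prompt 1's increment to exceed prompt 2's by 1: 2*k == 2  ->  k = 1.0
print(solve_kappa(0.0, 2.0, 1.0, 0.0, lift=1.0))  # 1.0
```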
Guide: Running Locally
Requirements
To run the SuperDiffusion pipeline, the following packages and versions are needed:
- PyTorch 2.5.1
- Diffusers 0.32.1
- Accelerate 1.2.1
- Transformers 4.47.1
Install these versions with:
pip install torch==2.5.1 diffusers==0.32.1 accelerate==1.2.1 transformers==4.47.1
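A small script using the standard library's `importlib.metadata` can confirm that the installed versions match the list above. The helper name `check_versions` is hypothetical. It ignores local build suffixes such as `+cu121`, which PyTorch wheels often carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section.
REQUIRED = {
    "torch": "2.5.1",
    "diffusers": "0.32.1",
    "accelerate": "1.2.1",
    "transformers": "4.47.1",
}


def check_versions(required=REQUIRED, get_version=version):
    """Return {package: (required, installed)} for each mismatch; installed is None if missing."""
    mismatches = {}
    for package, wanted in required.items():
        try:
            # Drop a local version label such as "+cu121" before comparing.
            installed = get_version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    for package, (wanted, installed) in problems.items():
        print(f"{package}: need {wanted}, found {installed}")
    if not problems:
        print("All required package versions are installed.")
```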
Example Usage
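from PIL import Image
from diffusers import DiffusionPipeline

# Load the custom SuperDiffusion pipeline from the model repository.
# trust_remote_code=True permits running the custom pipeline code shipped with the repo.
pipeline = DiffusionPipeline.from_pretrained("superdiff/superdiff-sd-v1-4", custom_pipeline='pipeline', trust_remote_code=True)
# Superimpose the two prompts; 1000 inference steps is the recommended setting.
output = pipeline("a flamingo", "a candy cane", seed=1, num_inference_steps=1000, batch_size=1)

# Convert the first generated image to a PIL image and save it.
image = Image.fromarray(output[0].cpu().numpy())
image.save("superdiff_output.png")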
Parameters
- prompt_1 and prompt_2: Required; the text prompts for the two concepts to superimpose.
- seed: Optional; random seed for reproducible outputs.
- num_inference_steps: Optional; 1000 is recommended.
- batch_size: Optional; number of images generated per call; defaults to 1.
- lift: Optional; biases generation towards one of the two prompts.
- guidance_scale: Optional; controls the strength of classifier-free guidance.
- height, width: Optional; height and width of the generated images.
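Stable Diffusion v1-4 generates images in a latent space. Its VAE downsamples height and width by a factor of 8 and uses 4 latent channels, so image dimensions should be multiples of 8. The helper below is hypothetical and not part of the pipeline. It validates `height` and `width` and returns the corresponding latent shape. The 512x512 defaults reflect SD v1-4's native resolution and are an assumption about this pipeline's defaults.

```python
def latent_shape(height=512, width=512, batch_size=1):
    """Return (batch, channels, latent_height, latent_width) for SD v1-4.

    Hypothetical helper; the 512 defaults are assumed, not read from the pipeline.
    """
    if height <= 0 or width <= 0 or height % 8 or width % 8:
        raise ValueError("height and width must be positive multiples of 8")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The SD v1-4 VAE has 4 latent channels and a spatial downsampling factor of 8.
    return (batch_size, 4, height // 8, width // 8)


print(latent_shape())  # (1, 4, 64, 64)
print(latent_shape(height=768, width=512, batch_size=2))  # (2, 4, 96, 64)
```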
Performance
Generating with a batch size of 1 on an NVIDIA A40 GPU takes approximately 3 minutes and 30 seconds.
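The quoted time can be converted into a rough per-step cost for planning. This sketch rests on two assumptions that the figure above does not state. First, it assumes the 3 min 30 s was measured at the recommended 1000 inference steps. Second, it assumes runtime scales linearly with the step count. Model loading time is ignored. The function name `estimate_seconds` is hypothetical.

```python
REFERENCE_SECONDS = 3 * 60 + 30  # 210 s quoted for batch size 1 on an A40
REFERENCE_STEPS = 1000  # assumption: the quoted time used 1000 inference steps


def estimate_seconds(num_inference_steps, reference_seconds=REFERENCE_SECONDS,
                     reference_steps=REFERENCE_STEPS):
    """Linearly extrapolate sampling time from the reference measurement."""
    if num_inference_steps <= 0:
        raise ValueError("num_inference_steps must be positive")
    return reference_seconds * num_inference_steps / reference_steps


per_step = estimate_seconds(1)  # about 0.21 s per step under the assumptions above
print(f"{per_step:.2f} s/step, {estimate_seconds(500):.0f} s for 500 steps")
```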
Suggested Cloud GPUs
Consider using cloud-based GPUs such as NVIDIA A40, A100, or V100 for efficient processing.
License
The model and related materials are subject to Hugging Face's licensing terms. Ensure compliance with the specified license when using or modifying the pipeline.