Stable Diffusion XL Refiner 1.0 Model

Introduction

The Stable Diffusion XL Refiner 1.0 by Stability AI is a diffusion-based text-to-image generative model. It serves as the second stage of a two-stage process: an initial base model generates an image from a text prompt, and the refiner improves that output.

Architecture

The model is used in a two-stage pipeline. A base model first generates initial, noisy latents, and a specialized refinement model then handles the final denoising steps. Alternatively, the base model can produce latents at the desired output size, and the refiner is then applied to them with SDEdit (also known as img2img) using the same prompt to improve high-resolution detail. The pipeline uses two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
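The two-stage hand-off described above can be sketched with the diffusers library as follows. This is a minimal illustration under stated assumptions, not the authors' reference code: the base model ID `stabilityai/stable-diffusion-xl-base-1.0`, the 40 steps, and the 0.8 split fraction are assumed values, and `split_steps` is a hypothetical helper that only approximates how the steps are divided.

```python
def split_steps(num_inference_steps, high_noise_frac):
    """Roughly how many denoising steps each stage handles.

    Simplified arithmetic for illustration only; the scheduler in the
    library decides the exact timestep boundary.
    """
    base_steps = round(num_inference_steps * high_noise_frac)
    return base_steps, num_inference_steps - base_steps


def generate_two_stage(prompt, num_inference_steps=40, high_noise_frac=0.8):
    """Base model denoises the first part; the refiner finishes the rest."""
    # Imported lazily so split_steps can be used without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model ID
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,  # share components to save memory
        vae=base.vae,
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")

    # Stage 1: stop early and hand over latents instead of a decoded image.
    latents = base(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images

    # Stage 2: the refiner resumes denoising where the base model stopped.
    return refiner(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_start=high_noise_frac,
        image=latents,
    ).images[0]
```

With the assumed values, `split_steps(40, 0.8)` gives 32 steps to the base model and 8 to the refiner. Calling `generate_two_stage` downloads both models and requires a CUDA GPU.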

Training

The model is part of Stability AI's generative models project, which implements popular diffusion frameworks for both training and inference. The project is open source, encourages contributions, and continues to add functionality such as distillation.

Guide: Running Locally

Environment Setup: Ensure you have Python and pip installed. The examples below assume a CUDA-capable GPU.

Install Required Libraries:

```
pip install diffusers --upgrade
pip install invisible_watermark transformers accelerate safetensors
```
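After installing, it can help to confirm that the packages are importable before loading any weights. The following is a small sketch; `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced here, and `torch` is included on the assumption that it is installed separately.

```python
import importlib.util

# Import names for the packages used in this guide. Note that the
# invisible_watermark distribution is imported as `imwatermark`.
REQUIRED_MODULES = [
    "torch", "diffusers", "transformers", "accelerate", "safetensors", "imwatermark",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```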

Load the Model:

```
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

# Load the refiner weights in half precision to reduce memory use.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe = pipe.to("cuda")  # move the whole pipeline to the GPU
```

Inference:

```
from diffusers.utils import load_image

# Download the image to refine and make sure it is in RGB mode.
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
init_image = load_image(url).convert("RGB")
prompt = "a photo of an astronaut riding a horse on mars"
# `.images` is a list of PIL images; take the first one.
image = pipe(prompt, image=init_image).images[0]
```
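The inference call can be wrapped in a small helper that also writes the result to disk. This is a sketch only: `refine_and_save` is a hypothetical name, the output path is up to the caller, and `strength=0.3` is an illustrative value for the pipeline's `strength` argument (how strongly the input image is altered), not a recommendation.

```python
def refine_and_save(pipe, prompt, init_image, out_path, strength=0.3):
    """Run an image-to-image refinement pass and save the first result.

    `pipe` is expected to behave like the pipeline loaded above: calling it
    returns an object whose `.images` attribute is a list of PIL images.
    """
    result = pipe(prompt, image=init_image, strength=strength)
    refined = result.images[0]  # `.images` is a list, one entry per output
    refined.save(out_path)      # PIL infers the file format from the suffix
    return refined
```

For example, `refine_and_save(pipe, prompt, init_image, "refined.png")` would store the refined image next to the script.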

Optimization: If using torch >= 2.0, enhance speed with:

```
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

CPU Offloading: If VRAM is limited, enable model CPU offloading instead of calling pipe.to("cuda"):
```
pipe.enable_model_cpu_offload()
```
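Because offloading replaces the `.to("cuda")` call and is not used alongside it, the choice can be made in one place. A minimal sketch, assuming a hypothetical helper `place_pipeline` and a caller-supplied `low_vram` flag:

```python
def place_pipeline(pipe, low_vram=False):
    """Choose one placement strategy for a loaded pipeline.

    With limited VRAM, enable CPU offloading and skip `.to("cuda")`;
    otherwise move the whole pipeline to the GPU.
    """
    if low_vram:
        pipe.enable_model_cpu_offload()  # relies on the accelerate package
        return pipe
    return pipe.to("cuda")
```

Usage would be `pipe = place_pipeline(pipe, low_vram=True)` right after `from_pretrained`, in place of the unconditional `pipe.to("cuda")`.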

For enhanced performance, consider using cloud GPUs from providers like AWS or Google Cloud.

License

The model is licensed under the CreativeML Open RAIL++-M License, which permits use, including commercial use, subject to the use-based restrictions set out in the license.
