Stable Diffusion XL Refiner 1.0

stabilityai

Introduction

The Stable Diffusion XL Refiner 1.0 by Stability AI is a diffusion-based text-to-image generative model. It is the second stage of a two-stage process: it takes the output of an initial base model and refines it, guided by the same text prompt.

Architecture

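SDXL uses an ensemble-of-experts pipeline for latent diffusion. A base model first generates noisy latents, and a refinement model specialized for the final denoising steps then processes them further. Alternatively, the base model can generate latents at the desired output size, and the refiner is then applied to them with SDEdit (image-to-image) using the same prompt to produce high-resolution output. The model uses two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.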
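The hand-off between the two stages can be sketched with diffusers. This is a minimal sketch, not a definitive implementation: it assumes the base checkpoint is `stabilityai/stable-diffusion-xl-base-1.0` (not named in this document), an illustrative 80/20 split, and the `denoising_end` / `denoising_start` arguments of the SDXL pipelines. `split_steps` and `generate_with_refiner` are hypothetical helper names, and `split_steps` only approximates the step counts.

```python
def split_steps(num_inference_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Approximate (base_steps, refiner_steps) for a given hand-off fraction.

    Hypothetical helper for illustration only. diffusers derives the exact
    split from the scheduler's timesteps, so real counts may differ slightly.
    """
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be in [0, 1]")
    base_steps = round(num_inference_steps * high_noise_frac)
    return base_steps, num_inference_steps - base_steps


def generate_with_refiner(prompt: str, n_steps: int = 40, high_noise_frac: float = 0.8):
    """Run the base model for the high-noise steps, then the refiner for the rest."""
    # Imported lazily so split_steps works without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    # Assumption: the base checkpoint ID below is not given in this document.
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,  # share components to save memory
        vae=base.vae,
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")

    # Stage 1: stop early and return latents instead of a decoded image.
    latents = base(
        prompt=prompt, num_inference_steps=n_steps,
        denoising_end=high_noise_frac, output_type="latent",
    ).images
    # Stage 2: the refiner resumes at the same point in the noise schedule.
    return refiner(
        prompt=prompt, num_inference_steps=n_steps,
        denoising_start=high_noise_frac, image=latents,
    ).images[0]
```

With these assumed defaults, `split_steps(40, 0.8)` suggests roughly 32 base steps followed by 8 refiner steps. Calling `generate_with_refiner("a photo of an astronaut riding a horse on mars")` needs a CUDA GPU and downloads both checkpoints.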

Training

For research use, Stability AI recommends its generative-models GitHub repository, which implements the most popular diffusion frameworks for both training and inference. The open-source project welcomes contributions, and new functionality such as distillation is to be added over time.

Guide: Running Locally

  1. Environment Setup: Ensure you have Python and pip installed.
  2. Install Required Libraries:
    pip install diffusers --upgrade
    pip install invisible_watermark transformers accelerate safetensors
    
  3. Load the Model:
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
    pipe = pipe.to("cuda")
    
  4. Inference:
    from diffusers.utils import load_image
    url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
    init_image = load_image(url).convert("RGB")
    prompt = "a photo of an astronaut riding a horse on mars"
    # .images is a list of PIL images; take the first (and only) result
    image = pipe(prompt, image=init_image).images[0]
    
  5. Optimization: With torch >= 2.0, wrapping the UNet in torch.compile before running the pipeline can improve inference speed (the model card cites 20-30%):
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    
  6. GPU Offloading: If VRAM is limited, enable CPU offloading instead of calling pipe.to("cuda"):
    pipe.enable_model_cpu_offload()
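Steps 3 to 6 can be combined into a single function. This is a minimal sketch under stated assumptions: `refine`, `effective_steps`, `low_vram`, and `out_path` are hypothetical names introduced for illustration. `strength` is the img2img pipeline's noise-strength argument. The formula in `effective_steps` reflects my understanding of how diffusers picks img2img steps and should be treated as an approximation.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps an img2img run is expected to perform (illustrative).

    Assumption: mirrors the usual img2img rule that only the last
    `strength` fraction of the schedule is executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


def refine(image_url: str, prompt: str, strength: float = 0.3,
           low_vram: bool = False, out_path: str = "refined.png"):
    """Load the refiner, refine one image, save it, and return it."""
    # Imported lazily so effective_steps works without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    )
    if low_vram:
        # Offloading replaces .to("cuda"); submodules move to the GPU on demand.
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")

    init_image = load_image(image_url).convert("RGB")
    # Lower strength stays closer to the input; higher strength changes more.
    image = pipe(prompt, image=init_image, strength=strength).images[0]
    image.save(out_path)
    return image
```

For example, `refine(url, prompt, low_vram=True)` with the `url` and `prompt` from step 4 would write the result to `refined.png` while keeping peak GPU memory lower, at some cost in speed.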
    

For enhanced performance, consider using cloud GPUs from providers like AWS or Google Cloud.

License

The model is licensed under the CreativeML Open RAIL++-M License, which permits both research and commercial use, subject to the use-based restrictions set out in the license.
