ControlNet-Canny-SDXL-1.0


Introduction

The ControlNet-Canny-SDXL-1.0 model provides ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Canny edge conditioning. It is designed for text-to-image generation: a text prompt together with a Canny edge map guides the model toward realistic, detailed images.

Architecture

The model is based on the Stable Diffusion XL architecture and leverages ControlNet for controllable image generation. A Canny edge map of a reference image serves as the conditioning input to the ControlNet, giving fine-grained control over the structure of the generated content.
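
To make the conditioning step concrete, the snippet below derives a Canny edge map and replicates it to three channels, the format the pipeline expects for a conditioning image. This is a minimal sketch using OpenCV; the input file name and thresholds are illustrative, and the full end-to-end example appears in the guide below.

    import cv2
    import numpy as np
    from PIL import Image
    
    # Detect edges with Canny; 100 and 200 are illustrative hysteresis thresholds.
    rgb = np.array(Image.open("input.png").convert("RGB"))
    edges = cv2.Canny(rgb, 100, 200)
    
    # Replicate the single-channel edge map to three channels, since the
    # ControlNet conditioning image must be RGB.
    conditioning = Image.fromarray(np.stack([edges] * 3, axis=-1))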

Training

The model was trained using a two-phase approach (an illustrative launch sketch follows the list):

  • Phase 1: 20,000 steps on the laion 6a dataset, with images resized to a maximum dimension of 384.
  • Phase 2: An additional 20,000 steps on the same dataset resized to a maximum dimension of 1024, filtered to contain only images of at least 1024 pixels for enhanced quality.
  • Compute: Training was performed on a single 8xA100 machine.
  • Batch Size: Data parallelism with a per-GPU batch size of 8, for a total batch size of 64.
  • Hyperparameters: Constant learning rate of 1e-4, scaled linearly by the total batch size of 64 to an effective 64e-4.
  • Precision: Mixed precision with fp16.
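
The training code itself is not published with this card. As an illustrative sketch only, a run of the shape described above could be launched with the official diffusers ControlNet SDXL training example; the dataset name below is a placeholder, and the flags simply restate the hyperparameters listed above:

    accelerate launch --num_processes=8 train_controlnet_sdxl.py \
      --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
      --dataset_name="your_laion_subset_here" \
      --resolution=1024 \
      --train_batch_size=8 \
      --learning_rate=1e-4 \
      --mixed_precision="fp16" \
      --max_train_steps=20000 \
      --output_dir="controlnet-canny-sdxl"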

Guide: Running Locally

  1. Install Required Libraries:

    pip install accelerate transformers safetensors opencv-python diffusers
    
  2. Load and Run the Model:

    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
    from diffusers.utils import load_image
    from PIL import Image
    import torch
    import numpy as np
    import cv2
    
    prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
    negative_prompt = "low quality, bad quality, sketches"
    
    # Source image to extract edges from (replace the URL with your own).
    image = load_image("your_image_url_here.png")
    
    # How strongly the edge map steers generation; 0.5 is recommended for good generalization.
    controlnet_conditioning_scale = 0.5
    
    # Load the ControlNet weights, the fp16-fixed SDXL VAE, and the SDXL base pipeline.
    controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # offloads submodules to CPU to reduce GPU memory use
    
    # Compute the Canny edge map and replicate it to three channels,
    # since the pipeline expects an RGB conditioning image.
    image = np.array(image)
    image = cv2.Canny(image, 100, 200)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    image = Image.fromarray(image)
    
    images = pipe(prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale).images
    
    images[0].save("output_image.png")
    
  3. Suggested Cloud GPU: For optimal performance, consider using cloud services like AWS, Google Cloud, or Azure with GPU instances such as NVIDIA A100.
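
The call in step 2 uses the pipeline's defaults for everything except the conditioning scale. Standard StableDiffusionXLControlNetPipeline arguments can be passed to tune a run; the values below are illustrative, not recommendations from the model card:

    # Optional knobs: fix a seed for reproducibility and adjust the number of
    # denoising steps and the classifier-free guidance strength.
    generator = torch.Generator(device="cpu").manual_seed(0)
    images = pipe(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
    ).images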

License

The model is released under the OpenRAIL++ license, which allows for wide use in research and commercial applications with certain restrictions and obligations.
