controlnet-canny-sdxl-1.0

xinsir

Introduction

controlnet-canny-sdxl-1.0 is a ControlNet model developed by xinsir that conditions SDXL image generation on Canny edge maps. According to its author, it produces high-resolution images visually comparable to Midjourney output. It was trained on over 10 million images using data augmentation, multiple loss functions, and multi-resolution training, and the author reports that it outperforms other open-source Canny models.

Architecture

  • Developed by: xinsir
  • Model Type: ControlNet_SDXL
  • Finetuned from: stabilityai/stable-diffusion-xl-base-1.0
  • License: Apache-2.0

The model uses the ControlNet architecture to add Canny edge conditioning to SDXL text-to-image generation, so the generated image follows the line structure of the control image while the prompt determines its content and style.

Training

The model was trained in a single stage at a resolution of 1024x1024, on more than 64 A100 GPUs with a real batch size of 2560. The training data was a mix from several sources, including Midjourney and LAION-5B, carefully filtered and annotated. Techniques such as random masking and threshold settings were used to strengthen the model's understanding of the semantic relationship between the prompt and the lines in the image.
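
To put the batch figures in perspective, the arithmetic below is a rough sketch. It assumes exactly 64 GPUs (the text says more than 64, so the result is an upper bound on the per-GPU share) and that "real batch size" means the number of samples contributing to each optimizer step. How that share was split between per-device batch size and gradient accumulation is not stated; `grad_accum_steps = 4` is a purely hypothetical value.

```python
# Rough arithmetic only; the GPU count and accumulation split are assumptions.
real_batch_size = 2560   # from the training description
num_gpus = 64            # assumed lower bound (more than 64 GPUs were used)

# Samples each GPU contributes to one optimizer step.
samples_per_gpu = real_batch_size // num_gpus
print(samples_per_gpu)   # 40

# Hypothetical split: with 4 gradient-accumulation steps, each GPU would
# process micro-batches of 10 samples.
grad_accum_steps = 4     # illustrative value, not from the source
micro_batch = samples_per_gpu // grad_accum_steps
print(micro_batch)       # 10
```

With more GPUs than the assumed 64, the per-GPU share would be correspondingly smaller.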

Guide: Running Locally

Steps

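  1. Install Necessary Libraries: Make sure diffusers, torch, and Pillow (the package that provides the PIL module) are installed. Extracting Canny edges typically also requires opencv-python and numpy.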
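  2. Load the Model: Import the pipeline classes, then load the ControlNet weights and attach them to the SDXL base pipeline. The repository ID xinsir/controlnet-canny-sdxl-1.0 is inferred from the model and developer names; AutoencoderKL is only needed if you swap in a separate VAE.
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
    import torch
    controlnet = ControlNetModel.from_pretrained("xinsir/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16).to("cuda")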
    
  3. Prepare the Image: Extract a Canny edge map from your input image and resize it to about 1024x1024, the resolution the model was trained at, for best results.
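  4. Run Inference:
    # pipe is the loaded StableDiffusionXLControlNetPipeline; controlnet_img is the
    # prepared control image from step 3, and new_width/new_height are its dimensions.
    images = pipe(
        prompt="your detailed prompt",
        negative_prompt="negative attributes",
        image=controlnet_img,
        controlnet_conditioning_scale=1.0,
        width=new_width,
        height=new_height,
        num_inference_steps=30,
    ).images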
    
  5. Save the Output: The pipeline returns a list of PIL images. Save the one you want in your preferred format; PNG is lossless and avoids compression artifacts.
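
The steps above can be tied together roughly as follows. This is a minimal sketch, not the author's reference script. It assumes the weights are published under the Hub ID `xinsir/controlnet-canny-sdxl-1.0` (inferred from the names above), that `opencv-python` and `numpy` are installed for edge detection, and that a CUDA GPU is available. The helper names `target_size`, `make_canny_image`, and `generate`, the Canny thresholds 100/200, and the rounding to multiples of 8 are illustrative choices, not values taken from the model card.

```python
import math


def target_size(width, height, area=1024 * 1024, multiple=8):
    """Scale (width, height) to roughly `area` pixels, keeping aspect ratio.

    Each side is rounded down to a multiple of `multiple` (assumed here
    because the SDXL VAE downsamples images by a factor of 8).
    """
    ratio = math.sqrt(area / (width * height))
    new_width = max(multiple, int(width * ratio) // multiple * multiple)
    new_height = max(multiple, int(height * ratio) // multiple * multiple)
    return new_width, new_height


def make_canny_image(image, low=100, high=200):
    """Return a 3-channel PIL edge map; the thresholds are illustrative."""
    import cv2
    import numpy as np
    from PIL import Image

    gray = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))


def generate(input_path, prompt, negative_prompt="", output_path="output.png"):
    """Load the models, build the control image, run inference, save a PNG."""
    # Heavy dependencies are imported here so target_size() stays usable
    # on machines without a GPU stack.
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
    from PIL import Image

    controlnet = ControlNetModel.from_pretrained(
        "xinsir/controlnet-canny-sdxl-1.0",  # assumed Hub ID
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    source = Image.open(input_path)
    new_width, new_height = target_size(*source.size)
    controlnet_img = make_canny_image(source.resize((new_width, new_height)))

    images = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=controlnet_img,
        controlnet_conditioning_scale=1.0,
        width=new_width,
        height=new_height,
        num_inference_steps=30,
    ).images
    images[0].save(output_path)
    return output_path
```

Calling `generate("input.png", "your detailed prompt")` would download the weights on first use and write `output.png`. The size helper can be checked on its own: `target_size(2048, 1024)` returns `(1448, 720)`, which keeps the pixel count close to 1024x1024 while roughly preserving the 2:1 aspect ratio.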

Cloud GPUs

If you do not have a local GPU with enough memory, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure to handle the model's computational demands.

License

The model is released under the Apache-2.0 license, which permits commercial use, modification, and redistribution, provided the license's attribution and notice requirements are met.
