ControlNet-Canny-SDXL-1.0
Introduction
The ControlNet-Canny-SDXL-1.0 model provides ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Canny conditioning. It is designed for text-to-image generation: given a prompt and a Canny edge map derived from a reference image, it produces realistic, detailed images that follow the edge structure.
Architecture
The model is based on the Stable Diffusion XL architecture and adds a ControlNet branch for controllable generation. A Canny edge map serves as the conditioning input, giving the user explicit control over the structure of the generated content.
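To make the conditioning concrete, here is a minimal, hedged sketch of preparing a Canny edge map with OpenCV; the input path is a placeholder, and the 100/200 thresholds simply mirror those used in the guide below:

import cv2
import numpy as np
from PIL import Image

# Load any RGB image to derive edges from (the path is a placeholder).
source = np.array(Image.open("input.png").convert("RGB"))

# Canny yields a single-channel edge map; the low/high thresholds
# (100/200, as in the guide below) control edge sensitivity.
edges = cv2.Canny(source, 100, 200)

# The ControlNet expects a 3-channel image, so replicate the edge map.
conditioning = Image.fromarray(np.stack([edges] * 3, axis=-1))
conditioning.save("canny_conditioning.png")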
Training
The model was trained using a two-phase approach (an initialization sketch follows the list):
- Phase 1: 20,000 steps on the LAION 6a dataset, resized to a maximum dimension of 384.
- Phase 2: An additional 20,000 steps on the same dataset, resized to a maximum dimension of 1024 and filtered to images with a minimum size of 1024 for enhanced quality.
- Compute: Training was performed using one 8xA100 machine.
- Batch Size: Data parallelism with a per-GPU batch size of 8, for a total batch size of 64.
- Hyperparameters: Constant learning rate of 1e-4, scaled by the total batch size (×64) to an effective rate of 64e-4.
- Precision: Mixed precision with fp16 was employed.
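For context, a ControlNet is typically initialized as a trainable copy of the base model's UNet encoder blocks before training begins. The sketch below uses diffusers' ControlNetModel.from_unet to illustrate that setup; the AdamW optimizer choice is an assumption, not a detail taken from the training run described above:

import torch
from diffusers import UNet2DConditionModel, ControlNetModel

# Load the SDXL base UNet that seeds the ControlNet's weights.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)

# Create a ControlNet whose blocks are copied from the UNet.
controlnet = ControlNetModel.from_unet(unet)

# Only the ControlNet parameters are trained, here at the constant
# 1e-4 learning rate listed above (the optimizer choice is an assumption).
optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-4)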
Guide: Running Locally
- Install Required Libraries:
pip install accelerate transformers safetensors opencv-python diffusers
- Load and Run the Model (a conditioning-scale sweep sketch follows the guide):
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# Source image to extract Canny edges from (replace with your own URL or path).
image = load_image("your_image_url_here.png")

controlnet_conditioning_scale = 0.5

# Load the ControlNet, the fp16-safe VAE, and the SDXL base pipeline.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Convert the source image to a 3-channel Canny edge map for conditioning.
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

# Generate and save the result.
images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save("output_image.png")
- Suggested Cloud GPU: For optimal performance, consider cloud services like AWS, Google Cloud, or Azure with GPU instances such as an NVIDIA A100.
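The controlnet_conditioning_scale argument (0.5 in the example above) trades edge fidelity against prompt freedom. The following hedged sketch continues the guide's snippet, reusing its pipe, prompt, negative_prompt, and Canny conditioning image; the scale values are arbitrary illustrations, not recommended settings:

# Higher scales follow the edge map more strictly; lower scales
# give the text prompt more influence over the composition.
for scale in (0.3, 0.5, 0.8):
    result = pipe(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        controlnet_conditioning_scale=scale,
    ).images[0]
    result.save(f"output_scale_{scale}.png")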
License
The model is released under the OpenRAIL++ license, which allows for wide use in research and commercial applications with certain restrictions and obligations.