controlnet openpose sdxl 1.0
thibaud
Introduction
The controlnet-openpose-sdxl-1.0 model is a set of ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with OpenPose (v2) conditioning. It is designed for text-to-image generation with the diffusers library.
Architecture
The model builds on the stabilityai/stable-diffusion-xl-base-1.0 base model and adds ControlNet weights that steer image generation. A pose map extracted with OpenPose serves as the conditioning input, so generated figures follow the detected body pose.
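As a rough illustration of the pose conditioning described above, the sketch below extracts a pose map and sizes it for use as a conditioning image. It assumes `controlnet_aux` and `diffusers` are installed and that side lengths that are multiples of 8 are a safe choice for SDXL. The helper names `snap_size` and `extract_pose_map` and the 1024-pixel target are illustrative choices, not something the model card specifies.

```python
def snap_size(width, height, long_side=1024, multiple=8):
    """Scale (width, height) so the longer side is `long_side`, keeping the
    aspect ratio and rounding each side to the nearest multiple of `multiple`."""
    scale = long_side / max(width, height)

    def snap(value):
        # Never return less than one full multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


def extract_pose_map(image_path):
    """Return an OpenPose skeleton image sized for ControlNet conditioning.

    Hypothetical helper: the path is supplied by the caller.
    """
    # Heavy imports stay local so snap_size is usable without them.
    from controlnet_aux import OpenposeDetector
    from diffusers.utils import load_image

    detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    source = load_image(image_path)
    pose_map = detector(source)  # PIL image of the detected skeleton
    return pose_map.resize(snap_size(*source.size))
```

Calling `extract_pose_map("person.png")` (a placeholder path) would return an image that can be passed as the `image` argument of the ControlNet pipeline.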
Training
- Training Data: The checkpoint was trained for 15,000 steps on the LAION 6a dataset, with images resized so that their shorter side is at most 768 pixels.
- Compute: Training utilized one NVIDIA A100 GPU, provided by Hugging Face.
- Batch Size: Data-parallel training with a per-GPU batch size of 2 and 8 gradient accumulation steps, giving an effective batch size of 16 on the single GPU.
- Hyperparameters: Training was conducted with a constant learning rate of 8e-5.
- Mixed Precision: FP16 precision was used during training for efficiency.
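The training figures above imply a few derived quantities. The following sketch works them out under two assumptions that the source does not state explicitly: that each of the 15,000 steps is one optimizer update, and that the 768-pixel resize rule caps the shorter image side while preserving the aspect ratio. All function and constant names are illustrative.

```python
NUM_GPUS = 1
PER_GPU_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 8
TRAIN_STEPS = 15_000  # assumed to count optimizer updates


def effective_batch_size(per_gpu=PER_GPU_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         gpus=NUM_GPUS):
    """Samples contributing to one optimizer update."""
    return per_gpu * accum * gpus


def cap_short_side(width, height, max_short_side=768):
    """Downscale so the shorter side is at most `max_short_side`.

    Images that are already small enough are returned unchanged.
    """
    short = min(width, height)
    if short <= max_short_side:
        return width, height
    scale = max_short_side / short
    return round(width * scale), round(height * scale)


batch = effective_batch_size()
print(batch)                       # 16
print(batch * TRAIN_STEPS)         # 240000 samples processed, under the assumption above
print(cap_short_side(1536, 1024))  # (1152, 768)
```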
Guide: Running Locally
- Install Required Libraries:

      pip install -q controlnet_aux transformers accelerate
      pip install -q git+https://github.com/huggingface/diffusers
- Load Pre-trained Models:

      import torch
      from controlnet_aux import OpenposeDetector
      from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

      # Pose detector that produces the conditioning image.
      openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

      # ControlNet weights and the SDXL base pipeline, both in half precision.
      controlnet = ControlNetModel.from_pretrained(
          "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
      )
      pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-1.0",
          controlnet=controlnet,
          torch_dtype=torch.float16,
      )
      # Move idle submodules to the CPU to reduce GPU memory use.
      pipe.enable_model_cpu_offload()
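- Generate Images:

      from diffusers.utils import load_image

      # "person.png" is a placeholder path; use any image that shows a person.
      pose_image = openpose(load_image("person.png"))

      prompt = "Darth vader dancing in a desert, high quality"
      negative_prompt = "low quality, bad quality"
      images = pipe(
          prompt,
          negative_prompt=negative_prompt,
          image=pose_image.resize((1024, 1024)),  # pose map that conditions the ControlNet
          num_inference_steps=25,
          num_images_per_prompt=4,
      ).images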
- Cloud GPUs: For optimal performance, consider using cloud-based GPUs like AWS EC2 P3 instances or Google Cloud's AI Platform.
License
The model uses an "other" license and refers to the OpenPose license for specific terms. Users should verify compatibility with their use case.