controlnet openpose sdxl 1.0

thibaud

Introduction

The controlnet-openpose-sdxl-1.0 model is a set of ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0, with OpenPose (v2) conditioning. It is designed for text-to-image tasks using the diffusers library.

Architecture

The model pairs the stabilityai/stable-diffusion-xl-base-1.0 base model with a ControlNet that steers image generation. An OpenPose detector extracts a pose skeleton from a reference image, and that skeleton image is passed to the ControlNet as conditioning, so the generated figure follows the reference pose.
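
As a rough illustration of what integrating ControlNet means here: a ControlNet runs alongside the frozen base UNet, turns the conditioning image into a set of residual feature maps, and those residuals are scaled and added to the UNet's own features. The toy sketch below uses plain lists of numbers in place of tensors. The function name add_control_residuals and its arguments are hypothetical, chosen only for illustration; they are not part of diffusers.

```python
def add_control_residuals(unet_features, control_residuals, scale=1.0):
    """Add scaled ControlNet residuals to UNet features, block by block.

    Both arguments are lists of equally sized feature vectors (lists of
    floats here, tensors in a real model). scale=0.0 disables the control
    signal; scale=1.0 applies it at full strength.
    """
    if len(unet_features) != len(control_residuals):
        raise ValueError("expected one residual per UNet block")
    return [
        [f + scale * r for f, r in zip(feat, res)]
        for feat, res in zip(unet_features, control_residuals)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]      # stand-ins for two UNet blocks
residuals = [[0.5, -0.5], [1.0, 0.0]]    # stand-ins for ControlNet outputs

print(add_control_residuals(features, residuals, scale=1.0))  # [[1.5, 1.5], [4.0, 4.0]]
print(add_control_residuals(features, residuals, scale=0.0))  # [[1.0, 2.0], [3.0, 4.0]]
```

In diffusers, the comparable knob is the controlnet_conditioning_scale argument of the pipeline call, which weights the ControlNet output before it is added to the UNet.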

Training

  • Training Data: The checkpoint was trained for 15,000 steps on the LAION 6a dataset, with images resized so that the shorter side is at most 768 pixels (a "max minimum dimension" of 768).
  • Compute: Training used one NVIDIA A100 GPU, provided by Hugging Face.
  • Batch Size: Data-parallel training with a per-GPU batch size of 2 and 8 gradient accumulation steps, which gives an effective batch size of 16 on the single GPU.
  • Hyperparameters: Training was conducted with a constant learning rate of 8e-5.
  • Mixed Precision: FP16 precision was used during training for efficiency.
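
The batch-size and resizing figures above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the resize rule assumes that "maximum minimum dimension of 768" means the shorter side is capped at 768 pixels while the aspect ratio is preserved. The actual preprocessing code is not given in this card.

```python
def effective_batch_size(per_gpu_batch, grad_accum_steps, num_gpus=1):
    """Number of samples contributing to one optimizer update."""
    return per_gpu_batch * grad_accum_steps * num_gpus


def cap_shorter_side(width, height, max_short_side=768):
    """Scale (width, height) down so the shorter side is at most max_short_side.

    Images that are already small enough are returned unchanged; the aspect
    ratio is kept up to rounding. This is an assumed reading of the card.
    """
    short = min(width, height)
    if short <= max_short_side:
        return width, height
    scale = max_short_side / short
    return round(width * scale), round(height * scale)


# Values from the training list: batch size 2, accumulation 8, one GPU.
print(effective_batch_size(per_gpu_batch=2, grad_accum_steps=8, num_gpus=1))  # 16
print(cap_shorter_side(1920, 1080))  # (1365, 768)
print(cap_shorter_side(640, 480))    # (640, 480)
```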

Guide: Running Locally

  1. Install Required Libraries:

    pip install -q controlnet_aux transformers accelerate
    pip install -q git+https://github.com/huggingface/diffusers
    
  2. Load Pre-trained Models:

    import torch
    from controlnet_aux import OpenposeDetector
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
    
    # Pose detector that turns a reference photo into an OpenPose skeleton image.
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    # ControlNet weights and the SDXL base pipeline, both loaded in half precision.
    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
    )
    # Keep submodules on the CPU until they are needed, to reduce GPU memory use.
    pipe.enable_model_cpu_offload()
    
  3. Generate Images:

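    from diffusers.utils import load_image
    pose_image = openpose(load_image("person.png")).resize((1024, 1024))  # "person.png": placeholder path to a reference photo
    prompt = "Darth vader dancing in a desert, high quality"
    negative_prompt = "low quality, bad quality"
    images = pipe(prompt, negative_prompt=negative_prompt, image=pose_image, num_inference_steps=25, num_images_per_prompt=4).images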
    
  4. Cloud GPUs: If no suitable local GPU is available, consider a cloud GPU instance, such as AWS EC2 P3 instances or Google Cloud's AI Platform.
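
The pipeline call in step 3 returns a list of PIL images. A minimal sketch for writing them to disk follows; the helper name save_images, the "outputs" directory, and the "pose" file prefix are hypothetical choices, and the only thing assumed about each image is that it has a save(path) method, as PIL images do.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="pose"):
    """Save each image as a numbered PNG and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = out / f"{prefix}_{index:02d}.png"
        image.save(path)  # PIL infers the PNG format from the file extension
        paths.append(path)
    return paths
```

After step 3, calling save_images(images) would write outputs/pose_00.png through outputs/pose_03.png for the four images requested by num_images_per_prompt=4.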

License

The model uses an "other" license and refers to the OpenPose license for specific terms. Users should verify compatibility with their use case.
