control_v11p_sd15_lineart

lllyasviel

Introduction

ControlNet is a neural network architecture that adds conditional control to large, pretrained diffusion models. It was introduced in the paper "Adding Conditional Control to Text-to-Image Diffusion Models" by Lvmin Zhang and Maneesh Agrawala. ControlNet learns task-specific conditions end to end, and this learning remains effective even when the training dataset is small. It lets models such as Stable Diffusion accept extra conditional inputs such as edge maps and segmentation maps, which gives finer control over the generated image. The control_v11p_sd15_lineart checkpoint described here is conditioned on lineart images.

Architecture

ControlNet attaches to a pretrained diffusion model such as Stable Diffusion and feeds it an additional conditioning input. Training a ControlNet costs about as much as fine-tuning a diffusion model, so it is feasible on a personal device, and the same architecture scales to large datasets when more powerful compute is available.
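The ControlNet paper connects a trainable copy of the pretrained blocks to the locked original through zero-initialized layers ("zero convolutions"), so the combined model starts out behaving exactly like the original. The toy sketch below illustrates only that property. The names `frozen_block`, `trainable_copy`, `zero_layer`, and `controlnet_block` are hypothetical stand-ins, and scalar affine maps on plain lists replace the real convolutional layers.

```python
from typing import Callable, List

Vector = List[float]


def zero_layer(weight: float, bias: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for a 1x1 zero convolution: an elementwise affine map."""
    return lambda v: [weight * x + bias for x in v]


def frozen_block(v: Vector) -> Vector:
    """Hypothetical stand-in for a locked, pretrained block."""
    return [2.0 * x + 1.0 for x in v]


def trainable_copy(v: Vector) -> Vector:
    """Hypothetical stand-in for the trainable copy (starts with the same weights)."""
    return [2.0 * x + 1.0 for x in v]


def controlnet_block(x: Vector, cond: Vector,
                     z_in: Callable[[Vector], Vector],
                     z_out: Callable[[Vector], Vector]) -> Vector:
    # y = F(x) + Z_out(F_copy(x + Z_in(cond)))
    injected = [a + b for a, b in zip(x, z_in(cond))]
    control = z_out(trainable_copy(injected))
    return [a + b for a, b in zip(frozen_block(x), control)]


x = [0.5, -1.0, 2.0]
cond = [1.0, 1.0, 1.0]

# At initialization both connecting layers are zero, so the condition has no effect.
at_init = controlnet_block(x, cond, zero_layer(0.0, 0.0), zero_layer(0.0, 0.0))

# Once the connecting layers have non-zero weights, the condition shifts the output.
after_training = controlnet_block(x, cond, zero_layer(0.5, 0.0), zero_layer(0.1, 0.0))

print(at_init == frozen_block(x))
print(after_training != frozen_block(x))
```

Because the connecting layers start at zero, training begins from the unmodified pretrained behavior and only gradually lets the condition influence the output.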

Training

A ControlNet is trained on one task-specific condition at a time. Relatively small datasets are enough, so training remains practical when data is limited. With larger datasets and more compute, the same training procedure scales up.

Guide: Running Locally

To run the ControlNet model locally, follow these steps:

  1. Install Dependencies: Install the required auxiliary and diffusers packages.

    pip install controlnet_aux==0.3.0 diffusers transformers accelerate
    
  2. Set Up the Model and Generate: Load the ControlNet checkpoint, extract a lineart control image from the input, and run the pipeline.

    import os
    
    import torch
    from controlnet_aux import LineartDetector
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
    from diffusers.utils import load_image
    
    checkpoint = "ControlNet-1-1-preview/control_v11p_sd15_lineart"
    
    # Load the example input and resize it to 512x512.
    image = load_image("https://huggingface.co/ControlNet-1-1-preview/control_v11p_sd15_lineart/resolve/main/images/input.png")
    image = image.resize((512, 512))
    
    prompt = "michael jackson concert"
    
    # Extract the lineart control image with the annotator from controlnet_aux.
    processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
    control_image = processor(image)
    
    # Create the output directory first; save() fails if it does not exist.
    os.makedirs("images", exist_ok=True)
    control_image.save("images/control.png")
    
    # Load the ControlNet and attach it to the Stable Diffusion 1.5 pipeline in half precision.
    controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
    
    # Use the UniPC scheduler and offload idle submodules to the CPU to save GPU memory.
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    
    # Fix the seed so the result is reproducible.
    generator = torch.manual_seed(0)
    image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
    
    image.save("images/image_out.png")
    
  3. Cloud GPUs: If no suitable local GPU is available, or generation is too slow, the same script can be run on a cloud GPU service.
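Step 2 resizes the input to a fixed 512x512, which distorts non-square images. As a minimal sketch, assuming the pipeline should receive dimensions divisible by 8 (the downsampling factor of the Stable Diffusion VAE), the hypothetical helper `fit_size` below computes an aspect-preserving size whose short side is close to 512. It is an illustration only and not part of diffusers or controlnet_aux.

```python
from typing import Tuple


def fit_size(width: int, height: int, short_side: int = 512, multiple: int = 8) -> Tuple[int, int]:
    """Scale (width, height) so the short side is about `short_side`,
    then snap both sides to the nearest multiple of `multiple`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    shortest = min(width, height)

    def snap(value: int) -> int:
        scaled = value * short_side / shortest
        # Never return less than one full multiple.
        return max(multiple, int(round(scaled / multiple)) * multiple)

    return snap(width), snap(height)


size_landscape = fit_size(1024, 768)
size_square = fit_size(512, 512)
print(size_landscape, size_square)
```

The result could replace the fixed size in step 2, for example `image = image.resize(fit_size(*image.size))`, since a PIL image exposes its dimensions as `image.size`.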

License

The model is licensed under the CreativeML OpenRAIL-M license. This license is adapted from the responsible AI licensing work of BigScience and the RAIL Initiative and is intended to support ethical AI development. See the license text for the full terms.
