Introduction

The Control-LoRA model integrates low-rank parameter-efficient fine-tuning into ControlNet, making controlled image generation practical on a broad range of consumer GPUs. Control-LoRAs are compact, efficient alternatives to the original ControlNet models, reducing file size from 4.7GB to approximately 738MB for Rank 256 files and to about 377MB for Rank 128 files. Each Control-LoRA is trained on a diverse set of image concepts and aspect ratios.
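The size reduction follows directly from the low-rank factorization: a rank-r update to a d_out × d_in weight matrix stores r(d_in + d_out) parameters instead of d_in × d_out. A minimal sketch of the arithmetic (the layer dimensions below are illustrative, not the exact SDXL ControlNet shapes):

```python
# Back-of-the-envelope parameter count for a low-rank (LoRA) update.
# A full delta for a d_out x d_in weight stores d_out * d_in values;
# a rank-r update stores only the factors B (d_out x r) and A (r x d_in).

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a rank-`rank` update B @ A of a d_out x d_in layer."""
    return rank * (d_in + d_out)

d_in, d_out = 1280, 1280  # illustrative attention-projection size, not the real shapes
full = d_in * d_out
for rank in (128, 256):
    low = lora_params(d_in, d_out, rank)
    print(f"rank {rank}: {low:,} params vs {full:,} full "
          f"({low / full:.1%} of the dense layer)")
```

The roughly 2x ratio between the rank 128 and rank 256 parameter counts mirrors the ~377MB versus ~738MB file sizes.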

Architecture

Control-LoRAs use grayscale depth maps and Canny edge maps as conditioning signals for guided image generation. Depth estimation is based on MiDaS dpt_beit_large_512, which predicts the relative distance of objects in a scene; for portraits, this estimate is refined with the Portrait Depth Estimation model from the ClipDrop API. Canny edge detection traces image contours by identifying abrupt changes in intensity.
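As an illustration, a Canny conditioning image can be produced with OpenCV; the thresholds below are common defaults, not necessarily the values used during training:

```python
# Produce a Canny edge map of the kind used to condition the Canny Control-LoRA.
# Thresholds (100, 200) are illustrative defaults; tune them per image.
import cv2

image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input path
edges = cv2.Canny(image, 100, 200)  # lower/upper hysteresis thresholds
cv2.imwrite("canny_control.png", edges)
```

Depth conditioning images are produced analogously, by running the source image through the MiDaS depth estimator and saving the result as a grayscale map.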

Training

Control-LoRAs are trained with a focus on image colorization and revision. The photograph and sketch colorizers are designed to re-color black-and-white photographs and sketches, respectively. The Revision approach uses pooled CLIP embeddings to generate images conceptually similar to the input, and allows multiple image or text concepts to be blended as positive or negative prompts.
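A minimal sketch of this kind of concept blending with pooled CLIP image embeddings, assuming the OpenAI CLIP ViT-L/14 checkpoint for illustration (Revision itself uses the CLIP vision tower bundled with SDXL):

```python
# Revision-style concept blending sketch: pooled CLIP image embeddings of
# several references are combined with signed weights (positive to pull
# toward a concept, negative to push away). Model id, file paths, and
# weights are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor

model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

def pooled_embedding(path: str) -> torch.Tensor:
    """Pooled, projected CLIP embedding of a single image."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).image_embeds  # shape (1, 768)

# Blend two concepts: mostly image A, gently steered away from image B.
blended = 0.8 * pooled_embedding("concept_a.png") - 0.2 * pooled_embedding("concept_b.png")
blended = blended / blended.norm(dim=-1, keepdim=True)  # renormalize to unit length
```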

Guide: Running Locally

  1. Installation: Clone the ComfyUI or StableSwarmUI repositories from GitHub.
  2. Workflow Setup: Use the basic ComfyUI workflows available in the Hugging Face repository. Custom nodes from Stability can be integrated; a script for fetching the Control-LoRA weights is sketched after this list.
  3. Run the Model: Load a workflow and generate images through the installed UI on your local machine.
  4. Cloud GPUs: For enhanced performance, consider using cloud GPU services like AWS EC2, Google Cloud, or Azure.
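One hedged way to fetch the weights into a local ComfyUI install is with huggingface_hub; the repo id and file path below follow the layout of the stabilityai/control-lora repository at the time of writing and should be verified against the current repository listing:

```python
# Download a Control-LoRA checkpoint into ComfyUI's model directory.
# The filename is an assumption about the repository layout; check the
# repo's file listing before use.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/control-lora",
    filename="control-LoRAs-rank128/control-lora-canny-rank128.safetensors",
    local_dir="ComfyUI/models/controlnet",  # where ComfyUI looks for ControlNet-style models
)
print("saved to", path)
```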

License

The Control-LoRA model is released under an "other" license, indicating custom licensing terms that differ from standard open-source licenses. Users should review the license details distributed with the model before use.
