Introduction

Leffa is a framework for controllable person image generation that enables precise manipulation of a person's appearance (virtual try-on) and pose (pose transfer). Earlier methods often distort fine-grained textural details. Leffa addresses this by learning flow fields in attention: a regularization loss guides each target query to attend to the correct reference keys in the attention layer.

Architecture

Leffa combines a diffusion-based baseline with a regularization term, the Leffa loss. The loss is applied on top of the attention map so that each target query attends to the corresponding region of the reference image. This preserves fine-grained detail while maintaining overall image quality.
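
To make the idea of a loss on the attention map concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes one plausible formulation: each query's attention weights are turned into an expected reference coordinate (a flow field), the reference image is sampled at that coordinate, and the result is compared with the target pixel. All function names, the grayscale nested-list image format, and the `temperature` parameter are hypothetical choices made for illustration.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp((v - peak) / temperature) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_to_flow(scores, ref_w, temperature=1.0):
    """Map each query's scores to the expected (x, y) of the keys it attends to.

    scores[q][k] is the raw score between target query q and reference key k;
    keys are assumed to be laid out in row-major order over the reference image.
    """
    flow = []
    for row in scores:
        weights = softmax(row, temperature)
        x = sum(w * (k % ref_w) for k, w in enumerate(weights))
        y = sum(w * (k // ref_w) for k, w in enumerate(weights))
        flow.append((x, y))
    return flow


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a fractional pixel position."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def flow_regularization_loss(scores, reference, target, temperature=1.0):
    """Mean squared error between the flow-warped reference and the target."""
    flow = attention_to_flow(scores, len(reference[0]), temperature)
    warped = [bilinear_sample(reference, x, y) for x, y in flow]
    target_flat = [p for row in target for p in row]
    return sum((w - t) ** 2 for w, t in zip(warped, target_flat)) / len(warped)


def peaked(index, size=4):
    """Toy score row that attends almost entirely to one key."""
    return [50.0 if k == index else 0.0 for k in range(size)]


# Toy case: the target is the reference mirrored left to right.
reference = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 0.0], [3.0, 2.0]]
correct_scores = [peaked(1), peaked(0), peaked(3), peaked(2)]
wrong_scores = [peaked(0), peaked(1), peaked(2), peaked(3)]
loss_correct = flow_regularization_loss(correct_scores, reference, target)
loss_wrong = flow_regularization_loss(wrong_scores, reference, target)
```

In this toy case the loss is close to zero when every query attends to its mirrored counterpart and clearly larger when it attends to the wrong key, which is the signal such a regularizer would use to correct the attention.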

Training

Leffa is trained by conditioning the diffusion model on reference images to produce the desired appearance or pose. The training objective adds a regularization loss on the attention map to the diffusion model's own objective, so that attention lands on the correct details. This reduces distortion of fine-grained details, and the same loss can be added to other diffusion models to improve them.
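
The combined objective described above can be sketched as a weighted sum. This is an illustrative assumption, not the project's training code: it takes the diffusion term to be the usual noise-prediction mean squared error, and the function names and the default `weight` value are hypothetical placeholders.

```python
def mse(predicted, expected):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)


def training_objective(predicted_noise, true_noise, attention_loss, weight=0.001):
    """Diffusion loss plus a weighted attention regularizer.

    attention_loss is the already-computed regularization term on the
    attention map; weight (placeholder value) controls how strongly it
    influences training relative to the diffusion loss.
    """
    return mse(predicted_noise, true_noise) + weight * attention_loss
```

A small weight keeps the diffusion term dominant, which matches the stated goal of improving detail preservation without sacrificing overall image quality.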

Guide: Running Locally

Basic Steps

  1. Create a Conda Environment:
    conda create -n leffa python==3.10
    conda activate leffa
    
  2. Navigate to the Project Directory (the cloned Leffa repository):
    cd Leffa
    
  3. Install Requirements:
    pip install -r requirements.txt
    
  4. Run the Gradio App Locally:
    python app.py
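
Before launching the app, it can help to confirm that the environment matches the steps above. The following is a small, hypothetical pre-flight check, not part of the Leffa repository. It uses only the Python standard library and assumes the project directory contains the `requirements.txt` and `app.py` files named in the steps.

```python
import sys
from pathlib import Path


def preflight(project_dir, version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The guide creates the Conda environment with Python 3.10.
    if tuple(version) != (3, 10):
        problems.append(f"expected Python 3.10, found {version[0]}.{version[1]}")
    root = Path(project_dir)
    # Both files are referenced by the install and run steps.
    for name in ("requirements.txt", "app.py"):
        if not (root / name).is_file():
            problems.append(f"missing {name} in {root}")
    return problems


if __name__ == "__main__":
    issues = preflight("Leffa")
    print("ready" if not issues else "\n".join(issues))
```

Run it from the directory that contains the `Leffa` folder, inside the activated `leffa` environment.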
    

Cloud GPUs

Training and large-scale experiments benefit from GPU acceleration. If no suitable local GPU is available, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.

License

Leffa is released under the MIT License, which permits use, modification, and distribution, provided that the copyright notice and license text are included in all copies or substantial portions of the software.
