ldm-celebahq-256

CompVis

Introduction

The Latent Diffusion Models (LDM) project focuses on high-resolution image synthesis with diffusion models that operate in a compressed latent space rather than directly on pixels. By leveraging pretrained autoencoders, LDMs strike a balance between complexity reduction and detail preservation, enhancing visual fidelity. This approach makes diffusion models trainable with limited computational resources, sharply reducing the demands typically associated with pixel-based diffusion models.
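
As a rough illustration of why this reduces cost, the minimal sketch below (assuming the diffusers VQModel API, including its encode(...).latents output) compares the size of the pixel tensor with the size of the much smaller latent tensor that the diffusion model actually has to denoise.

    import torch
    from diffusers import VQModel

    # Load the pretrained VQ-VAE that defines the latent space for this checkpoint.
    vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")

    # A dummy 256x256 RGB image in [-1, 1], standing in for a real sample.
    pixels = torch.randn(1, 3, 256, 256)

    with torch.no_grad():
        # Encoding compresses the image into the latent space in which diffusion runs.
        latents = vqvae.encode(pixels).latents

    print("pixel tensor: ", tuple(pixels.shape), pixels.numel(), "values")
    print("latent tensor:", tuple(latents.shape), latents.numel(), "values")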

Architecture

LDMs integrate cross-attention layers into the diffusion model's architecture, turning it into a flexible and powerful generator that can be conditioned on inputs such as text or bounding boxes. The model performs high-resolution synthesis in a convolutional manner and achieves state-of-the-art results in image inpainting, along with highly competitive performance in unconditional image generation, semantic scene synthesis, and super-resolution.
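
This particular checkpoint is an unconditional face generator and takes no conditioning input itself, but the cross-attention mechanism behind conditional LDMs can be sketched in a few lines. The layer below is a minimal illustrative PyTorch implementation with made-up dimensions, not the exact code used in the CompVis repository.

    import torch
    import torch.nn as nn

    class CrossAttention(nn.Module):
        """Minimal cross-attention: image latent tokens query conditioning tokens (e.g. text)."""

        def __init__(self, latent_dim, cond_dim, attn_dim=64):
            super().__init__()
            self.to_q = nn.Linear(latent_dim, attn_dim, bias=False)  # queries come from image features
            self.to_k = nn.Linear(cond_dim, attn_dim, bias=False)    # keys come from the conditioning
            self.to_v = nn.Linear(cond_dim, attn_dim, bias=False)    # values come from the conditioning
            self.to_out = nn.Linear(attn_dim, latent_dim)
            self.scale = attn_dim ** -0.5

        def forward(self, latent_tokens, cond_tokens):
            q = self.to_q(latent_tokens)                   # (B, N_latent, attn_dim)
            k = self.to_k(cond_tokens)                     # (B, N_cond, attn_dim)
            v = self.to_v(cond_tokens)
            attn = (q @ k.transpose(-1, -2)) * self.scale  # similarity of each latent token to each cond token
            attn = attn.softmax(dim=-1)
            return self.to_out(attn @ v)                   # project back to the latent feature width

    # Example: a 64x64 latent flattened to 4096 tokens attending over 77 text-embedding tokens.
    layer = CrossAttention(latent_dim=128, cond_dim=768)
    out = layer(torch.randn(1, 4096, 128), torch.randn(1, 77, 768))
    print(out.shape)  # torch.Size([1, 4096, 128])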

Training

LDM training is made efficient by applying the diffusion process in the latent space of pretrained autoencoders. This retains quality and flexibility while requiring significantly fewer computational resources than traditional pixel-space diffusion models.
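
A schematic training step, written against the diffusers UNet2DModel, VQModel, and DDPMScheduler APIs with made-up hyperparameters, might look like the sketch below. It illustrates the idea (freeze the autoencoder, then diffuse and denoise its latents) rather than reproducing the original training code.

    import torch
    import torch.nn.functional as F
    from diffusers import UNet2DModel, VQModel, DDPMScheduler

    # Frozen pretrained autoencoder defines the latent space; only the UNet is trained.
    vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae").eval()
    unet = UNet2DModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="unet")
    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

    def training_step(images):  # images: (B, 3, 256, 256) in [-1, 1]
        with torch.no_grad():
            latents = vqvae.encode(images).latents  # diffusion operates on these latents

        noise = torch.randn_like(latents)
        timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                                  (latents.shape[0],), device=latents.device)
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

        # The UNet learns to predict the noise that was added at each timestep.
        noise_pred = unet(noisy_latents, timesteps).sample
        loss = F.mse_loss(noise_pred, noise)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()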

Guide: Running Locally

Basic Steps

  1. Install Dependencies:

    !pip install diffusers torch
    
  2. Inference with a Pipeline:

    from diffusers import DiffusionPipeline

    model_id = "CompVis/ldm-celebahq-256"

    # Load the full latent diffusion pipeline (UNet + VQ-VAE + scheduler).
    pipeline = DiffusionPipeline.from_pretrained(model_id)

    # Run the diffusion process; more inference steps generally improve quality.
    image = pipeline(num_inference_steps=200).images[0]
    image.save("ldm_generated_image.png")
    
  3. Inference with an Unrolled Loop:

    from diffusers import UNet2DModel, DDIMScheduler, VQModel
    import torch
    import PIL.Image
    import numpy as np
    import tqdm

    seed = 3

    # Load the three components of the latent diffusion model separately.
    unet = UNet2DModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="unet")
    vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")
    scheduler = DDIMScheduler.from_pretrained("CompVis/ldm-celebahq-256", subfolder="scheduler")

    torch_device = "cuda" if torch.cuda.is_available() else "cpu"
    unet.to(torch_device)
    vqvae.to(torch_device)

    # Start from random noise in the latent space of the VQ-VAE.
    generator = torch.manual_seed(seed)
    noise = torch.randn(
        (1, unet.config.in_channels, unet.config.sample_size, unet.config.sample_size),
        generator=generator,
    ).to(torch_device)

    scheduler.set_timesteps(num_inference_steps=200)

    # Iteratively denoise the latent with the UNet and the DDIM update rule.
    image = noise
    for t in tqdm.tqdm(scheduler.timesteps):
        with torch.no_grad():
            residual = unet(image, t).sample
        image = scheduler.step(residual, t, image, eta=0.0).prev_sample

    # Decode the final latent back to pixel space with the VQ-VAE decoder.
    with torch.no_grad():
        image = vqvae.decode(image).sample

    # Map from [-1, 1] to [0, 255] and save as a PNG.
    image_processed = image.cpu().permute(0, 2, 3, 1)
    image_processed = (image_processed + 1.0) * 127.5
    image_processed = image_processed.clamp(0, 255).numpy().astype(np.uint8)
    image_pil = PIL.Image.fromarray(image_processed[0])
    image_pil.save(f"generated_image_{seed}.png")
    

Cloud GPUs

For improved performance, it is recommended to run the model on cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
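
On such an instance, the pipeline from step 2 can simply be moved onto the GPU before sampling. The snippet below assumes a recent diffusers version in which pipeline outputs expose an images attribute.

    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained("CompVis/ldm-celebahq-256")
    pipeline.to("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU when one is present

    image = pipeline(num_inference_steps=200).images[0]
    image.save("ldm_generated_image.png")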

License

This project is licensed under the Apache 2.0 License.