ldm-celebahq-256
CompVis
Introduction
The Latent Diffusion Models (LDM) project focuses on high-resolution image synthesis with diffusion models applied in a learned latent space rather than in pixel space. By leveraging pretrained autoencoders, LDMs strike a balance between complexity reduction and detail preservation, improving visual fidelity. This approach makes it possible to train diffusion models with limited computational resources, avoiding much of the cost typically associated with pixel-based diffusion models.
Architecture
LDMs integrate cross-attention layers into their architecture, turning the model into a flexible and powerful generator that can be conditioned on a variety of inputs such as text or bounding boxes. The model performs high-resolution synthesis in a convolutional manner, achieving state-of-the-art results for image inpainting and highly competitive performance on tasks such as unconditional image generation, semantic scene synthesis, and super-resolution.
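Cross-attention itself is a generic mechanism; the following is a minimal, self-contained sketch (not code from this repository) of how latent image features can attend to conditioning embeddings such as text tokens. The class name, dimensions, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical, minimal cross-attention block: latent image tokens (queries)
# attend to conditioning tokens (keys/values), e.g. text embeddings.
class CrossAttentionBlock(nn.Module):
    def __init__(self, latent_dim=512, cond_dim=768, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(latent_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=latent_dim, kdim=cond_dim, vdim=cond_dim,
            num_heads=num_heads, batch_first=True,
        )

    def forward(self, latents, cond):
        # latents: (batch, num_latent_tokens, latent_dim)
        # cond:    (batch, num_cond_tokens, cond_dim)
        attended, _ = self.attn(self.norm(latents), cond, cond)
        return latents + attended  # residual connection

block = CrossAttentionBlock()
latents = torch.randn(1, 32 * 32, 512)   # flattened 32x32 latent feature map
cond = torch.randn(1, 77, 768)           # e.g. 77 text-token embeddings
out = block(latents, cond)
print(out.shape)  # torch.Size([1, 1024, 512])
```

In the full LDM UNet, blocks like this are interleaved with convolutional and self-attention layers at multiple resolutions.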
Training
The training of LDMs is optimized by applying diffusion models in the latent space of pretrained autoencoders. This approach retains quality and flexibility while significantly cutting down on computational resources compared to traditional pixel space diffusion models.
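To make the latent-space idea concrete, here is a rough, hypothetical sketch of one training step: an image is encoded into the autoencoder's latent space, noise is added according to a diffusion schedule, and a denoiser is trained to predict that noise. The toy `encoder`, `unet`, and linear schedule below are placeholders, not this repository's training code; the real setup uses a pretrained VQ/KL autoencoder and a timestep-conditioned UNet.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a pretrained autoencoder and a denoising UNet.
encoder = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)   # image -> latent (f=8 downsampling)
unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)     # latent -> noise prediction

# Simple linear noise schedule (illustrative only).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

images = torch.randn(8, 3, 256, 256)            # a batch of training images
with torch.no_grad():
    latents = encoder(images)                   # diffusion happens in latent space, not pixel space

t = torch.randint(0, T, (latents.shape[0],))    # random timestep per sample
noise = torch.randn_like(latents)
a = alphas_cumprod[t].view(-1, 1, 1, 1)
noisy_latents = a.sqrt() * latents + (1 - a).sqrt() * noise   # forward diffusion in latent space

pred_noise = unet(noisy_latents)                # real UNet would also receive the timestep t
loss = F.mse_loss(pred_noise, noise)            # standard epsilon-prediction objective
loss.backward()
```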
Guide: Running Locally
Basic Steps
- Install Dependencies:

```bash
pip install diffusers
```
- Inference with a Pipeline:

```python
from diffusers import DiffusionPipeline

model_id = "CompVis/ldm-celebahq-256"

# load pretrained model and scheduler
pipeline = DiffusionPipeline.from_pretrained(model_id)

# run inference (sample random noise and denoise it)
image = pipeline(num_inference_steps=200)["sample"]

# save the generated image
image[0].save("ldm_generated_image.png")
```
- Inference with an Unrolled Loop:

```python
import torch
import PIL.Image
import numpy as np
import tqdm
from diffusers import UNet2DModel, DDIMScheduler, VQModel

seed = 3

# load the individual components of the LDM
unet = UNet2DModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="unet")
vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")
scheduler = DDIMScheduler.from_config("CompVis/ldm-celebahq-256", subfolder="scheduler")

# move models to GPU if available
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
unet.to(torch_device)
vqvae.to(torch_device)

# sample Gaussian noise in the latent space
generator = torch.manual_seed(seed)
noise = torch.randn(
    (1, unet.in_channels, unet.sample_size, unet.sample_size),
    generator=generator,
).to(torch_device)

# set the number of DDIM inference steps
scheduler.set_timesteps(num_inference_steps=200)

image = noise
for t in tqdm.tqdm(scheduler.timesteps):
    # predict the noise residual
    with torch.no_grad():
        residual = unet(image, t)["sample"]

    # compute the previous noisy sample x_t -> x_t-1 (deterministic DDIM, eta=0.0)
    prev_image = scheduler.step(residual, t, image, eta=0.0)["prev_sample"]
    image = prev_image

# decode the final latents into an image with the VQ-VAE
with torch.no_grad():
    image = vqvae.decode(image)

# post-process: map [-1, 1] to [0, 255] uint8 and save
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.clamp(0, 255).numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])
image_pil.save(f"generated_image_{seed}.png")
```
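Note that with `eta=0.0` the DDIM update is deterministic, so a given `seed` should always reproduce the same face; changing `seed` yields a different sample, and `num_inference_steps` trades generation speed against sample quality.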
Cloud GPUs
For improved performance, it is recommended to run the model on cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
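For example, assuming the same pipeline and return format as in the guide above, moving the pipeline to a CUDA device is a one-line change (a sketch, not part of the original guide); the unrolled loop already selects the GPU automatically via `torch_device`.

```python
# optional: run the pipeline on a GPU for much faster sampling
pipeline = pipeline.to("cuda")
image = pipeline(num_inference_steps=200)["sample"]
image[0].save("ldm_generated_image_gpu.png")
```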
License
This project is licensed under the Apache 2.0 License.