ldm-celebahq-256
CompVis
Introduction
The Latent Diffusion Models (LDM) project focuses on high-resolution image synthesis with diffusion models applied in a learned latent space rather than in pixel space. By leveraging pretrained autoencoders, LDMs strike a balance between complexity reduction and detail preservation, improving visual fidelity. This approach makes it possible to train diffusion models with limited computational resources, avoiding much of the cost typically associated with pixel-based diffusion models.
Architecture
LDMs integrate cross-attention layers into their architecture, turning the model into a flexible and powerful generator that can be conditioned on a variety of inputs such as text or bounding boxes. The model performs high-resolution synthesis in a convolutional manner, achieving state-of-the-art results for image inpainting and highly competitive performance on tasks such as unconditional image generation, semantic scene synthesis, and super-resolution.
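Cross-attention itself is a generic mechanism; the following is a minimal, self-contained sketch (not code from this repository) of how latent image features can attend to conditioning embeddings such as text tokens. The class name, dimensions, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical, minimal cross-attention block: latent image tokens (queries)
# attend to conditioning tokens (keys/values), e.g. text embeddings.
class CrossAttentionBlock(nn.Module):
    def __init__(self, latent_dim=512, cond_dim=768, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(latent_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=latent_dim, kdim=cond_dim, vdim=cond_dim,
            num_heads=num_heads, batch_first=True,
        )

    def forward(self, latents, cond):
        # latents: (batch, num_latent_tokens, latent_dim)
        # cond:    (batch, num_cond_tokens, cond_dim)
        attended, _ = self.attn(self.norm(latents), cond, cond)
        return latents + attended  # residual connection

block = CrossAttentionBlock()
latents = torch.randn(1, 32 * 32, 512)   # flattened 32x32 latent feature map
cond = torch.randn(1, 77, 768)           # e.g. 77 text-token embeddings
out = block(latents, cond)
print(out.shape)  # torch.Size([1, 1024, 512])
```

In the full LDM UNet, blocks like this are interleaved with convolutional and self-attention layers at multiple resolutions.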
Training
The training of LDMs is optimized by applying diffusion models in the latent space of pretrained autoencoders. This approach retains quality and flexibility while significantly cutting down on computational resources compared to traditional pixel space diffusion models.
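To make the latent-space idea concrete, here is a rough, hypothetical sketch of one training step: an image is encoded into the autoencoder's latent space, noise is added according to a diffusion schedule, and a denoiser is trained to predict that noise. The toy `encoder`, `unet`, and linear schedule below are placeholders, not this repository's training code; the real setup uses a pretrained VQ/KL autoencoder and a timestep-conditioned UNet.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a pretrained autoencoder and a denoising UNet.
encoder = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)   # image -> latent (f=8 downsampling)
unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)     # latent -> noise prediction

# Simple linear noise schedule (illustrative only).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

images = torch.randn(8, 3, 256, 256)            # a batch of training images
with torch.no_grad():
    latents = encoder(images)                   # diffusion happens in latent space, not pixel space

t = torch.randint(0, T, (latents.shape[0],))    # random timestep per sample
noise = torch.randn_like(latents)
a = alphas_cumprod[t].view(-1, 1, 1, 1)
noisy_latents = a.sqrt() * latents + (1 - a).sqrt() * noise   # forward diffusion in latent space

pred_noise = unet(noisy_latents)                # real UNet would also receive the timestep t
loss = F.mse_loss(pred_noise, noise)            # standard epsilon-prediction objective
loss.backward()
```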
Guide: Running Locally
Basic Steps
- Install Dependencies:

```bash
pip install diffusers
```
- Inference with a Pipeline:

```python
from diffusers import DiffusionPipeline

model_id = "CompVis/ldm-celebahq-256"

# load pretrained model and scheduler
pipeline = DiffusionPipeline.from_pretrained(model_id)

# run inference (sample random noise and denoise it)
image = pipeline(num_inference_steps=200)["sample"]

# save the generated image
image[0].save("ldm_generated_image.png")
```
- Inference with an Unrolled Loop:

```python
import torch
import PIL.Image
import numpy as np
import tqdm
from diffusers import UNet2DModel, DDIMScheduler, VQModel

seed = 3

# load the individual components of the LDM
unet = UNet2DModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="unet")
vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")
scheduler = DDIMScheduler.from_config("CompVis/ldm-celebahq-256", subfolder="scheduler")

# move models to GPU if available
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
unet.to(torch_device)
vqvae.to(torch_device)

# sample Gaussian noise in the latent space
generator = torch.manual_seed(seed)
noise = torch.randn(
    (1, unet.in_channels, unet.sample_size, unet.sample_size),
    generator=generator,
).to(torch_device)

# set the number of DDIM inference steps
scheduler.set_timesteps(num_inference_steps=200)

image = noise
for t in tqdm.tqdm(scheduler.timesteps):
    # predict the noise residual
    with torch.no_grad():
        residual = unet(image, t)["sample"]

    # compute the previous noisy sample x_t -> x_t-1 (deterministic DDIM, eta=0.0)
    prev_image = scheduler.step(residual, t, image, eta=0.0)["prev_sample"]
    image = prev_image

# decode the final latents into an image with the VQ-VAE
with torch.no_grad():
    image = vqvae.decode(image)

# post-process: map [-1, 1] to [0, 255] uint8 and save
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.clamp(0, 255).numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])
image_pil.save(f"generated_image_{seed}.png")
```
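Note that with `eta=0.0` the DDIM update is deterministic, so a given `seed` should always reproduce the same face; changing `seed` yields a different sample, and `num_inference_steps` trades generation speed against sample quality.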
Cloud GPUs
For improved performance, it is recommended to run the model on cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
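For example, assuming the same pipeline and return format as in the guide above, moving the pipeline to a CUDA device is a one-line change (a sketch, not part of the original guide); the unrolled loop already selects the GPU automatically via `torch_device`.

```python
# optional: run the pipeline on a GPU for much faster sampling
pipeline = pipeline.to("cuda")
image = pipeline(num_inference_steps=200)["sample"]
image[0].save("ldm_generated_image_gpu.png")
```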
License
This project is licensed under the Apache 2.0 License.