Stable Cascade
Introduction
Stable Cascade is a diffusion model developed by Stability AI for generating images from text prompts. It operates in a highly compressed latent space, which improves efficiency and reduces computational cost, making it well suited to applications where speed and efficient resource use are priorities.
Architecture
Stable Cascade is based on the Würstchen architecture and uses a much smaller latent space than models like Stable Diffusion. It achieves a compression factor of 42, encoding a 1024x1024 image to 24x24 latents while maintaining image quality. The model consists of three stages: Stages A and B compress and reconstruct images (a role comparable to the VAE in Stable Diffusion), while Stage C is the text-conditional model that generates the small 24x24 latents from text prompts. Each stage is available in different parameter sizes, with the larger versions producing better results.
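To make the compression factor concrete, here is a back-of-the-envelope sketch comparing it with the factor-8 VAE commonly used in Stable Diffusion setups (the factor of 8 is an assumption for illustration, not stated above; real training and inference cost depends on many other factors):

```python
# Rough arithmetic only: spatial latent size at a given compression factor.
def latent_side(image_side: int, compression_factor: int) -> int:
    return image_side // compression_factor

sd_latent = latent_side(1024, 8)        # typical Stable Diffusion f8 VAE -> 128x128
cascade_latent = latent_side(1024, 42)  # Stable Cascade                 -> 24x24

ratio = (sd_latent ** 2) / (cascade_latent ** 2)
print(f"Stable Cascade latents have roughly {ratio:.0f}x fewer spatial positions")  # ~28x
```

This is the space that Stage C has to model, which is why working in it is so much cheaper than denoising full-resolution latents.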
Training
The text-conditional model (Stage C) is trained in the compressed latent space, which substantially reduces training cost compared to previous models like Stable Diffusion 1.5. Fine-tuning can further improve performance, especially for the larger parameter versions.
Guide: Running Locally
- Install Prerequisites: Ensure PyTorch 2.2.0 or higher is installed to use `torch.bfloat16`, then install the `diffusers` library with `pip install diffusers`. (See the environment-check sketch after this list.)
- Load the Model: Use the following code to load and run the model:

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "an image of a shiba inu, donning a spacesuit and helmet"

# Stage C (prior): generates the compressed image embeddings from the text prompt.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
# Stages A/B (decoder): turn the image embeddings into the final 1024x1024 image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,  # guidance is applied at the prior stage
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```
- Optimization Tips:
  - Use cloud GPUs like AWS EC2 P3 or Google Cloud's GPU instances for efficient processing.
  - Enable model CPU offload to manage memory usage effectively.
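As referenced in the Install Prerequisites step, here is a minimal environment-check sketch. It mirrors the requirements above (PyTorch 2.2.0+, a GPU with bfloat16 support); the exact failure behaviour is an illustrative choice, not part of the model's documentation:

```python
# Minimal environment check: PyTorch version, CUDA availability, and bfloat16 support.
import torch

def check_environment(min_torch: str = "2.2.0") -> None:
    installed = tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:3])
    required = tuple(int(p) for p in min_torch.split("."))
    if installed < required:
        raise RuntimeError(f"PyTorch >= {min_torch} required, found {torch.__version__}")

    if not torch.cuda.is_available():
        print("Warning: no CUDA device found; inference will be very slow on CPU.")
    elif not torch.cuda.is_bf16_supported():
        print("Warning: GPU does not support bfloat16; consider torch.float16 instead.")

check_environment()
```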
License
The Stable Cascade model is released under the stable-cascade-nc-community license. For details, refer to the LICENSE file.