Stable Cascade
Introduction
Stable Cascade is a diffusion model developed by Stability AI for generating images from text prompts. It operates in a highly compressed latent space, which improves efficiency and reduces computational cost, making it well suited to applications where speed and efficient resource use are priorities.
Architecture
Stable Cascade is based on the Würstchen architecture and uses a much smaller latent space than models like Stable Diffusion. It achieves a compression factor of 42, encoding a 1024x1024 image to 24x24 latents while maintaining image quality. The model consists of three stages: Stages A and B compress and reconstruct images (a role comparable to the VAE in Stable Diffusion), while Stage C is the text-conditional model that generates the small 24x24 latents from text prompts. Each stage is available in different parameter sizes, with the larger versions producing better results.
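To make the compression factor concrete, here is a back-of-the-envelope sketch comparing it with the factor-8 VAE commonly used in Stable Diffusion setups (the factor of 8 is an assumption for illustration, not stated above; real training and inference cost depends on many other factors):

```python
# Rough arithmetic only: spatial latent size at a given compression factor.
def latent_side(image_side: int, compression_factor: int) -> int:
    return image_side // compression_factor

sd_latent = latent_side(1024, 8)        # typical Stable Diffusion f8 VAE -> 128x128
cascade_latent = latent_side(1024, 42)  # Stable Cascade                 -> 24x24

ratio = (sd_latent ** 2) / (cascade_latent ** 2)
print(f"Stable Cascade latents have roughly {ratio:.0f}x fewer spatial positions")  # ~28x
```

This is the space that Stage C has to model, which is why working in it is so much cheaper than denoising full-resolution latents.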
Training
The text-conditional model (Stage C) is trained in the compressed latent space, which substantially reduces training cost compared to previous models like Stable Diffusion 1.5. Fine-tuning can further improve performance, especially for the larger parameter versions.
Guide: Running Locally
- Install Prerequisites: Ensure PyTorch 2.2.0 or higher is installed to use `torch.bfloat16`, then install the `diffusers` library with `pip install diffusers`. (See the environment-check sketch after this list.)
- Load the Model: Use the following code to load and run the model:

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "an image of a shiba inu, donning a spacesuit and helmet"

# Stage C (prior): generates the compressed image embeddings from the text prompt.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
# Stages A/B (decoder): turn the image embeddings into the final 1024x1024 image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,  # guidance is applied at the prior stage
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```
- Optimization Tips:
  - Use cloud GPUs like AWS EC2 P3 or Google Cloud's GPU instances for efficient processing.
  - Enable model CPU offload to manage memory usage effectively.
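As referenced in the Install Prerequisites step, here is a minimal environment-check sketch. It mirrors the requirements above (PyTorch 2.2.0+, a GPU with bfloat16 support); the exact failure behaviour is an illustrative choice, not part of the model's documentation:

```python
# Minimal environment check: PyTorch version, CUDA availability, and bfloat16 support.
import torch

def check_environment(min_torch: str = "2.2.0") -> None:
    installed = tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:3])
    required = tuple(int(p) for p in min_torch.split("."))
    if installed < required:
        raise RuntimeError(f"PyTorch >= {min_torch} required, found {torch.__version__}")

    if not torch.cuda.is_available():
        print("Warning: no CUDA device found; inference will be very slow on CPU.")
    elif not torch.cuda.is_bf16_supported():
        print("Warning: GPU does not support bfloat16; consider torch.float16 instead.")

check_environment()
```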
License
The Stable Cascade model is released under the stable-cascade-nc-community license. For details, refer to the LICENSE file.