Stable Diffusion X4 Upscaler
Introduction
The Stable Diffusion X4 Upscaler is a text-guided latent upscaling diffusion model that increases image resolution by a factor of four. It was developed by Stability AI, trained on a subset of the LAION-5B dataset, and is available on the Hugging Face Hub.
Architecture
The model is a latent diffusion model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). An autoencoder maps images to and from latent representations; a U-Net backbone, guided by the text prompt and the low-resolution input, denoises the latents, and the autoencoder's decoder turns the result back into pixels. The model also takes a noise level as input, which controls how much noise is added to the low-resolution image according to a predefined diffusion schedule.
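To make the data flow concrete, the sketch below tracks tensor shapes through one upscaling pass. It assumes 4 latent channels, a 3-channel RGB conditioning image concatenated with the latents along the channel axis, and a decoder that enlarges each side by 4. These figures are illustrative assumptions that are not stated in the text above, and `upscaler_shapes` is a hypothetical helper, not part of any library.

```python
from typing import Dict, Tuple

# Assumed values for illustration; none of them are stated in the text above.
LATENT_CHANNELS = 4   # channels of the noisy latent the U-Net denoises
IMAGE_CHANNELS = 3    # RGB low-resolution conditioning image
UPSCALE_FACTOR = 4    # per-side enlargement applied by the decoder


def upscaler_shapes(width: int, height: int) -> Dict[str, Tuple[int, int, int]]:
    """Return (channels, height, width) at each stage for a given input size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {
        # Latents are sampled at the same spatial size as the low-res image.
        "latents": (LATENT_CHANNELS, height, width),
        # The U-Net sees latents and the low-res image stacked channel-wise.
        "unet_input": (LATENT_CHANNELS + IMAGE_CHANNELS, height, width),
        # The decoder maps the denoised latents to a 4x larger RGB image.
        "output": (IMAGE_CHANNELS, height * UPSCALE_FACTOR, width * UPSCALE_FACTOR),
    }


shapes = upscaler_shapes(128, 128)
print(shapes["unet_input"])  # (7, 128, 128)
print(shapes["output"])      # (3, 512, 512)
```

Under these assumptions, a 128x128 input yields a 512x512 output, which is 16 times as many pixels.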
Training
The model was trained on a subset of the LAION-5B dataset restricted to images larger than 2048x2048. Training ran for 1.25 million steps on 512x512 crops, with the noise level supplied as an additional input. It used 32 x 8 A100 GPUs and the AdamW optimizer, with a batch size of 2048 and a learning rate that was warmed up and then held constant.
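A warmup-then-constant schedule can be sketched in a few lines. The linear shape of the warmup, the peak rate of 1e-4, and the 10,000-step warmup length are assumptions chosen for illustration; the text above only says the rate was warmed up and then held constant. `learning_rate` is a hypothetical helper name.

```python
def learning_rate(step: int, peak_lr: float = 1e-4, warmup_steps: int = 10_000) -> float:
    """Linear warmup from 0 to peak_lr, then constant at peak_lr.

    peak_lr and warmup_steps are illustrative defaults, not confirmed values.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if warmup_steps > 0 and step < warmup_steps:
        # Fraction of the warmup completed so far, in [0, 1).
        return peak_lr * (step / warmup_steps)
    return peak_lr


for step in (0, 5_000, 10_000, 1_250_000):
    print(step, learning_rate(step))
```

After the warmup window the function returns the same value for every step, which is the "held constant" part of the description.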
Guide: Running Locally
To run the Stable Diffusion X4 Upscaler locally:
- Install Required Packages:
    pip install diffusers transformers accelerate scipy safetensors
- Load the Model and Scheduler:
    from diffusers import StableDiffusionUpscalePipeline
    import torch

    # Load the pipeline in half precision and move it to the GPU
    model_id = "stabilityai/stable-diffusion-x4-upscaler"
    pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipeline = pipeline.to("cuda")
- Download and Process an Image:
    import requests
    from PIL import Image
    from io import BytesIO

    url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
    response = requests.get(url)
    response.raise_for_status()  # stop early if the download failed

    # Decode the image and shrink it to a 128x128 low-resolution input
    low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
    low_res_img = low_res_img.resize((128, 128))
- Generate Upscaled Image:
    # The prompt guides the upscaling; the output is four times larger per side
    prompt = "a white cat"
    upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
    upscaled_image.save("upsampled_cat.png")
The pipeline above loads half-precision weights onto a CUDA GPU, so a GPU with sufficient memory is required. If local hardware falls short, consider cloud GPUs such as those provided by AWS or Google Cloud.
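Before moving to larger hardware, two adjustments may reduce memory pressure: capping the input size, since every input pixel becomes sixteen output pixels, and enabling attention slicing. The sketch below assumes `diffusers` and `torch` are installed. `cap_input_size` and `load_upscaler` are hypothetical helper names, and the 128-pixel cap is an arbitrary illustrative default rather than a recommendation from the model authors.

```python
from typing import Tuple

MODEL_ID = "stabilityai/stable-diffusion-x4-upscaler"


def cap_input_size(width: int, height: int, max_side: int = 128) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved up to rounding; smaller images are unchanged.
    """
    if min(width, height, max_side) <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def load_upscaler(model_id: str = MODEL_ID):
    """Load the pipeline with a lower-memory configuration (a sketch)."""
    # Imported lazily so cap_input_size works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    use_cuda = torch.cuda.is_available()
    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        model_id,
        # Half precision on GPU; full precision as a slow CPU fallback.
        torch_dtype=torch.float16 if use_cuda else torch.float32,
    )
    pipe = pipe.to("cuda" if use_cuda else "cpu")
    # Computes attention in slices, trading some speed for lower peak memory.
    pipe.enable_attention_slicing()
    return pipe


print(cap_input_size(1024, 512))  # (128, 64)
```

With the image from the steps above, `low_res_img.resize(cap_input_size(*low_res_img.size))` would apply the cap before the image is passed to the pipeline.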
License
The Stable Diffusion X4 Upscaler is distributed under the CreativeML Open RAIL++-M License, which places use-based restrictions on the model, including on the generation of harmful content.