stable diffusion x4 upscaler

stabilityai

Introduction

The Stable Diffusion X4 Upscaler is a text-guided latent upscaling diffusion model that increases image resolution by a factor of four. It was developed by Stability AI, trained on a subset of the LAION-5B dataset, and is available on the Hugging Face Hub.

Architecture

The model is a latent diffusion model conditioned on a low-resolution image and a text prompt. Prompts are encoded by a fixed, pretrained text encoder (OpenCLIP-ViT/H). An autoencoder maps images to and from a latent space, and a U-Net backbone performs the denoising in that latent space, guided by the encoded text. In addition to the text, the model takes a noise level parameter (noise_level), which controls how much noise is added to the low-resolution input according to a predefined diffusion schedule.
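
To make the role of the noise level concrete, here is a minimal sketch of forward-diffusing a low-resolution input to a chosen level. It assumes a linear beta schedule over 1000 steps with the endpoints shown; those values, and the helper names `alpha_bar` and `add_noise`, are illustrative assumptions and not taken from the model's configuration.

```python
import math
import random


def alpha_bar(noise_level, num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for t = 0..noise_level.

    The linear schedule and its endpoints are assumptions for illustration.
    """
    product = 1.0
    for t in range(noise_level + 1):
        beta_t = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta_t
    return product


def add_noise(pixels, noise_level, rng):
    """Mix each pixel value (scaled to [-1, 1]) with Gaussian noise."""
    a = alpha_bar(noise_level)
    signal_scale = math.sqrt(a)        # how much of the input survives
    noise_scale = math.sqrt(1.0 - a)   # how much noise is mixed in
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
pixels = [0.5, -0.25, 1.0, 0.0]
lightly_noised = add_noise(pixels, noise_level=20, rng=rng)
heavily_noised = add_noise(pixels, noise_level=300, rng=rng)
```

A higher noise level keeps less of the input signal, which gives the model more freedom to synthesize detail. In the diffusers pipeline the corresponding call argument is `noise_level`, which defaults to 20.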

Training

The model was trained on a subset of the LAION-5B dataset restricted to images larger than 2048x2048. Training ran for 1.25 million steps on 512x512 image crops, with the noise level supplied as an additional input. It was conducted on 32 x 8 A100 GPUs using the AdamW optimizer, with a batch size of 2048 and a learning rate that was warmed up and then held constant.
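
The warm-up-then-constant learning rate can be sketched as a small function. The peak value of 1e-4, the 10,000 warm-up steps, and the linear shape of the warm-up are assumptions for illustration; this section does not state them.

```python
def learning_rate(step, peak_lr=1e-4, warmup_steps=10_000):
    """Warm up linearly from 0 to peak_lr, then hold peak_lr constant.

    peak_lr and warmup_steps are assumed values, not taken from this section.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


# Figures stated in the text above.
BATCH_SIZE = 2048
TOTAL_STEPS = 1_250_000

# Number of 512x512 crops processed over the whole run (repeats included).
crops_seen = BATCH_SIZE * TOTAL_STEPS
```

With the stated batch size and step count, the run processes about 2.56 billion crops in total.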

Guide: Running Locally

To run the Stable Diffusion X4 Upscaler locally:

  1. Install Required Packages:

    pip install diffusers transformers accelerate scipy safetensors
    
  2. Load the Model and Scheduler:

    from diffusers import StableDiffusionUpscalePipeline
    import torch
    
    model_id = "stabilityai/stable-diffusion-x4-upscaler"
    pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipeline = pipeline.to("cuda")
    
  3. Download and Process an Image:

    import requests
    from PIL import Image
    from io import BytesIO
    
    url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early if the download failed
    low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
    # A 128x128 input yields a 512x512 output after x4 upscaling
    low_res_img = low_res_img.resize((128, 128))
    
  4. Generate Upscaled Image:

    prompt = "a white cat"
    upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
    upscaled_image.save("upsampled_cat.png")
    

The steps above assume a CUDA-capable GPU with enough memory to hold the pipeline in half precision. If local GPU memory is insufficient, consider cloud GPUs such as those provided by AWS or Google Cloud.
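
When GPU memory is tight, two measures can help before moving to larger hardware: capping the input size, since the output has four times the side length, and enabling attention slicing on the pipeline. The sketch below assumes a hypothetical output budget of 2048 pixels per side; `fit_for_upscale` and `load_low_vram_pipeline` are illustrative helper names, not part of diffusers.

```python
def fit_for_upscale(width, height, max_output_side=2048, factor=4):
    """Return an input size whose upscaled output fits within max_output_side.

    max_output_side is a hypothetical budget; tune it to the available memory.
    """
    max_input_side = max_output_side // factor
    scale = min(1.0, max_input_side / max(width, height))
    return max(1, int(width * scale)), max(1, int(height * scale))


def load_low_vram_pipeline(model_id="stabilityai/stable-diffusion-x4-upscaler"):
    """Load the upscaler in half precision with attention slicing enabled."""
    # Local imports keep the size helper usable without the GPU stack installed.
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipeline = StableDiffusionUpscalePipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipeline = pipeline.to("cuda")
    # Computes attention in slices, trading some speed for lower peak memory.
    pipeline.enable_attention_slicing()
    return pipeline
```

For example, `fit_for_upscale(1024, 512)` returns `(512, 256)`, which keeps the upscaled result at 2048x1024.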

License

The Stable Diffusion X4 Upscaler is distributed under the CreativeML Open RAIL++-M License, which permits use of the model subject to use-based restrictions, in particular on generating harmful content.
