SVDQ-INT4-FLUX.1-DEV

mit-han-lab

Introduction

The SVDQ-INT4-FLUX.1-DEV model, developed by MIT-HAN-LAB and collaborators, is a text-to-image model that utilizes a post-training quantization technique called SVDQuant. This approach allows for 4-bit weights and activations while maintaining high visual fidelity. The model achieves significant memory reduction and speed improvements compared to higher-bit models, particularly when run on specific NVIDIA GPUs.

Architecture

SVDQuant uses a three-stage process to manage outliers in activation and weight data, making 4-bit quantization feasible. The process involves shifting outliers and decomposing weights into low-rank components using Singular Value Decomposition (SVD). The Nunchaku inference engine optimizes latency by fusing kernels to reduce data movement overhead.
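The low-rank idea above can be illustrated with a small NumPy sketch. This is not the Nunchaku implementation, just a simulation of the core technique: absorb the dominant (outlier-heavy) components of a weight matrix into a 16-bit low-rank branch via SVD, then quantize only the smaller-range residual to 4 bits. The function names and the simple per-tensor quantizer are hypothetical.

```python
# Sketch of the low-rank decomposition idea behind SVDQuant (simulation,
# not the actual Nunchaku kernels).
import numpy as np

def fake_int4_quantize(w):
    """Simulated symmetric per-tensor INT4 quantize/dequantize."""
    scale = np.abs(w).max() / 7.0          # INT4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale                        # dequantized values

def svdquant_decompose(w, rank=16):
    # SVD: W = U @ diag(S) @ Vt
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    # The low-rank branch absorbs the dominant components, which carry
    # most of the outlier magnitude.
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # The residual has a much smaller dynamic range, so 4-bit
    # quantization loses far less information.
    residual = w - low_rank
    return low_rank + fake_int4_quantize(residual)

# Synthetic weight with a strong low-rank "outlier" structure.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)) * 0.1
w += np.outer(rng.normal(size=256), rng.normal(size=256))

err_direct = np.abs(w - fake_int4_quantize(w)).mean()
err_svdq = np.abs(w - svdquant_decompose(w)).mean()
assert err_svdq < err_direct  # decomposition shrinks quantization error
```

On matrices with outlier structure like this one, the residual's dynamic range (and hence the quantization step size) drops sharply once the top singular components are peeled off, which is what makes W4A4 quantization viable.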

Model Details

The model is an INT W4A4 type with a size of 6.64GB, and it requires the total pixel count of the input resolution (height × width) to be a multiple of 65,536. It was developed by a collaboration of institutions including MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs.
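The 65,536-pixel constraint can be validated before running inference. The helper below is a hypothetical convenience function, not part of the Nunchaku API:

```python
# Hypothetical helper: the model requires height * width to be a
# multiple of 65,536 (e.g. 1024x1024 = 16 * 65,536 is valid).
def is_valid_resolution(height: int, width: int) -> bool:
    return (height * width) % 65_536 == 0

assert is_valid_resolution(1024, 1024)      # 1,048,576 pixels: valid
assert not is_valid_resolution(1000, 1000)  # 1,000,000 pixels: invalid
```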

Guide: Running Locally

  1. Set Up Environment: Follow the setup instructions in the Nunchaku repository.
  2. Run Model:
    import torch
    from nunchaku.pipelines import flux as nunchaku_flux
    
    # Load the base FLUX.1-dev pipeline and attach the SVDQuant INT4 weights.
    pipeline = nunchaku_flux.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        torch_dtype=torch.bfloat16,
        qmodel_path="mit-han-lab/svdq-int4-flux.1-dev",  # quantized model
    ).to("cuda")
    # Generate an image and save it to disk.
    image = pipeline(
        "A cat holding a sign that says hello world",
        num_inference_steps=50,
        guidance_scale=3.5,
    ).images[0]
    image.save("example.png")
    
  3. Hardware Requirements: The model requires NVIDIA GPUs with architectures sm_86 (RTX 3090, A6000), sm_89 (RTX 4090), or sm_80 (A100).
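Since the kernels are built only for the architectures listed above, a pre-flight check can fail fast on unsupported hardware. This sketch uses PyTorch's standard device-capability query; the function itself is hypothetical:

```python
# Hypothetical pre-flight check for the supported compute capabilities:
# sm_80 (A100), sm_86 (RTX 3090 / A6000), sm_89 (RTX 4090).
import torch

SUPPORTED_SM = {(8, 0), (8, 6), (8, 9)}

def is_supported(capability=None):
    """Return True if the (major, minor) compute capability is supported.
    With no argument, queries the current CUDA device."""
    if capability is None:
        capability = torch.cuda.get_device_capability(0)
    return tuple(capability) in SUPPORTED_SM
```

For example, `is_supported((8, 9))` is True for an RTX 4090, while a T4 (`(7, 5)`) would be rejected.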

Suggested Cloud GPUs

Consider using cloud services that offer NVIDIA RTX 4090 or A100 GPUs for optimal performance.

License

The model is released under the flux-1-dev-non-commercial-license, which permits non-commercial use only. For more detailed licensing information, consult the model's documentation.
