SVDQ-INT4-FLUX.1-DEV (mit-han-lab)
Introduction
The SVDQ-INT4-FLUX.1-DEV model, developed by MIT-HAN-LAB and collaborators, is a text-to-image model that utilizes a post-training quantization technique called SVDQuant. This approach allows for 4-bit weights and activations while maintaining high visual fidelity. The model achieves significant memory reduction and speed improvements compared to higher-bit models, particularly when run on specific NVIDIA GPUs.
Architecture
SVDQuant makes 4-bit quantization feasible by handling outliers in activation and weight data in stages: it first shifts activation outliers into the weights, then absorbs the resulting weight outliers into a high-precision low-rank branch obtained via Singular Value Decomposition (SVD), leaving a residual that quantizes well to 4 bits. The Nunchaku inference engine reduces latency by fusing the low-rank branch's kernels with the 4-bit branch, cutting redundant data movement.
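The low-rank decomposition described above can be illustrated with a small NumPy sketch. This is illustrative only, not Nunchaku's actual implementation: the function names, the fixed rank, and the naive per-tensor symmetric INT4 quantizer are all assumptions.

```python
import numpy as np

np.random.seed(0)

def svd_lowrank_split(W, rank=32):
    """Split W into a high-precision low-rank branch plus a residual
    that is easier to quantize to 4 bits (illustrative sketch)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # The low-rank branch absorbs the dominant singular values
    # and would be kept in 16-bit precision.
    L1 = U[:, :rank] * S[:rank]
    L2 = Vt[:rank, :]
    residual = W - L1 @ L2
    return L1, L2, residual

def quantize_int4(x):
    """Naive symmetric per-tensor INT4 quantization (assumed quantizer,
    not the library's kernel): values map to integers in [-8, 7]."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

W = np.random.randn(256, 256).astype(np.float32)
L1, L2, R = svd_lowrank_split(W, rank=32)
q, scale = quantize_int4(R)

# Approximate reconstruction: 16-bit low-rank branch + dequantized residual.
W_hat = L1 @ L2 + q * scale
```

Because the residual carries less energy than the full weight matrix, its quantization error is smaller than quantizing `W` directly at 4 bits, which is the intuition behind the technique.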
Model Details
The model uses INT W4A4 quantization (4-bit weights and activations), is 6.64 GB in size, and requires the input resolution (width × height) to be a multiple of 65,536 pixels. It was developed by a collaboration of institutions including MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs.
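The resolution constraint can be checked before running the pipeline. This is a minimal sketch assuming the constraint applies to the total pixel count (width × height); the helper name is an assumption, not part of the library.

```python
def is_valid_resolution(width: int, height: int) -> bool:
    """Check whether width * height is a multiple of 65,536 pixels,
    as the model requires (e.g. 1024x1024 = 16 * 65,536)."""
    return (width * height) % 65536 == 0

# 1024x1024 satisfies the constraint; 1000x1000 does not.
```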
Guide: Running Locally
- Set Up Environment: Follow the setup instructions in the Nunchaku repository.
- Run Model:
```python
import torch
from nunchaku.pipelines import flux as nunchaku_flux

# Load the FLUX.1-dev base pipeline and attach the INT4 quantized model.
pipeline = nunchaku_flux.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    qmodel_path="mit-han-lab/svdq-int4-flux.1-dev",
).to("cuda")

# Generate an image from a text prompt.
image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("example.png")
```
- Hardware Requirements: The model requires NVIDIA GPUs with architectures sm_86 (RTX 3090, A6000), sm_89 (RTX 4090), or sm_80 (A100).
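Whether a GPU is supported can be determined from its CUDA compute capability. A minimal sketch, assuming the helper name; at runtime the capability tuple would come from `torch.cuda.get_device_capability()`.

```python
# Compute capabilities listed as supported:
# sm_80 (A100), sm_86 (RTX 3090 / A6000), sm_89 (RTX 4090).
SUPPORTED_CAPABILITIES = {(8, 0), (8, 6), (8, 9)}

def is_supported(capability: tuple) -> bool:
    """capability: (major, minor) tuple, as returned by
    torch.cuda.get_device_capability() on a CUDA machine."""
    return capability in SUPPORTED_CAPABILITIES

# Example usage on a CUDA machine:
#   import torch
#   assert is_supported(torch.cuda.get_device_capability())
```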
Suggested Cloud GPUs
Consider using cloud services that offer NVIDIA RTX 4090 or A100 GPUs for optimal performance.
License
The model is released under the flux-1-dev-non-commercial-license, which permits non-commercial use only. For detailed licensing terms, consult the model's documentation.