SVDQ-INT4-FLUX.1-DEV (mit-han-lab)
Introduction
The SVDQ-INT4-FLUX.1-DEV model, developed by MIT-HAN-LAB and collaborators, is a text-to-image model that utilizes a post-training quantization technique called SVDQuant. This approach allows for 4-bit weights and activations while maintaining high visual fidelity. The model achieves significant memory reduction and speed improvements compared to higher-bit models, particularly when run on specific NVIDIA GPUs.
Architecture
SVDQuant makes 4-bit quantization feasible by handling outliers in activation and weight data in stages: it first shifts activation outliers into the weights, then absorbs the resulting weight outliers into a high-precision low-rank branch obtained via Singular Value Decomposition (SVD), leaving a residual that quantizes well to 4 bits. The Nunchaku inference engine reduces latency by fusing the low-rank branch's kernels with the 4-bit branch, cutting redundant data movement.
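The low-rank decomposition described above can be illustrated with a small NumPy sketch. This is illustrative only, not Nunchaku's actual implementation: the function names, the fixed rank, and the naive per-tensor symmetric INT4 quantizer are all assumptions.

```python
import numpy as np

np.random.seed(0)

def svd_lowrank_split(W, rank=32):
    """Split W into a high-precision low-rank branch plus a residual
    that is easier to quantize to 4 bits (illustrative sketch)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # The low-rank branch absorbs the dominant singular values
    # and would be kept in 16-bit precision.
    L1 = U[:, :rank] * S[:rank]
    L2 = Vt[:rank, :]
    residual = W - L1 @ L2
    return L1, L2, residual

def quantize_int4(x):
    """Naive symmetric per-tensor INT4 quantization (assumed quantizer,
    not the library's kernel): values map to integers in [-8, 7]."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

W = np.random.randn(256, 256).astype(np.float32)
L1, L2, R = svd_lowrank_split(W, rank=32)
q, scale = quantize_int4(R)

# Approximate reconstruction: 16-bit low-rank branch + dequantized residual.
W_hat = L1 @ L2 + q * scale
```

Because the residual carries less energy than the full weight matrix, its quantization error is smaller than quantizing `W` directly at 4 bits, which is the intuition behind the technique.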
Model Details
The model uses INT W4A4 quantization (4-bit weights and activations), is 6.64 GB in size, and requires the input resolution (width × height) to be a multiple of 65,536 pixels. It was developed by a collaboration of institutions including MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs.
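The resolution constraint can be checked before running the pipeline. This is a minimal sketch assuming the constraint applies to the total pixel count (width × height); the helper name is an assumption, not part of the library.

```python
def is_valid_resolution(width: int, height: int) -> bool:
    """Check whether width * height is a multiple of 65,536 pixels,
    as the model requires (e.g. 1024x1024 = 16 * 65,536)."""
    return (width * height) % 65536 == 0

# 1024x1024 satisfies the constraint; 1000x1000 does not.
```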
Guide: Running Locally
- Set Up Environment: Follow the setup instructions in the Nunchaku repository.
- Run Model:
```python
import torch
from nunchaku.pipelines import flux as nunchaku_flux

# Load the FLUX.1-dev base pipeline and attach the INT4 quantized model.
pipeline = nunchaku_flux.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    qmodel_path="mit-han-lab/svdq-int4-flux.1-dev",
).to("cuda")

# Generate an image from a text prompt.
image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("example.png")
```
- Hardware Requirements: The model requires NVIDIA GPUs with architectures sm_86 (RTX 3090, A6000), sm_89 (RTX 4090), or sm_80 (A100).
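Whether a GPU is supported can be determined from its CUDA compute capability. A minimal sketch, assuming the helper name; at runtime the capability tuple would come from `torch.cuda.get_device_capability()`.

```python
# Compute capabilities listed as supported:
# sm_80 (A100), sm_86 (RTX 3090 / A6000), sm_89 (RTX 4090).
SUPPORTED_CAPABILITIES = {(8, 0), (8, 6), (8, 9)}

def is_supported(capability: tuple) -> bool:
    """capability: (major, minor) tuple, as returned by
    torch.cuda.get_device_capability() on a CUDA machine."""
    return capability in SUPPORTED_CAPABILITIES

# Example usage on a CUDA machine:
#   import torch
#   assert is_supported(torch.cuda.get_device_capability())
```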
Suggested Cloud GPUs
Consider using cloud services that offer NVIDIA RTX 4090 or A100 GPUs for optimal performance.
License
The model is released under the flux-1-dev-non-commercial-license, which permits non-commercial use only. For detailed licensing terms, consult the model's documentation.