CATVTON-Flux Alpha
Introduction
CATVTON-Flux is a virtual try-on model that combines the CATVTON approach with FLUX.1 Fill inpainting for realistic clothing transfer. It achieves state-of-the-art performance in virtual garment fitting, as demonstrated by its results on the VITON-HD dataset.
Architecture
The model combines the CATVTON framework with the FLUX Fill inpainting model, employing a transformer-based image-to-image architecture designed for virtual try-on. It is built on the diffusers library and leverages pre-trained components from "black-forest-labs/FLUX.1-Fill-dev".
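As an illustration of how such an inpainting-based try-on can be conditioned, the sketch below places the garment image next to the masked person image so the fill model can transfer the garment into the masked region. This is a hypothetical preprocessing helper for illustration only; the exact input layout and resolution used by the released checkpoint are not specified here.

```python
# Hypothetical preprocessing sketch: concatenate garment and person images
# side by side so the fill model can copy the garment into the masked region.
# The layout and sizes are assumptions, not the checkpoint's documented pipeline.
from PIL import Image

def make_tryon_inputs(person: Image.Image, person_mask: Image.Image,
                      garment: Image.Image, size=(576, 768)):
    person = person.convert("RGB").resize(size)
    garment = garment.convert("RGB").resize(size)
    person_mask = person_mask.convert("L").resize(size)

    # Combined image: garment on the left, person on the right.
    combined = Image.new("RGB", (size[0] * 2, size[1]))
    combined.paste(garment, (0, 0))
    combined.paste(person, (size[0], 0))

    # Combined mask: keep the garment half untouched (black), inpaint only
    # the masked clothing region on the person half (white).
    mask = Image.new("L", (size[0] * 2, size[1]), 0)
    mask.paste(person_mask, (size[0], 0))
    return combined, mask
```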
Training
Training Data
The model is trained using the VITON-HD dataset, which consists of high-resolution images tailored for virtual try-on applications.
Training Procedure
The training process involves fine-tuning the FLUX.1-Fill-dev model to improve its performance in virtual try-on scenarios.
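The exact recipe is not detailed here; purely as a schematic illustration, a flow-matching fine-tuning step for a FLUX-style transformer might look like the sketch below. The transformer call signature and the conditioning tensor are placeholders, not the repository's actual training code.

```python
# Schematic flow-matching fine-tuning step (illustrative only; the real
# training code, conditioning format, and model call signature may differ).
import torch
import torch.nn.functional as F

def finetune_step(transformer, optimizer, latents, conditioning):
    # latents: VAE-encoded target try-on images, shape (B, C, H, W)
    noise = torch.randn_like(latents)
    t = torch.rand(latents.shape[0], device=latents.device)  # timesteps in [0, 1]
    t_ = t.view(-1, 1, 1, 1)

    # Rectified-flow interpolation between data and noise.
    noisy_latents = (1.0 - t_) * latents + t_ * noise
    target_velocity = noise - latents

    # Placeholder call: the real FluxTransformer2DModel takes packed latents,
    # text embeddings, guidance, and image ids rather than this simple signature.
    pred_velocity = transformer(noisy_latents, t, conditioning)

    loss = F.mse_loss(pred_velocity, target_velocity)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```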
Evaluation
The model achieves an FID score of 5.593 on VITON-HD, marking it as state-of-the-art in its category.
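For reference, FID can be computed with an off-the-shelf implementation such as torchmetrics; the snippet below is a generic sketch, not the exact evaluation script behind the reported number.

```python
# Generic FID computation with torchmetrics.
# Requires: pip install torchmetrics torch-fidelity
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# real_images / generated_images: uint8 tensors of shape (N, 3, H, W).
# Random tensors are used here only to keep the sketch runnable; a real
# evaluation would use the full VITON-HD test set and generated try-on images.
real_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print(fid.compute())  # lower is better
```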
Guide: Running Locally
To use the CATVTON-Flux model locally, follow these steps:
- Install the necessary libraries, making sure you have PyTorch and the diffusers library.
- Load the pre-trained transformer model:

```python
import torch
from diffusers import FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_pretrained(
    "xiaozaa/catvton-flux-alpha",
    torch_dtype=torch.bfloat16
)
```
- Initialize the pipeline:

```python
from diffusers import FluxFillPipeline

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16
).to("cuda")
```
- Run the model with inputs such as the person image, person mask, garment image, and an optional random seed, as in the sketch below.
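Continuing from the pipeline initialized above, a minimal inference sketch might look like the following. It reuses the hypothetical make_tryon_inputs helper from the Architecture section, and the prompt, resolution, guidance scale, and sampling settings are illustrative assumptions rather than documented defaults.

```python
# Illustrative inference sketch using the pipe initialized above and the
# hypothetical make_tryon_inputs helper sketched in the Architecture section.
import torch
from PIL import Image

size = (576, 768)
person = Image.open("person.jpg")
person_mask = Image.open("person_mask.png")
garment = Image.open("garment.jpg")

image, mask = make_tryon_inputs(person, person_mask, garment, size=size)

result = pipe(
    prompt="a person wearing the garment shown on the left",
    image=image,
    mask_image=mask,
    height=size[1],
    width=size[0] * 2,
    guidance_scale=30.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(42),  # optional random seed
).images[0]

# Keep only the person half as the final try-on result.
result.crop((size[0], 0, size[0] * 2, size[1])).save("tryon_result.png")
```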
For best performance, it is recommended to run the model on cloud GPU instances from providers such as AWS EC2, Google Cloud, or Azure.
License
The model is licensed under CC BY-NC 2.0, which allows non-commercial use with appropriate credit.