NYAFLOW-XL (alpha)
Introduction
NYAFLOW-XL [ALPHA] is an experimental project aimed at fine-tuning the Stable Diffusion XL model using the Flow Matching training objective. This effort, completed in July 2024, leverages several publicly available datasets to enhance model performance.
Architecture
Flow Matching is a technique that generates samples from a target data distribution by iteratively transporting samples drawn from a prior distribution, such as a standard Gaussian. The model learns to predict the velocity $v_t = \frac{dX_t}{dt}$ that guides the sample $X_t$ toward the data point $X_1$. Training samples timesteps from a logit-normal distribution and constructs $X_t$ along an optimal-transport (linear interpolation) path, $X_t = (1 - t)X_0 + tX_1$, following Esser et al., 2024.
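The objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual training code: the toy model, batch dimensions, and logit-normal parameters are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit_normal_t(n, mean=0.0, std=1.0):
    """Sample timesteps t in (0, 1) from a logit-normal distribution."""
    return 1.0 / (1.0 + np.exp(-rng.normal(mean, std, size=n)))

def flow_matching_loss(model, x1):
    """Conditional flow matching loss on the optimal-transport (linear) path."""
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))   # prior samples X_0 ~ N(0, I)
    t = logit_normal_t(n)[:, None]     # logit-normal timesteps, broadcast over features
    xt = (1.0 - t) * x0 + t * x1       # linear interpolation X_t = (1 - t) X_0 + t X_1
    v_target = x1 - x0                 # dX_t/dt along the linear path
    v_pred = model(xt, t)
    return np.mean((v_pred - v_target) ** 2)

# Toy "model" that always predicts zero velocity, just to exercise the loss.
loss = flow_matching_loss(lambda xt, t: np.zeros_like(xt),
                          x1=rng.standard_normal((16, 8)))
print(loss)
```

In practice the lambda is replaced by the diffusion backbone (here, the fine-tuned SDXL UNet), and the loss is minimized with a stochastic optimizer.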
Training
The training dataset comprises 3.6 million recaptioned/tagged image-text pairs, processed for enhanced context and stability. Training was executed on a 32×H100 GPU cluster using the deepspeed framework, facilitated by a compute grant. Despite time constraints limiting training to approximately 48 hours, the model showed consistent improvements in validation loss and evaluation performance. The NYAFLOW-XL model supports various concepts, styles, and character rendering, although it may struggle with natural language inputs and may produce oversaturated images due to overfitting.
Guide: Running Locally
- Clone the Repository: Clone the NYAFLOW-XL repository from Hugging Face.
- Set Up Environment: Prepare your environment with the necessary dependencies, preferably inside a virtual environment.
- Download Model Weights: Obtain the model weights and datasets, placing them in the appropriate directories.
- Run Inference: Use the provided scripts to run the model on test data.
- Utilize Cloud GPUs: For best performance, consider cloud services with GPU support, such as AWS, GCP, or Azure.
License
The NYAFLOW-XL model is released under the MIT License, allowing for broad use and modification.