Hamachi_SD3.5M_v008A
Introduction
The Hamachi_SD3.5M_v008A is an experimental anime model built to explore training methodologies for SD3.5 Medium, a flow-matching DiT model. It is still in the training phase, so art styles are inconsistent and body structures can be unstable.
Architecture
The model uses a flow-matching DiT architecture focused on anime-style image generation. The "CLIP_INCL" file bundles the DiT and CLIP models, while the standard T5XXL is used without modification. For inference, set the model shift to around 7 to avoid overburned images.
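As a rough illustration of this setup (a minimal sketch, not the author's exact workflow), the snippet below loads a single-file checkpoint with diffusers and raises the flow-matching scheduler shift. The checkpoint file name is assumed, and depending on what the file actually contains, the T5XXL encoder may need to be supplied separately.

```python
# Minimal sketch, not the author's exact workflow. The file name is assumed;
# the "CLIP_INCL" checkpoint is expected to bundle the DiT and CLIP weights.
import torch
from diffusers import StableDiffusion3Pipeline, FlowMatchEulerDiscreteScheduler

pipe = StableDiffusion3Pipeline.from_single_file(
    "Hamachi_SD3.5M_v008A_CLIP_INCL.safetensors",  # assumed file name
    torch_dtype=torch.float16,
    # Depending on the checkpoint contents, the standard T5XXL encoder may need
    # to be passed in separately (e.g. via text_encoder_3 / tokenizer_3).
)

# SD3.5 Medium uses a flow-matching scheduler; "shift" biases sampling toward
# noisier timesteps. A value around 7 is suggested above to avoid overburned images.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config, shift=7.0
)
pipe.to("cuda")
```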
Training
Observations and Reflections
- Curriculum Learning: Adopted a multi-stage training process, starting with low-frequency elements for stability, followed by high-frequency details.
- Resolution Increase: Began with lower resolutions to grasp basic concepts, then moved to higher resolutions (512px to 1024px).
- T5 Attention Mask: Training without the mask yielded more natural results.
- Weighting Scheme: Found logit_normal or mode more effective for large-scale training (see the sketch after this list).
- Training Shift: Optimal shift values vary with resolution, with a recommended shift of around 6 for inference.
- Optimizers: Effective optimizers included AdamW, ScheduleFreeAdamW, and Cautious ADOPT.
- Batch Size: Lower batch sizes (4 or 8) provided more stable training.
- Local Minima: Increasing the learning rate helped the model escape local minima that produced dark or black image outputs.
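For concreteness, here is a minimal sketch of the logit_normal weighting scheme and the training-time shift mentioned above. The mean, standard deviation, and shift values are placeholders, not the exact settings used for this model.

```python
# Hedged sketch of logit_normal timestep sampling plus an SD3-style shift.
# All numeric values are illustrative, not the model's actual hyperparameters.
import torch

def sample_timesteps(batch_size: int, shift: float = 3.0,
                     logit_mean: float = 0.0, logit_std: float = 1.0) -> torch.Tensor:
    # logit_normal: draw u ~ N(mean, std) and squash it through a sigmoid,
    # concentrating samples around mid-range noise levels.
    u = torch.randn(batch_size) * logit_std + logit_mean
    t = torch.sigmoid(u)
    # Resolution-dependent shift (the form used by SD3-style flow matching):
    # a larger shift pushes training toward noisier timesteps.
    return shift * t / (1.0 + (shift - 1.0) * t)

timesteps = sample_timesteps(batch_size=8, shift=6.0)
```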
Dataset Preparation
The dataset was built with Hakubooru-based custom scripts, excluding specific tags and keeping only posts with an ID of at least 1,000,000.
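The snippet below is a hypothetical reconstruction of that selection logic, not the actual Hakubooru API: it keeps posts with an ID of at least 1,000,000 and drops anything carrying an excluded tag. The file name and tag list are placeholders.

```python
# Hypothetical filter over a Danbooru-style metadata dump; not the Hakubooru API.
# "posts.jsonl" and EXCLUDED_TAGS are placeholders, not the author's actual values.
import json

MIN_POST_ID = 1_000_000
EXCLUDED_TAGS = {"comic", "photo_(medium)"}  # placeholder exclusions

def keep(post: dict) -> bool:
    if post.get("id", 0) < MIN_POST_ID:
        return False
    tags = set(post.get("tag_string", "").split())
    return not (tags & EXCLUDED_TAGS)

with open("posts.jsonl", encoding="utf-8") as f:
    selected = [p for p in map(json.loads, f) if keep(p)]
```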
Training Details
- Hardware: Single RTX 4090
- Method: Full Fine-Tune
- Scripts: Used sd-scripts and pytorch_optimizer
- Settings: Various learning rates, batch sizes, and optimizers were used across the different training phases (see the optimizer sketch below).
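As a sketch of how the optimizers named above can be instantiated with pytorch_optimizer: class names and options depend on the installed version, the learning rates are placeholders, and in sd-scripts the optimizer is normally selected through its optimizer settings rather than constructed by hand.

```python
# Hedged sketch: building the optimizers named above with pytorch_optimizer.
# Availability of these classes and options depends on your installed version;
# the learning rates are placeholders, not the model's actual settings.
import torch
from pytorch_optimizer import ADOPT, ScheduleFreeAdamW

model = torch.nn.Linear(8, 8)  # stand-in for the SD3.5M transformer

optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-5, weight_decay=1e-2)
# For the Cautious ADOPT runs, if your version exposes a cautious option:
# optimizer = ADOPT(model.parameters(), lr=1e-5, cautious=True)
```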
Guide: Running Locally
- Setup Environment: Install the necessary dependencies and set up the environment using sd-scripts and pytorch_optimizer.
- Model Configuration: Use the included "CLIP_INCL" file and set the model shift to around 7 for inference (a minimal generation call is sketched after this list).
- Execution: Run the training script using specified parameters for batch size, learning rate, and other settings.
- GPU Recommendation: Utilize cloud GPUs like Google Colab or AWS for efficient training and inference.
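For inference, a minimal generation call might look like the sketch below, building on the loading example in the Architecture section; the prompt, step count, and guidance scale are illustrative only.

```python
# Minimal generation call, assuming `pipe` from the Architecture sketch above.
# Prompt and sampling settings are illustrative, not recommended values.
image = pipe(
    "1girl, solo, upper body, looking at viewer",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("hamachi_sample.png")
```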
License
The Hamachi_SD3.5M_v008A model is licensed under the stabilityai-ai-community license. For detailed terms, refer to the license link.