Hamachi_SD3.5M_v008A

FA770

Introduction

The Hamachi_SD3.5M_v008A is an experimental anime model developed to explore training methodologies for SD3.5 Medium, a flow-matching DiT model. It is still in the training phase, so art styles can be inconsistent and body structures are sometimes rendered incorrectly.

Architecture

The model uses a flow-matching DiT architecture focused on anime-style image generation. The "CLIP_INCL" file bundles both the DiT and the CLIP text encoders, while the standard T5XXL is used without modification. For inference, set the model shift to around 7 to avoid burnt (over-saturated) images.
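
As a rough illustration of the inference-shift setting, here is a minimal diffusers sketch. The checkpoint filename is hypothetical, and loading a community single-file SD3.5 checkpoint this way is an assumption; the relevant part is overriding the scheduler's shift value (around 7, per the note above):

```python
import torch
from diffusers import StableDiffusion3Pipeline, FlowMatchEulerDiscreteScheduler

# Hypothetical filename for the bundled DiT + CLIP ("CLIP_INCL") checkpoint described above.
pipe = StableDiffusion3Pipeline.from_single_file(
    "Hamachi_SD3.5M_v008A_CLIP_INCL.safetensors",
    torch_dtype=torch.bfloat16,
)

# Raise the flow-matching shift to ~7 at inference to reduce burnt-looking outputs.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config, shift=7.0
)

pipe.to("cuda")
image = pipe(
    "1girl, looking at viewer, upper body, anime style",
    num_inference_steps=28,
    guidance_scale=5.0,
).images[0]
image.save("sample.png")
```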

Training

Observations and Reflections

  • Curriculum Learning: Adopted a multi-stage training process, starting with low-frequency elements for stability, followed by high-frequency details.
  • Resolution Increase: Began with lower resolutions to grasp basic concepts, then moved to higher resolutions (512px to 1024px).
  • T5 Attention Mask: Training without the mask yielded more natural results.
  • Weighting Scheme: logit_normal or mode proved more effective for large-scale training.
  • Training Shift: Optimal shift values vary with resolution; a shift of around 6 is recommended for inference (a sketch of the sampling and shift math follows this list).
  • Optimizers: Effective optimizers included AdamW, ScheduleFreeAdamW, and Cautious ADOPT.
  • Batch Size: Lower batch sizes (4 or 8) provided more stable training.
  • Local Minima: Increasing the learning rate helped escape local minima that produced dark or black outputs.
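
The weighting-scheme and training-shift observations above refer to how training timesteps are sampled. Below is a minimal sketch of logit-normal timestep sampling with a resolution-dependent shift, using commonly cited default parameters rather than this model's exact settings:

```python
import torch

def sample_timesteps(batch_size: int, mean: float = 0.0, std: float = 1.0,
                     shift: float = 3.0) -> torch.Tensor:
    """Draw flow-matching timesteps in (0, 1) with logit-normal weighting and a shift."""
    # logit_normal: sample u ~ N(mean, std) and squash through a sigmoid,
    # which concentrates timesteps around the middle of the trajectory.
    u = torch.randn(batch_size) * std + mean
    t = torch.sigmoid(u)
    # Timestep shift: larger shift pushes sampling toward noisier timesteps,
    # which matters more as training resolution increases (512px -> 1024px).
    return shift * t / (1.0 + (shift - 1.0) * t)

print(sample_timesteps(4))
```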

Dataset Preparation

The dataset was prepared with Hakubooru-based custom scripts, excluding specific tags and requiring a minimum post ID of 1,000,000.
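
Hakubooru handles the actual export; the sketch below only illustrates the kind of filtering described (minimum post ID and tag exclusion) against a generic JSONL metadata dump. The file name, field names, and excluded tags are assumptions, not the model's real exclusion list:

```python
import json

MIN_POST_ID = 1_000_000
EXCLUDED_TAGS = {"comic", "sketch"}  # placeholder tags; the real exclusion list is not published here

def keep(post: dict) -> bool:
    # Drop posts below the ID cutoff or containing any excluded tag.
    if post.get("id", 0) < MIN_POST_ID:
        return False
    return not (set(post.get("tag_string", "").split()) & EXCLUDED_TAGS)

kept = []
with open("metadata.jsonl") as f:  # assumed metadata dump
    for line in f:
        post = json.loads(line)
        if keep(post):
            kept.append(post)

print(f"kept {len(kept)} posts")
```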

Training Details

  • Hardware: Single RTX 4090
  • Method: Full Fine-Tune
  • Scripts: Used sd-scripts and pytorch_optimizer
  • Settings: Learning rates, batch sizes, and optimizers varied across training phases (an optimizer sketch follows this list).
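
On the optimizer side, here is a minimal sketch of a full-fine-tune setup using plain AdamW. The ScheduleFreeAdamW and cautious ADOPT variants mentioned above come from pytorch_optimizer and could be swapped in, but their exact class names and arguments should be checked against that library:

```python
import torch

# Placeholder module standing in for the SD3.5 Medium DiT being fully fine-tuned.
model = torch.nn.Linear(16, 16)

# Plain AdamW, one of the optimizers listed above. ScheduleFreeAdamW or a
# cautious ADOPT variant from pytorch_optimizer could replace this line.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-2)

# One illustrative step at the small batch sizes (4 or 8) noted above.
batch = torch.randn(4, 16)
loss = model(batch).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```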

Guide: Running Locally

  1. Setup Environment: Install the necessary dependencies, including sd-scripts and pytorch_optimizer.
  2. Model Configuration: Use the included "CLIP_INCL" file and set the model shift to around 7 for inference.
  3. Execution: Run the training script with the specified batch size, learning rate, and other settings (a conceptual sketch of the flow-matching objective follows this list).
  4. GPU Recommendation: Utilize cloud GPUs like Google Colab or AWS for efficient training and inference.
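
For orientation, this is roughly the objective a flow-matching (rectified-flow) training script optimizes for SD3.5. It is a conceptual sketch, not sd-scripts code; `denoiser` stands in for the actual DiT plus text conditioning:

```python
import torch

def flow_matching_loss(denoiser, latents: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """One rectified-flow training step: predict the velocity from data to noise."""
    noise = torch.randn_like(latents)
    # Linear interpolation between clean latents (t=0) and pure noise (t=1).
    t_ = t.view(-1, 1, 1, 1)
    noisy = (1.0 - t_) * latents + t_ * noise
    # The DiT is trained to predict the constant velocity along that path.
    target = noise - latents
    pred = denoiser(noisy, t)
    return torch.nn.functional.mse_loss(pred, target)

# Toy usage with a stand-in denoiser; shapes mimic SD3's 16-channel latents.
denoiser = lambda x, t: torch.zeros_like(x)
latents = torch.randn(4, 16, 64, 64)
t = torch.rand(4)  # in practice, drawn with the logit-normal + shift sampler above
print(flow_matching_loss(denoiser, latents, t))
```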

License

The Hamachi_SD3.5M_v008A model is licensed under the stabilityai-ai-community license. For detailed terms, refer to the license link.
