naclbit/trinart_stable_diffusion_v2
Introduction
The TrinArt Stable Diffusion V2 model is an enhanced version of the original Trin-sama Twitter bot model, focused on anime and manga aesthetics. It aims to improve image generation while staying close to the original Stable Diffusion style.
Architecture
The model is built on the Stable Diffusion framework and has been adapted to work with the Diffusers library. It supports both text-to-image and image-to-image generation, with different training checkpoints available for varied stylistic outputs. The V2 version incorporates additional training images, dropout, and a refined tagging approach to improve results.
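As a concrete illustration of checkpoint selection, a specific training checkpoint can be chosen through the revision argument of from_pretrained. This is a minimal sketch assuming the checkpoints are published as repository revisions; diffusers-60k is the only revision named in this card, and any other checkpoint branch would have to be verified against the repository.

from diffusers import StableDiffusionPipeline

# Each training checkpoint is published as a separate revision of the repo;
# "diffusers-60k" is the one referenced in the guide below. Other checkpoint
# branches, if available, would be selected the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "naclbit/trinart_stable_diffusion_v2",
    revision="diffusers-60k",
)
pipe.to("cuda")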
Training
Training involved finetuning on approximately 40,000 high-resolution anime/manga images for eight epochs on 8x NVIDIA A100 40GB GPUs, using a custom dataset loader with augmentations such as XFlip, center crop, and scaling. The run used learning-rate adjustments and a 10% dropout rate.
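The authors' dataset loader is not published in this card, but a minimal torchvision sketch of the augmentations named above (XFlip treated as a horizontal flip, plus scaling and center crop) could look like the following; the resolution and normalization values are illustrative assumptions, not the actual training configuration.

from torchvision import transforms

# Illustrative augmentation pipeline using the transforms named above;
# the authors' real loader and its parameters are not documented here.
train_transforms = transforms.Compose([
    transforms.Resize(512),                  # scale the shorter side to 512 px
    transforms.CenterCrop(512),              # center crop to a square
    transforms.RandomHorizontalFlip(p=0.5),  # XFlip-style augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),      # map pixel values to [-1, 1]
])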
Guide: Running Locally
- Install Dependencies:
  pip install diffusers==0.3.0
- Load the Model:
  - For text-to-image:
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained("naclbit/trinart_stable_diffusion_v2", revision="diffusers-60k")
    pipe.to("cuda")
  - For image-to-image, follow similar steps using StableDiffusionImg2ImgPipeline (a sketch follows this guide).
- Run Inference:
  - Execute the desired pipeline with a prompt and, optionally, an initial image for image-to-image tasks (see the inference sketch after this guide).
- Optimization:
  - Refer to the optimization documentation for performance improvements (a memory/speed sketch follows this guide).
Cloud GPUs: Consider using cloud services offering NVIDIA A100 GPUs for optimal performance.
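As an illustration of the inference step, the sketch below runs both text-to-image and image-to-image generation against the diffusers-60k revision, assuming diffusers 0.3.0 as installed above. The prompt, file names, and parameter values are illustrative choices; note that 0.3.0 expects the img2img initial image as init_image (renamed to image in later releases), and the output attribute (.images) may differ in other diffusers versions.

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "naclbit/trinart_stable_diffusion_v2"
prompt = "a portrait of a silver-haired witch, detailed, anime style"  # illustrative prompt

# Text-to-image: generate an image from the prompt alone.
pipe = StableDiffusionPipeline.from_pretrained(model_id, revision="diffusers-60k")
pipe.to("cuda")
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("txt2img_result.png")

# Image-to-image: start from an existing picture and steer it toward the prompt.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, revision="diffusers-60k")
img2img.to("cuda")
init_image = Image.open("input.png").convert("RGB").resize((512, 512))
result = img2img(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images[0]
result.save("img2img_result.png")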
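For optimization, one common approach, not specific to this model card, is to load the weights in half precision and reduce the number of denoising steps. The sketch below uses standard diffusers options (torch_dtype, num_inference_steps); later diffusers releases add further memory-saving features such as attention slicing, which are not assumed here.

import torch
from diffusers import StableDiffusionPipeline

# Half-precision weights roughly halve GPU memory use at a small quality cost.
pipe = StableDiffusionPipeline.from_pretrained(
    "naclbit/trinart_stable_diffusion_v2",
    revision="diffusers-60k",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Fewer denoising steps trade some fidelity for faster generation.
image = pipe("a cozy ramen shop at night, anime style", num_inference_steps=30).images[0]
image.save("optimized_result.png")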
License
The model is distributed under the CreativeML OpenRAIL-M license.