trinart_stable_diffusion_v2

naclbit

Introduction

The TrinArt Stable Diffusion V2 model is an improved version of the original Trin-sama Twitter bot model, tuned toward anime and manga aesthetics. It aims to retain the style of the original Stable Diffusion while improving generation quality in that domain.

Architecture

The model is built on the Stable Diffusion framework and has been adapted to work with the Diffusers library. It supports both text-to-image and image-to-image generation. Several training checkpoints are available, each giving a somewhat different stylistic result. Compared with the earlier version, V2 was trained with additional images, dropouts, and a refined tagging approach.

Training

The model was finetuned on approximately 40,000 high-resolution anime/manga images for eight epochs. Training ran on 8x NVIDIA A100 40GB GPUs and used a custom dataset loader with augmentations such as XFlip, center crop, and scaling. It also used learning rate adjustments and 10% dropouts.
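The three augmentations named above can be illustrated on a toy image. This is a minimal sketch, not the author's dataset loader: the nested-list image representation, the order of operations, the nearest-neighbour scaling, and the `flip_prob` parameter are all assumptions made for illustration.

```python
import random


def x_flip(img):
    """Mirror the image horizontally by reversing every row."""
    return [row[::-1] for row in img]


def center_crop(img):
    """Crop the largest centered square out of a rectangular image."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def scale_nearest(img, size):
    """Resize to size x size using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def augment(img, size, rng, flip_prob=0.5):
    """Center crop, scale, then flip horizontally with probability flip_prob."""
    out = scale_nearest(center_crop(img), size)
    if rng.random() < flip_prob:
        out = x_flip(out)
    return out


# Toy 2x4 "image" whose pixel values are just integers.
toy = [[0, 1, 2, 3],
       [4, 5, 6, 7]]
sample = augment(toy, size=2, rng=random.Random(0))
```

A real loader would apply the same steps to image tensors or PIL images; the toy version only makes the geometry of each step easy to inspect.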

Guide: Running Locally

  1. Install Dependencies:

    pip install diffusers==0.3.0
    
  2. Load the Model:

    • For text-to-image:
      from diffusers import StableDiffusionPipeline
      pipe = StableDiffusionPipeline.from_pretrained("naclbit/trinart_stable_diffusion_v2", revision="diffusers-60k")
      pipe.to("cuda")
      
    • For image-to-image, follow similar steps using StableDiffusionImg2ImgPipeline.
  3. Run Inference:

    • Execute the desired pipeline with a prompt and optionally an initial image for image-to-image tasks.
  4. Optimization:

    • Cloud GPUs: Consider using cloud services offering NVIDIA A100 GPUs for optimal performance.
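Steps 2 and 3 can be sketched as two small helper functions. This is a hedged example that assumes the pinned diffusers==0.3.0 release, where pipeline calls return an object with an `.images` list and the image-to-image pipeline takes its input as `init_image` (later releases renamed that argument to `image`). The function names, the default `strength` and `guidance_scale` values, the prompts, and the file names in the comments are illustrative assumptions, not part of the model card.

```python
MODEL_ID = "naclbit/trinart_stable_diffusion_v2"
REVISION = "diffusers-60k"  # checkpoint revision used in the guide above


def text_to_image(prompt, device="cuda"):
    """Load the text-to-image pipeline and return one generated PIL image."""
    # Imported lazily so that merely defining this helper does not require diffusers.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, revision=REVISION)
    pipe.to(device)
    return pipe(prompt).images[0]


def image_to_image(prompt, init_image, device="cuda",
                   strength=0.75, guidance_scale=7.5):
    """Load the image-to-image pipeline and return one PIL image.

    init_image is expected to be an RGB PIL image.
    """
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(MODEL_ID, revision=REVISION)
    pipe.to(device)
    # diffusers 0.3.0 names this argument `init_image`.
    result = pipe(prompt=prompt, init_image=init_image,
                  strength=strength, guidance_scale=guidance_scale)
    return result.images[0]


# Example usage (needs a CUDA GPU and downloads the model weights):
#
#   image = text_to_image("a girl reading in a library, manga style")
#   image.save("txt2img.png")
#
#   from PIL import Image
#   init = Image.open("input.png").convert("RGB").resize((768, 512))
#   image = image_to_image("manga drawing of a park", init)
#   image.save("img2img.png")
```

Each helper reloads the pipeline on every call to keep the sketch short; for repeated generation it is cheaper to build the pipeline once and reuse it.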

License

The model is distributed under the CreativeML OpenRAIL-M license.
