Dungeons and Diffusion

0xJustin

Introduction

Dungeons-and-Diffusion is a text-to-image generation model fine-tuned from Protogen, designed to produce detailed images for fantasy themes such as Dungeons & Dragons characters. The model gives its best results at resolutions above 512x512, and it supports a wide range of species and classes, making it a versatile tool for creating diverse character artwork.

Architecture

The model is built on the StableDiffusionPipeline and its weights are compatible with the Safetensors format. It is trained to emulate the style of commissioned D&D character art, with particular attention to the distinguishing features of different fantasy creatures and classes.

Training

Training produced two notable checkpoints:

  • Model16000: Trained on D&D character prompts; gives better results for specific races such as centaurs and aarakocra.
  • Model30000: Trained on a broader set of images that captures the look of commissioned D&D character art. It generates images effectively for most races, although some features, such as elf ears and horns, remain difficult to differentiate.

Guide: Running Locally

  1. Installation: Ensure that the necessary dependencies are installed, including Python, PyTorch, and the diffusers library. Clone the repository to your local machine.

  2. Model Weights: Download the latest model weights, D&Diffusion3.0_Protogen.ckpt, from the Hugging Face model card page.
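One way to fetch the checkpoint programmatically is through `huggingface_hub`, sketched below. The filename comes from the model card above, but the repo id is an assumption based on the author and model names.

```python
from huggingface_hub import hf_hub_download


def download_checkpoint(repo_id: str = "0xJustin/Dungeons-and-Diffusion",
                        filename: str = "D&Diffusion3.0_Protogen.ckpt") -> str:
    """Download the checkpoint and return its local cache path.

    The default repo_id is an assumption; verify it on the model card.
    """
    return hf_hub_download(repo_id=repo_id, filename=filename)
```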

  3. Running Inference:

    • Load the model using the StableDiffusionPipeline.
    • Configure your prompt settings, including positive and negative prompts, CFG scale, and other parameters, as demonstrated in the example prompts.

  4. Hardware Requirements: For optimal performance, an NVIDIA GPU is recommended; cloud options include AWS EC2, Google Cloud, and Azure.

License

The model is licensed under the CreativeML OpenRAIL-M license, which permits open use, redistribution, and modification subject to the license's use-based restrictions, which must be passed on to downstream users.
