Arcane Diffusion

nitrosocke

Introduction

Arcane Diffusion is a Stable Diffusion model fine-tuned on images from the TV show Arcane. Including "arcane style" in a text prompt generates images in the look of the show. The model can be tried through Gradio and Colab, and it also supports ONNX, MPS, and FLAX/JAX.

Architecture

Arcane Diffusion is based on the Stable Diffusion architecture and is loaded through Diffusers, a library of ready-made pipelines for diffusion models. It behaves like any other Stable Diffusion model, so it fits into existing Stable Diffusion workflows without changes.

Training

Arcane Diffusion has gone through multiple training versions:

  • Version 3: Trained with the new train-text-encoder setting, using 95 images from the show over 8000 steps. This version significantly improves quality and editability.
  • Version 2: Trained with the Diffusers-based DreamBooth method and prior-preservation loss over 5000 steps. The results indicated that more steps are needed for a more prominent style.
  • Version 1: Trained with Unfrozen Model Textual Inversion and prior-preservation loss. Outputs shift slightly toward the style even when the arcane token is not used.
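
The example prompt in the guide begins with the phrase "arcane style", which appears to be the token these versions refer to. A minimal sketch of a helper that prepends the phrase when it is missing is shown here. The names `ARCANE_TOKEN` and `with_arcane_token` are hypothetical and are not part of the model or of Diffusers.

```python
# Hypothetical helper: not part of Diffusers or the model repository.
ARCANE_TOKEN = "arcane style"  # assumed trigger phrase, taken from the example prompt


def with_arcane_token(subject: str) -> str:
    """Return a prompt that contains the style token, without duplicating it."""
    subject = subject.strip()
    # Leave the prompt alone if the token is already present (case-insensitive).
    if ARCANE_TOKEN in subject.lower():
        return subject
    return f"{ARCANE_TOKEN}, {subject}"


print(with_arcane_token("a magical princess with golden hair"))
```

The returned string can be passed to the pipeline in place of a hand-written prompt.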

Guide: Running Locally

To run Arcane Diffusion locally, follow these steps:

  1. Install Required Libraries:

    pip install diffusers transformers scipy torch
    
  2. Load and Run the Model:

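    from diffusers import StableDiffusionPipeline
    import torch
    
    # Download the fine-tuned weights from the Hugging Face Hub in half precision.
    model_id = "nitrosocke/Arcane-Diffusion"
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")  # requires a CUDA-capable GPU
    
    # Start the prompt with "arcane style" to invoke the trained style.
    prompt = "arcane style, a magical princess with golden hair"
    image = pipe(prompt).images[0]  # first generated PIL image
    image.save("./magical_princess.png")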
    
  3. Cloud GPU Suggestion: The code in step 2 expects a CUDA-capable GPU. If none is available locally, or if you plan to generate many images, consider a cloud GPU such as those provided by Google Colab or AWS.
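
The example in step 2 hard-codes `"cuda"` and `torch.float16`. A minimal sketch of choosing the device and precision at run time follows, assuming a reasonably recent PyTorch build. `pick_device` is a hypothetical helper name, and pairing float32 with MPS and CPU is a conservative assumption rather than a requirement of the model.

```python
import importlib.util


def pick_device():
    """Return (device, dtype_name) for the best backend found on this machine."""
    # Fall back to CPU settings if PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu", "float32"
    import torch

    if torch.cuda.is_available():
        return "cuda", "float16"  # half precision, as in the guide
    # torch.backends.mps is absent in older PyTorch releases, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps", "float32"  # assumption: full precision on Apple silicon
    return "cpu", "float32"


device, dtype_name = pick_device()
print(device, dtype_name)

# Usage with the pipeline from the guide (downloads the model weights):
# import torch
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained(
#     "nitrosocke/Arcane-Diffusion", torch_dtype=getattr(torch, dtype_name)
# ).to(device)
```

Generation on CPU works but is far slower than on a GPU, which is where a cloud instance becomes useful.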

License

Arcane Diffusion is licensed under the CreativeML OpenRAIL-M license. The license permits a broad range of uses but attaches use-based restrictions intended to keep applications responsible and ethical.
