Introduction

ProteusV0.2 is an advanced text-to-image model designed to enhance the capabilities of OpenDalleV1.1. It features improved understanding of prompts and superior stylistic capabilities, making it suitable for creating high-quality, detailed images across various styles, including anime and cartoon visualizations.

Architecture

ProteusV0.2 builds upon OpenDalleV1.1 by integrating new methods and fine-tuning on a large dataset of 220,000 captioned images. It applies Direct Preference Optimization (DPO) with 10,000 high-quality AI-generated image pairs to enhance performance. The model also incorporates several Low-Rank Adaptation (LoRA) models, each trained independently and integrated selectively into specific parts of the network, improving image quality without disrupting other functionality.
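The individual LoRA checkpoints are not published separately, but the general pattern of merging a selectively trained LoRA into an SDXL-class pipeline with 🧨 Diffusers looks roughly like the sketch below; the checkpoint path and fusion scale are illustrative placeholders, not values from the model card:

    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load the base SDXL-family pipeline in half precision.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "dataautogpt3/ProteusV0.2", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical LoRA checkpoint; only the layers it targets are modified.
    pipe.load_lora_weights("path/to/face-detail-lora")
    pipe.fuse_lora(lora_scale=0.8)  # bake the LoRA into the base weights

Fusing at a reduced scale keeps the adapter's influence localized, which mirrors the selective-integration approach described above.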

Training

The training of ProteusV0.2 uses a large dataset of captioned images, spanning stock photography and anime styles, to fine-tune the model for better prompt responsiveness and creativity. DPO is applied with a curated set of high-quality AI-generated image pairs to refine the model's output, and LoRA models are applied dynamically during training to target specific areas, improving the portrayal of facial features and skin texture.
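The training code itself is not released; for reference, the standard DPO objective on (preferred, rejected) pairs has the following shape, sketched here in PyTorch (the variable names and beta value are generic DPO conventions, not values taken from the model card):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # Implicit rewards: how much more likely each image is under the
        # fine-tuned model than under the frozen reference model.
        chosen_reward = policy_chosen_logp - ref_chosen_logp
        rejected_reward = policy_rejected_logp - ref_rejected_logp
        # Push the preferred image's reward above the rejected image's reward.
        return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()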

Guide: Running Locally

To run ProteusV0.2 locally with the 🧨 Diffusers library, follow these steps:

  1. Set up the environment: Ensure you have Python and PyTorch installed.
  2. Install necessary libraries: Use pip to install the diffusers library (a consolidated end-to-end script appears at the end of this guide).
  3. Load the VAE component:
    import torch
    from diffusers import AutoencoderKL
    # Use the fp16-safe SDXL VAE to avoid artifacts in half precision.
    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix",
        torch_dtype=torch.float16
    )
    
  4. Configure the pipeline:
    from diffusers import StableDiffusionXLPipeline, KDPM2AncestralDiscreteScheduler
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "dataautogpt3/ProteusV0.2", 
        vae=vae,
        torch_dtype=torch.float16
    )
    # Sample with the DPM2 Ancestral scheduler, as in the model card example.
    pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe.to('cuda')
    
  5. Generate an image:
    prompt = "black fluffy gorgeous dangerous cat animal creature, large orange eyes, big fluffy ears, piercing gaze, full moon, dark ambiance, best quality, extremely detailed"
    negative_prompt = "nsfw, bad quality, bad anatomy, worst quality, low quality, low resolution, extra fingers, blur, blurry, ugly, wrong proportions, watermark, image artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image"
    
    image = pipe(
        prompt, 
        negative_prompt=negative_prompt, 
        width=1024,
        height=1024,
        guidance_scale=7.5,
        num_inference_steps=50
    ).images[0]
    

For optimal performance, consider using cloud GPUs such as AWS EC2, Google Cloud, or Azure.
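Putting steps 1–5 together, a minimal end-to-end script might look like the following; the install command in the comment, the fixed seed, and the output filename are additions for completeness rather than part of the original instructions:

    # Assumed environment setup (run in a shell):
    #   pip install diffusers transformers accelerate safetensors
    import torch
    from diffusers import (
        AutoencoderKL,
        StableDiffusionXLPipeline,
        KDPM2AncestralDiscreteScheduler,
    )

    # fp16-friendly VAE and the ProteusV0.2 checkpoint, as in steps 3-4.
    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "dataautogpt3/ProteusV0.2", vae=vae, torch_dtype=torch.float16
    )
    pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe.to("cuda")

    prompt = "black fluffy gorgeous dangerous cat animal creature, large orange eyes, big fluffy ears, piercing gaze, full moon, dark ambiance, best quality, extremely detailed"
    negative_prompt = "nsfw, bad quality, bad anatomy, worst quality, low quality, low resolution, extra fingers, blur, blurry, ugly, wrong proportions, watermark, image artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image"

    # Fixed seed for reproducibility (an assumption, not part of the card).
    generator = torch.Generator("cuda").manual_seed(0)

    image = pipe(
        prompt,
        negative_prompt=negative_prompt,
        width=1024,
        height=1024,
        guidance_scale=7.5,
        num_inference_steps=50,
        generator=generator,
    ).images[0]
    image.save("proteus_v0_2_sample.png")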

License

ProteusV0.2 is released under the GPL-3.0 license, allowing for free use and distribution under the same license terms.
