Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1

IDEA-CCNL

Introduction

Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 is the first open-source bilingual Chinese-English Stable Diffusion model. It is trained on 20 million filtered Chinese image-text pairs and provides robust text-to-image generation from both Chinese and English prompts.

Architecture

The model is trained on the Noah-Wukong and Zero datasets using a two-stage process. In the first stage, only the text encoder is trained while the rest of the model is frozen, preserving the original generative capability while aligning Chinese concepts to it. In the second stage, the entire model is unfrozen and the text encoder and diffusion model are trained jointly, improving how well the diffusion model follows Chinese-language guidance.
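
For illustration, the sketch below shows how this staged freezing can be expressed with standard diffusers APIs. It is not the authors' training code, and the optimizer choice and learning rate are placeholder assumptions.

    import torch
    from diffusers import StableDiffusionPipeline

    # Stage 1: freeze the generative backbone (UNet and VAE) so that only the
    # text encoder is updated while Chinese concepts are aligned to it.
    pipe = StableDiffusionPipeline.from_pretrained(
        "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1"
    )
    pipe.unet.requires_grad_(False)
    pipe.vae.requires_grad_(False)
    optimizer = torch.optim.AdamW(pipe.text_encoder.parameters(), lr=1e-5)  # placeholder LR

    # Stage 2: unfreeze the UNet as well and train it jointly with the text encoder.
    # pipe.unet.requires_grad_(True)
    # optimizer = torch.optim.AdamW(
    #     list(pipe.text_encoder.parameters()) + list(pipe.unet.parameters()), lr=1e-5
    # )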

Training

Training was conducted over two stages:

  • Stage 1: Only the text encoder was trained for 80 hours using 8 x A100 GPUs.
  • Stage 2: Both the text encoder and diffusion model were trained for 100 hours, also on 8 x A100 GPUs. The training data consisted of image-text pairs from the Noah-Wukong and Zero datasets with a CLIP score greater than 0.2 (a filtering sketch follows below).
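
As a rough sketch of that filtering step, the snippet below scores each image-text pair with a CLIP model and keeps only pairs above the 0.2 threshold. The openai/clip-vit-large-patch14 checkpoint is purely a placeholder; the actual Chinese-capable scoring model is not specified here.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Placeholder scoring model; a Chinese-capable CLIP would be used in practice.
    scorer = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

    def clip_score(image: Image.Image, caption: str) -> float:
        inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = scorer(**inputs)
        # Cosine similarity between the projected image and text embeddings
        return torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds).item()

    # Keep only pairs whose score exceeds the 0.2 threshold used for training.
    # filtered = [(img, txt) for img, txt in pairs if clip_score(img, txt) > 0.2]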

Guide: Running Locally

Basic Steps

  1. Install Dependencies:

    pip install torch diffusers transformers
    
  2. Full Precision Usage:

    from diffusers import StableDiffusionPipeline

    # Load the full-precision (FP32) pipeline onto the GPU
    pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1").to("cuda")

    # Bilingual prompt: "a small bridge, flowing water, houses" in Van Gogh style
    prompt = '小桥流水人家,Van Gogh style'
    image = pipe(prompt, guidance_scale=10).images[0]
    image.save("小桥.png")
    
  3. Half Precision Usage (FP16):

    from diffusers import StableDiffusionPipeline
    import torch

    # Let cuDNN benchmark convolution algorithms for a small speed-up
    torch.backends.cudnn.benchmark = True

    # Load the pipeline in half precision (FP16) to roughly halve GPU memory use
    pipe = StableDiffusionPipeline.from_pretrained(
        "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1",
        torch_dtype=torch.float16
    )
    pipe.to('cuda')

    # Bilingual prompt: "a small bridge, flowing water, houses" in Van Gogh style
    prompt = '小桥流水人家,Van Gogh style'
    image = pipe(prompt, guidance_scale=10.0).images[0]
    image.save("小桥.png")
    

Cloud GPUs

For optimal performance, it is recommended to use cloud-based GPUs such as NVIDIA A100 instances available on major cloud platforms like AWS, Google Cloud, or Azure.

License

The model is licensed under the CreativeML OpenRAIL-M license. Users may use and redistribute the model, provided they do not produce or distribute illegal or harmful content. Users are responsible for adhering to the license terms, which include redistributing the model only under the same use restrictions and sharing a copy of the license with downstream users. Full details are provided in the CreativeML OpenRAIL-M license text.
