Taiyi-Stable-Diffusion-1B-Chinese-v0.1

IDEA-CCNL

Introduction

Taiyi-Stable-Diffusion-1B-Chinese is the first open-source Chinese Stable Diffusion model. It was trained on 20 million filtered Chinese image-text pairs, generates images from text prompts, and can be used through a Gradio Web UI.

Architecture

The Noah-Wukong and Zero datasets serve as the pre-training data. Each image-text pair in them was scored with the Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese model, and only well-matched pairs were kept (see Training). To align the model with Chinese concepts while preserving the original generative capabilities of Stable Diffusion, only the text encoder is trained; all other components of the stable-diffusion-v1-4 model (the UNet and the VAE) are frozen.
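The "train only the text encoder" setup can be sketched in PyTorch as below. This is a minimal sketch, not the authors' training code: it assumes the three components are ordinary torch modules, as the unet, vae, and text_encoder attributes of a diffusers StableDiffusionPipeline are. The function name freeze_all_but_text_encoder, the choice of AdamW, and the learning rate are hypothetical.

```python
import torch
from torch import nn


def freeze_all_but_text_encoder(
    unet: nn.Module,
    vae: nn.Module,
    text_encoder: nn.Module,
    lr: float = 1e-5,  # hypothetical value, not the authors' setting
) -> torch.optim.Optimizer:
    """Freeze the UNet and VAE, unfreeze the text encoder, and return an
    optimizer that only updates the text encoder's parameters."""
    # Frozen components keep Stable Diffusion's original image-generation ability.
    unet.requires_grad_(False)
    vae.requires_grad_(False)
    # Only the text encoder receives gradients, so only it adapts to Chinese prompts.
    text_encoder.requires_grad_(True)
    trainable = [p for p in text_encoder.parameters() if p.requires_grad]
    # AdamW is an assumed optimizer choice for illustration.
    return torch.optim.AdamW(trainable, lr=lr)
```

With a pipeline loaded as pipe (as in the Guide below), the call would be freeze_all_but_text_encoder(pipe.unet, pipe.vae, pipe.text_encoder). Because the optimizer is built only from text encoder parameters, the frozen UNet and VAE weights cannot change even if gradients were accidentally enabled elsewhere.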

Training

Training ran on 32 A100 GPUs for approximately 100 hours, using only image-text pairs with a CLIP score greater than 0.2. This is a preliminary version; the authors plan to keep updating and optimizing it.
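The filtering rule can be sketched in plain Python, assuming the CLIP score of a pair is the cosine similarity between its image embedding and its text embedding. The embeddings here are toy two-dimensional vectors and the function names are hypothetical; in practice the embeddings would come from the image and text encoders of a CLIP model such as Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese.

```python
import math
from typing import List, Sequence, Tuple

CLIP_SCORE_THRESHOLD = 0.2  # pairs must score strictly above this value


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_pairs(
    pairs: List[Tuple[str, Sequence[float], Sequence[float]]],
    threshold: float = CLIP_SCORE_THRESHOLD,
) -> List[str]:
    """Return the IDs of (pair_id, image_emb, text_emb) tuples whose score exceeds threshold."""
    return [
        pair_id
        for pair_id, image_emb, text_emb in pairs
        if cosine_similarity(image_emb, text_emb) > threshold
    ]


if __name__ == "__main__":
    pairs = [
        ("match", [1.0, 0.0], [1.0, 0.0]),      # score 1.0, kept
        ("related", [1.0, 0.0], [1.0, 1.0]),    # score ~0.707, kept
        ("unrelated", [1.0, 0.0], [0.0, 1.0]),  # score 0.0, dropped
    ]
    print(filter_pairs(pairs))  # ['match', 'related']
```

The comparison is strict (greater than 0.2), matching the wording above, so a pair scoring exactly at the threshold would be dropped.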

Guide: Running Locally

Steps

  1. Install the required libraries (diffusers needs transformers for the pipeline's text encoder and tokenizer):

    pip install torch diffusers transformers
    
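  2. Load the model in full precision (FP32) and generate an image on a CUDA GPU:

    from diffusers import StableDiffusionPipeline
    
    # Download the weights from the Hugging Face Hub and move the pipeline to the GPU
    pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1").to("cuda")
    # Prompt, in English: "a torrent plunging three thousand feet, oil painting"
    prompt = '飞流直下三千尺,油画'
    # guidance_scale controls how closely the image follows the prompt
    image = pipe(prompt, guidance_scale=7.5).images[0]
    image.save("飞流.png")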
    
  3. Alternatively, load the model in half precision (FP16) for faster inference and lower GPU memory use:

    import torch
    from diffusers import StableDiffusionPipeline
    
    pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1", torch_dtype=torch.float16)
    pipe.to('cuda')
    
    prompt = '飞流直下三千尺,油画'
    image = pipe(prompt, guidance_scale=7.5).images[0]
    image.save("飞流.png")
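Steps 2 and 3 can be combined into one helper that uses FP16 on a CUDA GPU and falls back to FP32 on the CPU, where half-precision inference is typically slow or unsupported. This is a sketch, not part of the original instructions: the helper names pick_device_and_dtype and generate, and the default seed, are hypothetical. The generator and guidance_scale arguments are standard StableDiffusionPipeline call arguments.

```python
from typing import Tuple

import torch

MODEL_ID = "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1"


def pick_device_and_dtype() -> Tuple[str, torch.dtype]:
    """Prefer FP16 on a CUDA GPU; fall back to FP32 on the CPU."""
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32


def generate(prompt: str, seed: int = 0, guidance_scale: float = 7.5):
    """Load the pipeline with a suitable dtype and return one PIL image."""
    # Imported here so the helpers above can be used without diffusers installed.
    from diffusers import StableDiffusionPipeline

    device, dtype = pick_device_and_dtype()
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    # A seeded generator fixes the initial noise, so repeated runs on the same
    # hardware and library versions give the same image.
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipe(prompt, guidance_scale=guidance_scale, generator=generator).images[0]
```

For example, generate('飞流直下三千尺,油画', seed=42).save("飞流.png") saves a seeded version of the example image. Reloading the pipeline on every call keeps the sketch short; for repeated generation, load it once and reuse it.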
    

Cloud GPUs

If you do not have a local CUDA-capable GPU, cloud GPU instances from providers such as AWS, Google Cloud, or Azure can run the inference examples above.

License

The model is released under the CreativeML OpenRAIL-M license, which permits open access, redistribution of the weights, and commercial use, subject to the license's use restrictions. Users are responsible for ensuring that their outputs do not violate those provisions. The full license text is available on the Hugging Face website.
