Kolors diffusers

Kwai-Kolors

Introduction

Kolors is a large-scale text-to-image model based on latent diffusion, developed by the Kuaishou Kolors team. It is trained on billions of text-image pairs, and its authors report better visual quality and semantic accuracy than other models. Kolors accepts both English and Chinese prompts and is particularly strong at generating content specific to Chinese contexts. More details are available in the technical report.

Architecture

Kolors is built on a diffusion model architecture. Its diffusers pipeline can sample with schedulers such as the EulerDiscreteScheduler and EDMDPMSolverMultistepScheduler, and it supports both text-to-image and image-to-image generation with detailed, photorealistic results.
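As a rough illustration of the image-to-image path, the sketch below assumes the `KolorsImg2ImgPipeline` class available in recent diffusers releases and the `Kwai-Kolors/Kolors-diffusers` checkpoint. The `strength` default, the file names, and the helper names `effective_steps` and `restyle` are illustrative assumptions, not settings recommended by the Kolors authors.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img run executes.

    Follows the usual diffusers img2img convention: only the last
    `strength` fraction of the schedule is run on the noised input image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


def restyle(image_path: str, prompt: str, strength: float = 0.6, steps: int = 50):
    """Hypothetical helper: run Kolors image-to-image on one input image."""
    # Heavy imports stay local so this module can be imported without a GPU.
    import torch
    from diffusers import KolorsImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = KolorsImg2ImgPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    init_image = load_image(image_path)  # accepts a local path or a URL
    return pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,  # assumed value; higher means more change
        guidance_scale=5.0,
        num_inference_steps=steps,
    ).images[0]


if __name__ == "__main__":
    print(effective_steps(50, 0.5))
    # Needs a CUDA GPU and downloads the model weights:
    # restyle("input.png", "a watercolor painting of a ladybug").save("restyled.png")
```

A lower `strength` keeps the result closer to the input image and runs fewer denoising steps. A value of 1.0 runs the full schedule and largely ignores the input.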

Training

The model is trained on a vast dataset of text-image pairs, emphasizing the accurate rendering of complex semantics and textual content in both English and Chinese. The training process focuses on enhancing visual quality and maintaining semantic fidelity.

Guide: Running Locally

  1. Installation:

    • Clone the diffusers library and install it from source:
      git clone https://github.com/huggingface/diffusers
      cd diffusers
      python3 -m pip install .
      
  2. Setup:

    • Load the KolorsPipeline in half precision on a CUDA GPU:
      import torch
      from diffusers import KolorsPipeline

      # Download the fp16 weights and move the pipeline to the GPU.
      pipe = KolorsPipeline.from_pretrained(
          "Kwai-Kolors/Kolors-diffusers",
          torch_dtype=torch.float16,
          variant="fp16",
      ).to("cuda")
      
  3. Generate Images:

    • Create an image from a prompt:
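      # English: "A photo of a ladybug, macro, zoom, high quality, cinematic,
      # holding a sign that reads 可图" (可图 is the Chinese name of Kolors).
      prompt = '一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"'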
      image = pipe(
          prompt=prompt,
          negative_prompt="",
          guidance_scale=5.0,
          num_inference_steps=50,
          generator=torch.Generator(pipe.device).manual_seed(66),
      ).images[0]
      image.show()
      
  4. Cloud GPUs:

    • For optimal performance, consider using cloud-based GPUs like AWS EC2 GPU instances, Google Cloud GPUs, or Azure N-series VMs.
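The steps above can be combined into one script. The following is a minimal sketch that reuses the pipeline settings from this guide, generates one image per seed, and saves each to disk instead of calling `image.show()`, which is useful on a headless cloud GPU. The helper names `output_path` and `generate_for_seeds`, the `outputs` directory, and the file-name pattern are hypothetical choices, not part of the Kolors project.

```python
from pathlib import Path


def output_path(out_dir: str, seed: int) -> Path:
    """Build a deterministic file name so each seed maps to one image."""
    return Path(out_dir) / f"kolors_seed{seed:04d}.png"


def generate_for_seeds(prompt: str, seeds, out_dir: str = "outputs"):
    """Hypothetical helper: generate and save one image per seed."""
    # Heavy imports stay local so this module can be imported without a GPU.
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for seed in seeds:
        # A seeded generator makes each image reproducible.
        generator = torch.Generator(pipe.device).manual_seed(seed)
        image = pipe(
            prompt=prompt,
            negative_prompt="",
            guidance_scale=5.0,
            num_inference_steps=50,
            generator=generator,
        ).images[0]
        path = output_path(out_dir, seed)
        image.save(path)
        saved.append(path)
    return saved


if __name__ == "__main__":
    print(output_path("outputs", 66))
    # Needs a CUDA GPU and downloads the model weights:
    # generate_for_seeds("一张瓢虫的照片,微距,高质量", seeds=[66, 67, 68])
```

Loading the pipeline once and looping over seeds avoids reloading the weights for every image.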

License

Kolors is open-sourced under the Apache-2.0 license for academic research. Commercial use requires registration through a questionnaire. Users must adhere to the open-source license and must not misuse the model. Despite the project's compliance efforts, the model's output may not always be accurate or safe, and the project disclaims legal responsibility for any issues arising from misuse or inaccurate output. To cite Kolors, use the BibTeX entry provided with the project.
