Introduction

Kolors is a large-scale latent diffusion model for text-to-image generation, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, it delivers strong visual quality and handles complex semantics in both English and Chinese prompts. Its ability to generate Chinese-specific content is particularly notable. For further technical details, refer to the technical report.

Architecture

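Kolors uses latent diffusion for photorealistic text-to-image synthesis. In latent diffusion, the denoising process runs in the compressed latent space of an autoencoder rather than directly on pixels. The model accepts prompts in both Chinese and English and renders detailed, semantically complex images.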
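To make "latent diffusion" more concrete, here is a minimal, generic sketch of the forward (noising) step used to train diffusion models. It is applied to a random array that stands in for an autoencoder latent. This is not Kolors' code. The linear beta schedule (1e-4 to 0.02 over 1,000 steps, as in the original DDPM formulation) and the latent shape are hypothetical choices made only for illustration.

```python
import numpy as np

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (hypothetical values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(latent, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(latent.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * latent + np.sqrt(1.0 - a) * eps, eps

# Hypothetical latent shape, chosen only for illustration.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 128, 128))
alpha_bars = linear_alpha_bars()
z_t, eps = add_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)
```

In a latent diffusion model, a denoising network is trained to predict `eps` from `z_t` and the text conditioning. Sampling runs this process in reverse, starting from pure noise, and the autoencoder's decoder then maps the final latent back to an image.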

Training

The model was trained on billions of text-image pairs, which underpins its performance across diverse visual and linguistic tasks. The training methodology targets improvements in visual quality, semantic accuracy, and text rendering, drawing on both open-source and proprietary advances.

Guide: Running Locally

Requirements

  • Python 3.8 or later
  • PyTorch 1.13.1 or later
  • Transformers 4.26.1 or later
  • Recommended: CUDA 11.7 or later
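The requirements above can be checked from Python before you go further. This is a minimal sketch that uses only the standard library. The helper names (`parse_version`, `check_requirements`, `cuda_version`) are hypothetical. The version parser deliberately ignores pre-release and local suffixes such as `rc1` or `+cu118`. The Python 3.8 requirement is covered implicitly, because `importlib.metadata` only exists from 3.8 onward.

```python
# importlib.metadata is itself new in Python 3.8, so an ImportError here
# already means the interpreter is older than required.
from importlib import metadata

# Minimum versions from the Requirements list.
MINIMUMS = {"torch": (1, 13, 1), "transformers": (4, 26, 1)}

def parse_version(text):
    """Map '2.1.0+cu118' to (2, 1, 0); drops local (+...) and pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad '2.0' to (2, 0, 0)
    return tuple(parts)

def check_requirements():
    """Return a list of problems; an empty list means the package minimums are met."""
    problems = []
    for package, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(package))
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if installed < minimum:
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems

def cuda_version():
    """CUDA version PyTorch was built with (11.7+ recommended), or None."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds
```

Inside the `kolors` environment, `print(check_requirements(), cuda_version())` should print an empty list when the package minimums are satisfied. It should also print the CUDA version string, or `None` for a CPU-only PyTorch build.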

Steps

  1. Clone the Repository and Install Dependencies

    apt-get install git-lfs
    git clone https://github.com/Kwai-Kolors/Kolors
    cd Kolors
    conda create --name kolors python=3.8
    conda activate kolors
    pip install -r requirements.txt
    python3 setup.py install
    
  2. Download Weights

    huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors
    

    Alternatively, clone them with Git LFS:

    git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
    
  3. Run Inference

    python3 scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
    

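    In English, the prompt reads: "A photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that reads '可图'". (可图, "Kolors", is the model's Chinese name.) The generated image will be saved to scripts/outputs/sample_test.jpg.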
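Steps 2 and 3 can also be driven from Python. The following is a minimal sketch under stated assumptions. It assumes it is run from the repository root and that the weights and output paths shown in the steps are relative to that root. It treats a non-empty `weights/Kolors` folder as a completed download, which is a simplifying assumption. It uses `huggingface_hub.snapshot_download` instead of the `huggingface-cli` command. The helper names `ensure_weights`, `build_sample_command`, and `run_sample` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Paths from the steps above, assumed relative to the repository root.
WEIGHTS_DIR = Path("weights/Kolors")
OUTPUT_IMAGE = Path("scripts/outputs/sample_test.jpg")

def ensure_weights(repo_dir=".", repo_id="Kwai-Kolors/Kolors"):
    """Download the weights (step 2) unless the target folder already has files."""
    target = Path(repo_dir) / WEIGHTS_DIR
    if target.is_dir() and any(target.iterdir()):
        # Assumption: a non-empty folder means a previous download completed.
        return target
    # Imported lazily so the rest of this module works without huggingface_hub.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target

def build_sample_command(prompt):
    """argv for step 3; passing a list avoids shell quoting issues with the prompt."""
    # sys.executable is the interpreter of the active (e.g. conda) environment.
    return [sys.executable, "scripts/sample.py", prompt]

def run_sample(prompt, repo_dir="."):
    """Fetch weights if needed, run the sampling script, and return the output path."""
    ensure_weights(repo_dir)
    subprocess.run(build_sample_command(prompt), cwd=repo_dir, check=True)
    return Path(repo_dir) / OUTPUT_IMAGE
```

For example, calling `run_sample("一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”")` from the repository root would download the weights on first use. It would then return the image path given in step 3. Passing the prompt as a list element means the curly Chinese quotation marks never pass through a shell.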

Cloud GPUs

If no suitable local GPU is available, consider cloud GPU instances from providers such as AWS, Google Cloud, or Azure.

License

Kolors is open-sourced for academic research under the Apache-2.0 license. For commercial use, a formal registration process is required. Users must adhere strictly to the license terms, ensuring the model is not used for harmful or unauthorized purposes. Despite efforts to ensure data safety and compliance, the model's probabilistic nature means outputs may not always be accurate or secure. The project disclaims legal responsibility for any misuse or unintended outcomes.
