Kolors
Kwai-Kolors
Introduction
Kolors is a large-scale text-to-image generation model using latent diffusion, developed by the Kuaishou Kolors team. It is trained on a vast dataset of text-image pairs, excelling in visual quality and complex semantic understanding for both English and Chinese texts. The model's capabilities in generating Chinese-specific content are particularly notable. For further technical details, refer to the technical report.
Architecture
Kolors utilizes latent diffusion to achieve photorealistic text-to-image synthesis. It supports both Chinese and English language inputs and excels in rendering detailed and semantically complex images.
Training
The model was trained on billions of text-image pairs, which supports its performance across diverse visual and linguistic tasks. Training emphasizes visual quality, semantic accuracy, and text rendering, drawing on both open-source and proprietary advancements.
Guide: Running Locally
Requirements
- Python 3.8 or later
- PyTorch 1.13.1 or later
- Transformers 4.26.1 or later
- Recommended: CUDA 11.7 or later
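The minimum versions listed above can be checked before installing anything else. The following is a minimal sketch, not part of the Kolors repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the packages are published under the distribution names `torch` and `transformers`. It compares only the leading numeric part of each version string, so local suffixes such as `+cu118` are ignored.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.26.1"}
MIN_PYTHON = (3, 8)


def version_tuple(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); stops at the first non-numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed, required):
    """True if the installed version is at least the required version."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.8")
    for package, required in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {required})")
            continue
        if not meets_minimum(installed, required):
            problems.append(f"{package} {installed} is older than {required}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks compatible."]:
        print(line)
```

Running the script prints one line per problem found. The CUDA recommendation is not checked here, since it depends on the installed driver and PyTorch build.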
Steps
- Clone the Repository and Install Dependencies
  apt-get install git-lfs
  git clone https://github.com/Kwai-Kolors/Kolors
  cd Kolors
  conda create --name kolors python=3.8
  conda activate kolors
  pip install -r requirements.txt
  python3 setup.py install
- Download Weights
  huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors
  Or
  git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
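- Run Inference
  python3 scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
  The prompt translates to: a photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that reads "可图".
  The generated image will be saved to scripts/outputs/sample_test.jpg.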
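The download and inference steps above can also be driven from Python. This is a minimal sketch under stated assumptions, not code from the Kolors repository: the function names are hypothetical, it assumes the `huggingface_hub` package is installed (it provides `snapshot_download`), and it assumes the script is run from the root of the cloned repository. It uses the current interpreter (`sys.executable`) in place of the literal `python3` command.

```python
import subprocess
import sys
from pathlib import Path

REPO_ID = "Kwai-Kolors/Kolors"        # Hugging Face repo named in the steps above
WEIGHTS_DIR = Path("weights/Kolors")  # same target directory as the CLI command
OUTPUT_IMAGE = Path("scripts/outputs/sample_test.jpg")  # output path stated above


def download_weights(local_dir=WEIGHTS_DIR):
    """Fetch the weights with huggingface_hub instead of the CLI."""
    # Imported lazily so the rest of the module works without the package.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))


def build_inference_command(prompt):
    """Return the argv list that runs scripts/sample.py with one prompt."""
    return [sys.executable, "scripts/sample.py", prompt]


def run_inference(prompt, repo_root="."):
    """Run the sample script from the repository root; return the image path."""
    subprocess.run(build_inference_command(prompt), cwd=repo_root, check=True)
    return Path(repo_root) / OUTPUT_IMAGE


if __name__ == "__main__":
    download_weights()
    print(run_inference("一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"))
```

Passing the prompt as a single list element avoids shell quoting problems with the nested quotation marks in the prompt.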
Cloud GPUs
For optimal performance, consider using cloud services that offer GPUs, such as AWS, Google Cloud, or Azure.
License
Kolors is open-sourced for academic research under the Apache-2.0 license. For commercial use, a formal registration process is required. Users must adhere strictly to the license terms, ensuring the model is not used for harmful or unauthorized purposes. Despite efforts to ensure data safety and compliance, the model's probabilistic nature means outputs may not always be accurate or secure. The project disclaims legal responsibility for any misuse or unintended outcomes.