Kolors diffusers
Kwai-Kolors
Introduction
Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. It is trained on billions of text-image pairs, which its authors report gives it an advantage over other models in visual quality and semantic accuracy. Kolors accepts both English and Chinese prompts and is particularly strong at generating content specific to Chinese contexts. More technical details are available in the technical report.
Architecture
Kolors is a latent diffusion model. Its diffusers pipeline can be paired with schedulers such as EulerDiscreteScheduler and EDMDPMSolverMultistepScheduler, which control how noise is removed step by step during sampling. The pipeline supports both text-to-image and image-to-image generation and is aimed at detailed, photorealistic output.
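The role a scheduler plays during sampling can be illustrated with a toy example. The sketch below is a scalar, pure-Python illustration of Euler steps over a decreasing noise schedule. It is not the diffusers implementation of EulerDiscreteScheduler: the function name euler_sample, the sigma values, and the constant "denoiser" are all assumptions made up for illustration.

```python
def euler_sample(x, sigmas, denoise):
    """Walk x from noise level sigmas[0] down to sigmas[-1] with Euler steps.

    `denoise(x, sigma)` stands in for the model: it returns an estimate of
    the clean sample given the noisy value x at noise level sigma.
    """
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        x0_pred = denoise(x, sigma)           # estimate of the clean sample
        derivative = (x - x0_pred) / sigma    # slope of x with respect to sigma
        x = x + derivative * (sigma_next - sigma)  # one Euler step
    return x


# Toy setup (hypothetical values): the "model" always predicts the same clean value.
target = 0.25
sigmas = [14.0, 7.0, 3.0, 1.0, 0.3, 0.0]  # decreasing noise levels, ending at 0
result = euler_sample(x=9.0, sigmas=sigmas, denoise=lambda x, sigma: target)
print(result)
```

Because the toy denoiser is exact, the final step to a noise level of zero lands on the target value up to floating-point rounding. A real scheduler applies the same kind of update to image latents, with a neural network in place of the constant prediction.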
Training
Kolors is trained on billions of text-image pairs, with an emphasis on accurately rendering complex semantics and on-image text in both English and Chinese. Training aims to improve visual quality while preserving semantic fidelity to the prompt.
Guide: Running Locally
- Installation:
  - Clone the diffusers library and install it from source:

        # Fetch the diffusers source code
        git clone https://github.com/huggingface/diffusers
        cd diffusers
        # Install into the active Python environment
        python3 -m pip install .
- Setup:
  - Load KolorsPipeline in half precision and move it to the GPU:

        import torch
        from diffusers import KolorsPipeline

        # Load the fp16 weights to reduce GPU memory use
        pipe = KolorsPipeline.from_pretrained(
            "Kwai-Kolors/Kolors-diffusers",
            torch_dtype=torch.float16,
            variant="fp16",
        ).to("cuda")
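- Generate Images:
  - Create an image from a prompt, using the pipe object created in Setup:

        # Prompt (Chinese): "A photo of a ladybug, macro, zoom, high quality,
        # cinematic, holding a sign that reads '可图'"
        prompt = '一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"'

        image = pipe(
            prompt=prompt,
            negative_prompt="",
            guidance_scale=5.0,
            num_inference_steps=50,
            # Fixed seed so the result is reproducible
            generator=torch.Generator(pipe.device).manual_seed(66),
        ).images[0]
        image.show()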
- Cloud GPUs:
  - For optimal performance, consider cloud GPUs such as AWS EC2 GPU instances, Google Cloud GPUs, or Azure N-series VMs.
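The local-run steps can also be wrapped into a small script. The sketch below is one possible arrangement and is not part of the Kolors documentation. The helper names missing_packages and generate, the default file name kolors_sample.png, and the choice to save the image instead of calling image.show() are assumptions made for illustration (saving is often more practical on a headless cloud GPU instance). The pipeline call itself uses the same arguments as the guide.

```python
import importlib.util


def missing_packages(required=("torch", "diffusers")):
    """Return the names in `required` that cannot be imported here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def generate(prompt, out_path="kolors_sample.png", seed=66):
    """Generate one image with the guide's settings and save it to out_path.

    Hypothetical helper: the heavy imports stay inside the function so this
    module can still be imported on a machine without torch or a GPU.
    """
    import torch
    from diffusers import KolorsPipeline

    if not torch.cuda.is_available():
        raise RuntimeError("This sketch assumes a CUDA GPU, as the guide does.")

    # Same loading configuration as the Setup step
    pipe = KolorsPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Same sampling arguments as the Generate Images step
    image = pipe(
        prompt=prompt,
        negative_prompt="",
        guidance_scale=5.0,
        num_inference_steps=50,
        generator=torch.Generator(pipe.device).manual_seed(seed),
    ).images[0]
    image.save(out_path)  # PIL image; file format follows the extension
    return out_path
```

A typical use would be to call missing_packages() first and, if it returns an empty list, call generate() with an English or Chinese prompt. Keeping the seed as a parameter makes it easy to reproduce a result or to vary it across runs.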
License
Kolors is open-sourced under the Apache-2.0 license for academic research; commercial use requires registering through a questionnaire. Users must adhere to the open-source license and must not misuse the model. Despite compliance efforts, the model's output may not always be accurate or safe, and the project disclaims legal responsibility for any issues arising from misuse or from inaccuracies in that output. To cite Kolors, use the project's BibTeX entry.