Kandinsky 3
Introduction
Kandinsky 3.0 is an open-source text-to-image diffusion model developed as an enhancement of the Kandinsky2-x model family. It is designed to generate images with a focus on Russian cultural elements. This version improves text comprehension and visual quality by enlarging the text encoder and the Diffusion U-Net models.
Architecture
The model architecture comprises three main components:
- Text encoder (Flan-UL2): 8.6 billion parameters.
- Latent diffusion U-Net: 3 billion parameters.
- MoVQ encoder/decoder: 267 million parameters.
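The parameter counts above also give a rough sense of how much memory the weights need. The following is a back-of-the-envelope sketch, not a measurement: it sums the three components and assumes 2 bytes per parameter (half-precision weights), ignoring activations and framework overhead. All names in it are illustrative.

```python
# Parameter counts as listed above (dictionary keys are illustrative names).
PARAMS = {
    "text_encoder_flan_ul2": 8_600_000_000,
    "latent_diffusion_unet": 3_000_000_000,
    "movq_encoder_decoder": 267_000_000,
}

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in half precision


def total_params(params=PARAMS):
    """Sum the parameter counts of all components."""
    return sum(params.values())


def weight_memory_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate weight storage in gigabytes (10**9 bytes), weights only."""
    return total_params(params) * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"total parameters: {total_params() / 1e9:.2f}B")
    print(f"fp16 weights: about {weight_memory_gb():.1f} GB")
```

Under these assumptions the weights alone come to roughly 23.7 GB, which suggests why offloading submodels to the CPU can help on GPUs with less memory.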
Training
Two models are released:
- Base Model: Trained over 2 million steps using 400 A100 GPUs.
- Inpainting Model: Initialized from the base model's final checkpoint and further trained for 250,000 steps on 300 A100 GPUs.
Guide: Running Locally
Installation
To run Kandinsky 3.0 locally, install diffusers from source and upgrade transformers and accelerate:
pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
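After installing, it can be useful to confirm which versions ended up in the environment. A minimal sketch using only the Python standard library is shown below; the helper name installed_versions is hypothetical and not part of any of these libraries.

```python
from importlib import metadata

# Packages installed by the commands above.
REQUIRED = ("diffusers", "transformers", "accelerate")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```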
Text-to-Image Generation
import torch
from diffusers import AutoPipelineForText2Image
# Load the half-precision (fp16) weights to reduce memory use.
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Keep submodels on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()
prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."
# A fixed seed makes the result reproducible.
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
Image-to-Image Generation
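import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
# Load the half-precision (fp16) weights to reduce memory use.
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Keep submodels on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()
prompt = "A painting of the inside of a subway train with tiny raccoons."
# Starting image that the pipeline will transform.
init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")
# A fixed seed makes the result reproducible.
generator = torch.Generator(device="cpu").manual_seed(0)
# strength sets how far the output may depart from the starting image (0 to 1).
image = pipe(prompt, image=init_image, strength=0.75, num_inference_steps=25, generator=generator).images[0]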
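The strength argument also affects how many denoising steps actually run. The sketch below mirrors the convention used by many diffusers image-to-image pipelines, where only the last int(num_inference_steps * strength) scheduler steps are executed. That Kandinsky 3 follows this exact rule is an assumption here, and effective_steps is a hypothetical helper, not a diffusers function.

```python
def effective_steps(num_inference_steps, strength):
    """Denoising steps run under the common img2img convention (assumed)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the tail of the schedule is used; strength=1.0 runs every step.
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    # Settings from the image-to-image example above.
    print(effective_steps(25, 0.75))
```

Under this assumption, lower strength values keep the output closer to the starting image and also shorten generation, since fewer steps are executed.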
Cloud GPUs
If no sufficiently large local GPU is available, consider running the model on cloud GPU instances from providers such as AWS, Google Cloud, or Azure.
License
Kandinsky 3.0 is licensed under the Apache 2.0 License, allowing for broad use and modification.