fusing/glide-base
Introduction
GLIDE is a text-guided diffusion model for photorealistic image generation and editing. It explores how effective diffusion models are at text-conditional image synthesis by comparing two guidance strategies: CLIP guidance and classifier-free guidance. Human evaluators favored classifier-free guidance for both photorealism and caption similarity, and preferred GLIDE's samples over DALL-E's under certain evaluation conditions.
Architecture
GLIDE uses a text-conditional diffusion model with 3.5 billion parameters. The architecture supports classifier-free guidance, which improves image quality by trading diversity for fidelity. The model can also be fine-tuned for tasks such as image inpainting, enabling detailed image editing driven by text prompts.
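The inpainting idea can be sketched as a per-step mask blend: the model's estimate is kept only in the editable region, while the known pixels are pasted back everywhere else. This is a simplified illustration; the names `inpaint_step`, `known_region_t`, and `mask` are hypothetical, not GLIDE's actual interface.

```python
import torch

def inpaint_step(x_t, known_region_t, mask):
    """Blend one denoising step for inpainting.

    mask == 1 marks pixels to regenerate; elsewhere the (noised)
    known image content is kept unchanged.
    """
    return mask * x_t + (1 - mask) * known_region_t

x_t = torch.randn(1, 3, 64, 64)    # current denoising estimate
known = torch.zeros(1, 3, 64, 64)  # stand-in for the noised known image
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0      # regenerate only the center square
blended = inpaint_step(x_t, known, mask)
```

Repeating this blend at every denoising step keeps the untouched region consistent with the original image while the text prompt shapes the masked area.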
Training
Training pairs the standard diffusion objective with the guidance strategies above to improve image synthesis. The model is trained on large-scale image-text datasets to optimize photorealism and textual relevance, and it can be fine-tuned for specific tasks such as inpainting.
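The underlying objective is the standard noise-prediction loss used to train diffusion models: noise a clean image to a random timestep, then regress the model's output onto the injected noise. This is a hedged sketch; `model`, `text_emb`, and `alphas_cumprod` below are illustrative placeholders, not GLIDE's training code.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, text_emb, alphas_cumprod):
    """Simplified noise-prediction training objective.

    x0:             batch of clean images in [-1, 1]
    alphas_cumprod: cumulative noise schedule, one entry per timestep
    """
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))      # random timesteps
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise          # forward noising
    return F.mse_loss(model(x_t, t, text_emb), noise)     # regress onto noise

# Toy "model" that ignores its conditioning and always predicts zeros
model = lambda x_t, t, emb: torch.zeros_like(x_t)
x0 = torch.randn(2, 3, 64, 64)
alphas = torch.linspace(0.99, 0.01, 1000)
loss = diffusion_loss(model, x0, None, alphas)
```

In practice the text conditioning is randomly dropped during training so that the same network can produce both the conditional and unconditional predictions needed for classifier-free guidance.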
Guide: Running Locally
To run GLIDE locally, follow these steps:
- Install Required Libraries:
```bash
pip install diffusers torch Pillow
```
(Note: the imaging library is installed as `Pillow` but imported as `PIL`.)
- Load and Use the Model:
```python
import torch
import PIL.Image
from diffusers import DiffusionPipeline

model_id = "fusing/glide-base"

# Load the text-conditional diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(model_id)

# Generate an image from a text prompt
img = pipeline("a crayon drawing of a corgi")

# Map the output tensor from [-1, 1] to 8-bit pixel values and save it
img = img.squeeze(0)
img = ((img + 1) * 127.5).round().clamp(0, 255).to(torch.uint8).cpu().numpy()
image_pil = PIL.Image.fromarray(img)
image_pil.save("test.png")
```
- Cloud GPUs: For improved performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure to handle the computational demands of the model.
License
GLIDE is licensed under the Apache-2.0 License, allowing for use, modification, and distribution under specified terms.