SD3.5 Large IP Adapter
InstantX
Introduction
The SD3.5-Large-IP-Adapter is an IP-Adapter developed for the SD3.5-Large model by the InstantX Team. It conditions text-to-image generation on a reference image in addition to the text prompt, and runs through a Stable Diffusion 3 pipeline extended with IP-Adapter support.
Architecture
The IP-Adapter integrates new layers into all 38 blocks of the model. It uses google/siglip-so400m-patch14-384 for image encoding due to its superior performance, and employs a TimeResampler to project the encoded image features. The number of image tokens is set to 64.
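To make the numbers above concrete, here is a minimal sketch of the token arithmetic. The patch size (14) and input resolution (384) are read off the encoder name. The grid calculation assumes a standard non-overlapping patch embedding with no extra class token, and it assumes the resampler consumes the full patch sequence. Neither assumption is stated in the model card.

```python
# Token arithmetic for the image branch (illustrative sketch, not InstantX code).
IMAGE_SIZE = 384        # from the encoder name "siglip-so400m-patch14-384"
PATCH_SIZE = 14         # from the encoder name "siglip-so400m-patch14-384"
NUM_IMAGE_TOKENS = 64   # stated in the model card (nb_token=64)
NUM_BLOCKS = 38         # stated in the model card


def patch_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens, assuming non-overlapping patches and no class token."""
    side = image_size // patch_size  # partial patches at the border are dropped
    return side * side


encoder_tokens = patch_token_count(IMAGE_SIZE, PATCH_SIZE)
compression = encoder_tokens / NUM_IMAGE_TOKENS

print(f"patch tokens from the encoder (assumed): {encoder_tokens}")
print(f"image tokens after resampling: {NUM_IMAGE_TOKENS}")
print(f"approximate compression factor: {compression:.1f}x")
print(f"blocks that receive the image tokens: {NUM_BLOCKS}")
```

Under these assumptions the resampler condenses several hundred patch features into the 64 image tokens that the added layers attend to in each block.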
Training
Specific training details are not provided. Judging from the architecture, the work centers on the layers added to the transformer blocks and on the image encoding path (the SigLIP encoder followed by the TimeResampler).
Guide: Running Locally
To run the SD3.5-Large-IP-Adapter locally, follow these steps:
- Install Necessary Libraries: Ensure that you have torch and PIL (Python Imaging Library) installed.
- Download Model Files: Obtain the model, IP-Adapter, and image encoder from the specified paths.
- Load Model: Use the provided code snippet to load and initialize the model:
    import torch
    from PIL import Image

    # Local modules shipped with the IP-Adapter repository
    from models.transformer_sd3 import SD3Transformer2DModel
    from pipeline_stable_diffusion_3_ipa import StableDiffusion3Pipeline

    model_path = "stabilityai/stable-diffusion-3.5-large"
    ip_adapter_path = "./ip-adapter.bin"
    image_encoder_path = "google/siglip-so400m-patch14-384"

    # Load the transformer, then build the pipeline around it
    transformer = SD3Transformer2DModel.from_pretrained(
        model_path, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path, transformer=transformer, torch_dtype=torch.bfloat16
    ).to("cuda")

    # Attach the IP-Adapter weights and the image encoder
    pipe.init_ipadapter(
        ip_adapter_path=ip_adapter_path,
        image_encoder_path=image_encoder_path,
        nb_token=64,
    )
- Generate Images: Use the pipeline to generate images with specific prompts and settings:
    # Reference image that steers the generation
    ref_img = Image.open("./assets/1.jpg").convert("RGB")

    image = pipe(
        width=1024,
        height=1024,
        prompt="a cat",
        negative_prompt="lowres, low quality, worst quality",
        num_inference_steps=24,
        guidance_scale=5.0,
        generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducibility
        clip_image=ref_img,
        ipadapter_scale=0.5,
    ).images[0]
    image.save("./result.jpg")
- Environment: The code above moves the pipeline to "cuda", so a CUDA-capable GPU is required. A cloud GPU, such as those offered by AWS or Google Cloud, is recommended for better performance.
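Before working through the steps above, it can help to confirm that the required libraries are importable. The following is a minimal preflight sketch using only the standard library. The helper name `missing_modules` is hypothetical. `torch` and `PIL` come from the guide, while `diffusers` and `transformers` are assumptions about what the pipeline code depends on and are not listed in the guide.

```python
import importlib.util

# Listed in the guide above.
REQUIRED = ["torch", "PIL"]
# Assumption: likely needed by the pipeline code, but not stated in the guide.
ASSUMED = ["diffusers", "transformers"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules
            found = False
        if not found:
            missing.append(name)
    return missing


for label, names in (("required", REQUIRED), ("assumed", ASSUMED)):
    absent = missing_modules(names)
    status = "all found" if not absent else "missing: " + ", ".join(absent)
    print(f"{label} libraries: {status}")
```

Note that `PIL` is the import name; the package is distributed as Pillow.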
License
The model is released under the stabilityai-ai-community license. For more details, refer to the license link. All rights are reserved.