cartoon-control-lr_1e-4-wd_1e-4-gs_10.0-cd_0.1

sayakpaul

Introduction

cartoon-control-lr_1e-4-wd_1e-4-gs_10.0-cd_0.1 is a model for generating cartoonized images using a new type of conditioning. It provides Flux Control weights trained on top of the black-forest-labs/FLUX.1-dev base model with the instruction-tuning-sd/cartoonization dataset.

Architecture

The model builds on the black-forest-labs/FLUX.1-dev architecture and is loaded through the diffusers library. Inference uses FluxControlPipeline, which takes a text prompt and a control image and produces the transformed image.

Training

The model was trained on the instruction-tuning-sd/cartoonization dataset. Training logs are available on WandB, and training examples are provided alongside the model.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Required Libraries: Install the diffusers library (a release that includes FluxControlPipeline) together with PyTorch and transformers.
  2. Import Necessary Modules:
    from diffusers import FluxTransformer2DModel, FluxControlPipeline
    from diffusers.utils import load_image
    import torch 
    
  3. Load the Model:
    path = "sayakpaul/cartoon-control-lr_1e-4-wd_1e-4-gs_10.0-cd_0.1"
    transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
    pipe = FluxControlPipeline.from_pretrained(
      "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
    ).to("cuda")
    
  4. Generate Images:
    prompt = "Generate a cartoonized version of the image"
    url = "https://huggingface.co/sayakpaul/cartoon-control-lr_1e-4-wd_1e-4-gs_10.0-cd_0.1/resolve/main/taj.jpg"
    
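    # Load the input photo and resize it to 1024x1024 to use as the control image.
    image = load_image(url).resize((1024, 1024))
    gen_image = pipe(
        prompt=prompt,
        control_image=image,
        guidance_scale=10.0,
        num_inference_steps=50,
        generator=torch.manual_seed(0),  # fixed seed for reproducible output
        max_sequence_length=512,
    ).images[0]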
    gen_image.save("output.png")
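
The example in step 4 resizes every input to 1024x1024, which stretches non-square photos. As a rough sketch, the helper below picks a target size that keeps the aspect ratio instead. The name control_size is hypothetical, and the rounding to multiples of 16 is an assumption about what Flux pipelines in diffusers expect for image dimensions, not something the model card states.

```python
from __future__ import annotations


def control_size(
    width: int, height: int, long_side: int = 1024, multiple: int = 16
) -> tuple[int, int]:
    """Scale (width, height) so the longer side is about `long_side`,
    rounding each side to the nearest positive multiple of `multiple`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = long_side / max(width, height)

    def snap(value: float) -> int:
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width * scale), snap(height * scale)


if __name__ == "__main__":
    print(control_size(1920, 1080))
    print(control_size(800, 600))
```

To use it, resize the loaded image with image.resize(control_size(*image.size)) and pass the same values as width and height in the pipeline call. This assumes the pipeline accepts explicit width and height arguments; otherwise it may fall back to its default square resolution.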
    

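The example moves the pipeline to a CUDA device, so a CUDA-capable GPU is required. If none is available locally, cloud GPUs such as those from AWS, Google Cloud, or Azure are recommended for efficient processing.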
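
A back-of-the-envelope estimate shows why a large GPU helps. The sketch below assumes the FLUX.1-dev transformer has roughly 12 billion parameters (the publicly stated figure, not something this model card reports) and counts only the transformer weights in bfloat16. The text encoders, VAE, and activations need additional memory.

```python
# Assumption: the FLUX.1-dev transformer has roughly 12 billion parameters.
PARAMS = 12e9
BYTES_PER_PARAM = 2  # bfloat16 stores each value in 2 bytes


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(f"{weight_gib(PARAMS, BYTES_PER_PARAM):.1f} GiB")  # prints "22.4 GiB"
```

If the GPU has less memory than this, calling pipe.enable_model_cpu_offload() instead of .to("cuda") is one option diffusers offers, at the cost of slower generation.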

License

The model is licensed under the terms linked from its model page. Ensure compliance with these terms when using the model.
