HunyuanVideo-tuxemons

a-r-r-o-w

Introduction

HunyuanVideo-tuxemons is a text-to-video model that generates short videos from descriptive prompts. It is a fine-tuned version of HunyuanVideo, adapted with LoRA (Low-Rank Adaptation) to produce visually detailed, stylistically consistent videos in the style of the Tuxemon creature artwork it was trained on.

Architecture

The model is built on the HunyuanVideo architecture and fine-tuned with LoRA. The training data consisted of 250 images from the diffusers/tuxemon dataset, supplemented with samples generated by a Flux LoRA trained on the same dataset. Inference uses the Diffusers library, specifically the HunyuanVideoPipeline and HunyuanVideoTransformer3DModel classes.
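As a rough illustration of what LoRA fine-tuning changes, the sketch below applies a low-rank update to a single weight matrix: the frozen base weight W is left as-is, and a trainable product B·A of rank r is added on top, scaled by alpha / r (the convention PEFT uses). For a layer with a d_out × d_in weight, LoRA trains only r × (d_in + d_out) values instead of d_in × d_out. The toy matrices are made up for illustration; this is not the training code used for this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w, a, b, alpha, rank, adapter_weight=1.0):
    """Return W + adapter_weight * (alpha / rank) * (B @ A) without modifying W.

    w: frozen base weight, shape (d_out, d_in)
    a: LoRA down-projection, shape (rank, d_in)
    b: LoRA up-projection, shape (d_out, rank)
    adapter_weight: extra multiplier, analogous to the weight passed to set_adapters
    """
    scale = adapter_weight * alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy example: a 2x3 weight with a rank-1 update.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # shape (1, 3)
B = [[0.5], [0.0]]      # shape (2, 1)

merged = lora_merge(W, A, B, alpha=1, rank=1)
print(merged)  # [[1.5, 1.0, 1.5], [0.0, 1.0, 0.0]]
```

With the settings listed under Training (rank 128, alpha 128), alpha / rank is 1.0. In PEFT-backed Diffusers, the weight passed to set_adapters multiplies that factor, so loading the adapter with a weight of 1.2 applies the learned update at 1.2 times its nominal strength.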

Training

The fine-tuning process involved the following parameters:

  • Learning rate: 1e-5
  • Training steps: 10,000
  • Resolution buckets (frames x height x width): 1x768x768 and 1x512x512, i.e. single-frame (image) samples
  • Rank: 128
  • LoRA alpha: 128
  • Optimizer: AdamW with a weight decay of 0.01
  • Flow weighting scheme: logit_normal
  • Flow shift: 7.0
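The two flow-matching settings above control how noise levels are drawn during training. A common formulation (introduced with Stable Diffusion 3) samples u from a normal distribution and maps it through a sigmoid to get sigma in (0, 1), which is the "logit_normal" scheme; a shift then moves samples toward higher noise via sigma' = shift·sigma / (1 + (shift − 1)·sigma), the same formula Diffusers' flow-matching Euler scheduler uses. The sketch below assumes a logit mean of 0 and a standard deviation of 1, and assumes the shift is applied in this usual way; the card does not state the trainer's exact implementation, so treat it as illustrative.

```python
import math
import random


def sample_logit_normal(mean=0.0, std=1.0, rng=random):
    """Draw sigma in (0, 1) as sigmoid(u) with u ~ N(mean, std)."""
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def shift_sigma(sigma, shift):
    """Apply the flow-matching shift; shift > 1 moves sigma toward 1 (more noise)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


rng = random.Random(0)  # seeded for repeatable sampling
raw = [sample_logit_normal(rng=rng) for _ in range(10_000)]
shifted = [shift_sigma(s, 7.0) for s in raw]  # 7.0 matches the flow shift listed above

print(f"mean sigma before shift: {sum(raw) / len(raw):.3f}")
print(f"mean sigma after shift (7.0): {sum(shifted) / len(shifted):.3f}")
```

A sigma of 0.5, for example, becomes 0.875 after a shift of 7.0, so the shifted samples spend more of training at high noise levels.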

Guide: Running Locally

To run the model locally:

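  1. Install requirements: a recent Diffusers release that includes HunyuanVideoPipeline, together with Transformers (for the text encoders), Accelerate, and PEFT (needed by load_lora_weights). Use the command:

    pip install -U diffusers transformers accelerate peft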
    
  2. Load the Model:

    import torch
    from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
    from diffusers.utils import export_to_video
    
    model_id = "hunyuanvideo-community/HunyuanVideo"
    # Load the transformer in bfloat16; the remaining components (text encoders, VAE) load in float16.
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)
    # Decode latents in tiles to reduce VAE memory usage.
    pipe.vae.enable_tiling()
    pipe.to("cuda")
    
  3. Load LoRA Weights:

    pipe.load_lora_weights("a-r-r-o-w/HunyuanVideo-tuxemons", adapter_name="hunyuanvideo-lora")
    # Activate the adapter with a weight of 1.2 (1.0 applies the LoRA at its nominal strength).
    pipe.set_adapters("hunyuanvideo-lora", 1.2)
    
  4. Generate Video:

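    output = pipe(
        prompt="Style of snomexut, a cat-like Tuxemon creature walks in alien-world grass, and observes its surroundings.",
        height=768,
        width=768,
        num_frames=33,
        num_inference_steps=30,
        # A fixed seed makes the sampled noise reproducible.
        generator=torch.Generator().manual_seed(73),
    ).frames[0]  # frames of the first (and only) generated video
    # 33 frames at 15 fps is about 2.2 seconds of video.
    export_to_video(output, "output-tuxemon.mp4", fps=15)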
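The generation call above requests 33 frames at 768x768 and exports them at 15 fps. The sketch below works out what those settings imply, assuming HunyuanVideo's 3D VAE compresses 4x in time and 8x in each spatial dimension and that the transformer uses 1x2x2 patches; these figures come from the HunyuanVideo release, not from this card. Under 4x temporal compression, frame counts of the form 4k + 1 (such as 33) map exactly onto latent frames. The helper name video_shape_summary is hypothetical.

```python
def video_shape_summary(num_frames, height, width, fps,
                        temporal_compression=4, spatial_compression=8,
                        patch_t=1, patch_hw=2):
    """Summarize output duration and latent/token sizes for a HunyuanVideo-style pipeline.

    The compression and patch defaults are assumptions based on the HunyuanVideo release.
    """
    # The first frame is encoded on its own; every further group of frames shares one latent frame.
    latent_frames = (num_frames - 1) // temporal_compression + 1
    latent_h = height // spatial_compression
    latent_w = width // spatial_compression
    # Each transformer token covers one patch of the latent video.
    tokens = (latent_frames // patch_t) * (latent_h // patch_hw) * (latent_w // patch_hw)
    return {
        "duration_s": num_frames / fps,
        "latent_shape": (latent_frames, latent_h, latent_w),
        "transformer_tokens": tokens,
    }


summary = video_shape_summary(num_frames=33, height=768, width=768, fps=15)
print(summary)
# {'duration_s': 2.2, 'latent_shape': (9, 96, 96), 'transformer_tokens': 20736}
```

The token count grows with frames × height × width, which is one reason longer or larger videos quickly raise memory use and generation time.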
    

Video generation with HunyuanVideo is memory- and compute-intensive, so a CUDA GPU with ample VRAM is recommended. If local hardware is insufficient, a cloud GPU instance (for example on AWS, GCP, or Azure) is a practical alternative.
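To put the hardware recommendation in perspective, the sketch below estimates the memory needed just to hold model weights, as parameter count × bytes per parameter. The figure of roughly 13 billion parameters is the size reported for the HunyuanVideo transformer; it excludes the text encoders, the VAE, and activations, so real usage is higher. The helper name weight_memory_gb is hypothetical.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate memory in GB (1 GB = 1e9 bytes) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


TRANSFORMER_PARAMS = 13_000_000_000  # approximate size reported for the HunyuanVideo transformer

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(TRANSFORMER_PARAMS, dtype):.0f} GB")
# float32: ~52 GB
# bfloat16: ~26 GB
```

Loading the transformer in bfloat16, as in the loading step, halves this footprint compared with float32. On GPUs that still cannot hold the whole pipeline, Diffusers' pipe.enable_model_cpu_offload() (called instead of pipe.to("cuda")) keeps only the active component on the GPU, at the cost of slower generation.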

License

HunyuanVideo-tuxemons is distributed under the Apache License 2.0, a permissive license that allows use, modification, and distribution, including as part of works released under other licenses, provided that the license text and copyright notices are retained and modified files carry notices stating that they were changed. For more details, refer to the license documentation provided with the model.
