StableAnimator

FrancisRing

Introduction

StableAnimator is a high-quality, identity-preserving human image animation framework. It synthesizes high-fidelity videos directly from a reference image and a sequence of poses, with no post-processing. The framework is designed to maintain identity consistency, employing novel techniques in both the training and inference phases: a distribution-aware ID Adapter and a Hamilton-Jacobi-Bellman (HJB) equation-based optimization that enhances video quality.

Architecture

StableAnimator builds on a video diffusion model enhanced with dedicated modules for identity consistency. It first computes image and face embeddings, which are refined by a global content-aware Face Encoder. A distribution-aware ID Adapter then aligns the identity features so that they are not disturbed by the temporal layers. During inference, StableAnimator solves a Hamilton-Jacobi-Bellman (HJB) equation inside the diffusion denoising process to further improve face quality.
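To make the alignment idea concrete, here is a minimal sketch that assumes the distribution-aware step matches the mean and standard deviation of the face embeddings to those of the image embeddings before fusion; the function name, tensor shapes, and exact normalization are hypothetical, not the repository's API.

    import torch

    def align_face_embeddings(face_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # Hypothetical sketch: whiten the face embeddings, then re-color
        # them with the per-feature statistics of the image embeddings.
        f_mean, f_std = face_emb.mean(1, keepdim=True), face_emb.std(1, keepdim=True)
        i_mean, i_std = image_emb.mean(1, keepdim=True), image_emb.std(1, keepdim=True)
        return (face_emb - f_mean) / (f_std + 1e-6) * i_std + i_mean

    # Usage with made-up shapes: batch 1, 77 face / 257 image tokens, 1024 features.
    aligned = align_face_embeddings(torch.randn(1, 77, 1024), torch.randn(1, 257, 1024))
    print(aligned.shape)  # torch.Size([1, 77, 1024])

Bringing the two distributions onto the same scale before fusion is one plausible way to keep the identity signal from being washed out once temporal layers are added.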

Training

Training StableAnimator involves carefully crafted modules that ensure identity preservation in the generated animations. Scripts for data preprocessing, including human skeleton and face mask extraction, are provided; these preprocessing steps prepare the data used to train the model.
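As an illustration of what face mask extraction involves, the sketch below builds a binary face mask for one frame using OpenCV's bundled Haar cascade detector. This is a stand-in under stated assumptions: the repository ships its own extraction scripts and may use a different face detector.

    import cv2
    import numpy as np

    # OpenCV's bundled frontal-face Haar cascade (stand-in detector).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def face_mask(frame_bgr: np.ndarray) -> np.ndarray:
        # Return a binary mask: 255 inside detected face boxes, 0 elsewhere.
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        mask = np.zeros(gray.shape, dtype=np.uint8)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
            mask[y:y + h, x:x + w] = 255
        return mask

    frame = cv2.imread("frame_0.png")  # hypothetical extracted video frame
    cv2.imwrite("frame_0_mask.png", face_mask(frame))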

Guide: Running Locally

  1. Environment Setup

    • Install the necessary Python packages (a quick environment check is sketched after this guide):
      pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
      pip install torch==2.5.1+cu124 xformers --index-url https://download.pytorch.org/whl/cu124
      pip install -r requirements.txt
      
  2. Download Weights

    • Download the model weights from the Hugging Face repository into a local checkpoints directory:
      git clone https://huggingface.co/FrancisRing/StableAnimator checkpoints
      
  3. Organize Files

    • Arrange the downloaded weights into the directory layout shown in the repository's directory structure.

  4. Run Inference

    • Use the provided script to perform inference:
      bash command_basic_infer.sh
      
  5. Generate Video

    • Use ffmpeg to convert the generated frames into a high-quality MP4 file (-crf 10 is near-lossless; -pix_fmt yuv420p keeps the file playable in most players):
      cd animated_images
      ffmpeg -framerate 20 -i frame_%d.png -c:v libx264 -crf 10 -pix_fmt yuv420p /path/animation.mp4
      
  6. Gradio Interface

    • Launch the Gradio interface using:
      python app.py
      
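Before running inference, the environment from step 1 can be sanity-checked with a short snippet like the one below; it is not part of the repository, just a suggested verification that the CUDA build of PyTorch and xformers are importable.

    import torch
    import xformers

    print("torch:", torch.__version__)            # expect 2.5.1+cu124
    print("cuda available:", torch.cuda.is_available())
    print("xformers:", xformers.__version__)
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))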

Cloud GPUs

  • Consider using cloud GPU services such as AWS, Google Cloud, or Azure to handle the high computational demands, especially for larger models.

License

StableAnimator is released under the Apache-2.0 license, which allows for both commercial and non-commercial use, distribution, and modification, provided that the license terms are met.
