StableAnimator
FrancisRing
Introduction
StableAnimator is a high-quality, identity-preserving human image animation framework. It synthesizes high-fidelity videos directly from a reference image and a sequence of poses, without any post-processing. To maintain identity consistency, the framework introduces dedicated techniques in both the training and inference phases: a distribution-aware ID Adapter and a Hamilton-Jacobi-Bellman (HJB) equation-based optimization that enhances video quality.
Architecture
StableAnimator employs a video diffusion model augmented with modules dedicated to identity consistency. It first computes image and face embeddings, which are refined by a global content-aware Face Encoder. A distribution-aware ID Adapter then aligns the identity features so that the temporal layers do not interfere with them. During inference, a Hamilton-Jacobi-Bellman (HJB) equation-based optimization is incorporated into the diffusion denoising process to improve face quality.
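To make the inference-time step concrete, here is a minimal, hedged sketch of a face-guided latent update in the spirit of the HJB-based optimization. It is not the official implementation; decode_fn and face_embed_fn are hypothetical stand-ins for a latent decoder and a face-embedding model:

import torch

def face_guided_update(latents, ref_face_embed, decode_fn, face_embed_fn,
                       step_size=0.1, num_steps=2):
    # Hedged sketch, not the official StableAnimator code: nudge the
    # denoised latents toward the reference identity between steps.
    latents = latents.detach().requires_grad_(True)
    for _ in range(num_steps):
        pred_embed = face_embed_fn(decode_fn(latents))  # embed the predicted face
        sim = torch.nn.functional.cosine_similarity(
            pred_embed, ref_face_embed, dim=-1).mean()  # identity similarity
        (grad,) = torch.autograd.grad(sim, latents)     # ascend similarity
        latents = (latents + step_size * grad).detach().requires_grad_(True)
    return latents.detach()

This gradient step plays a role analogous to the control term in an HJB formulation, steering the denoising trajectory toward the reference identity.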
Training
Training StableAnimator relies on carefully crafted modules that preserve identity in the generated animations. Scripts for data preprocessing, including human skeleton and face mask extraction, are provided; these steps are crucial for preparing the data used to train the model.
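For orientation, here is a hedged sketch of face-mask extraction using the off-the-shelf insightface detector. The repository ships its own preprocessing scripts; this is only an illustrative stand-in:

import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")        # generic detector, not the repo's script
app.prepare(ctx_id=0, det_size=(640, 640))

def face_mask(image_path):
    img = cv2.imread(image_path)                    # BGR frame
    mask = np.zeros(img.shape[:2], dtype=np.uint8)  # blank single-channel mask
    for face in app.get(img):                       # one entry per detected face
        x1, y1, x2, y2 = face.bbox.astype(int)
        mask[max(y1, 0):y2, max(x1, 0):x2] = 255    # fill the face bounding box
    return mask

cv2.imwrite("face_mask.png", face_mask("frame_0.png"))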
Guide: Running Locally
Environment Setup
- Install the necessary Python packages:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install torch==2.5.1+cu124 xformers --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
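A quick, optional sanity check that the CUDA build installed correctly (assumes a CUDA-capable machine):

import torch
print(torch.__version__)          # expect 2.5.1+cu124
print(torch.cuda.is_available())  # expect True if the GPU is visible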
Download Weights
- Clone the StableAnimator repository and download the model weights:
git clone https://huggingface.co/FrancisRing/StableAnimator checkpoints
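If git or git-lfs is unavailable, the same weights can be fetched with the huggingface_hub client (an alternative to the clone above, not the repository's documented method):

from huggingface_hub import snapshot_download
snapshot_download(repo_id="FrancisRing/StableAnimator", local_dir="checkpoints")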
Organize Files
- Arrange the downloaded weights into the directory structure documented in the repository so that the inference scripts can locate them.
Run Inference
- Use the provided script to perform inference:
bash command_basic_infer.sh
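To pin the job to a specific GPU, the standard CUDA environment variable works here as with any CUDA program (an assumption about your setup, not a repository-specific flag):

CUDA_VISIBLE_DEVICES=0 bash command_basic_infer.sh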
Generate Video
- Use ffmpeg to convert the generated frames into a high-quality MP4 file:
cd animated_images
ffmpeg -framerate 20 -i frame_%d.png -c:v libx264 -crf 10 -pix_fmt yuv420p /path/animation.mp4
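Note that libx264 with yuv420p requires even frame dimensions; if your frames have an odd width or height, a standard ffmpeg pad filter works around this (a generic fix, not from the repository):

ffmpeg -framerate 20 -i frame_%d.png -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -c:v libx264 -crf 10 -pix_fmt yuv420p /path/animation.mp4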
Gradio Interface
- Launch the Gradio interface using:
python app.py
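The repository ships its own app.py; for orientation only, here is a hedged sketch of the kind of Gradio wrapper it provides, where animate is a hypothetical stand-in for the real inference entry point:

import gradio as gr

def animate(reference_image, pose_video):
    # Hypothetical stand-in: the real app.py runs StableAnimator inference here.
    return "animation.mp4"

demo = gr.Interface(
    fn=animate,
    inputs=[gr.Image(type="filepath", label="Reference image"),
            gr.Video(label="Pose sequence")],
    outputs=gr.Video(label="Animated result"),
)
demo.launch()  # serves on http://127.0.0.1:7860 by default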
Cloud GPUs
- Consider using cloud GPU services such as AWS, Google Cloud, or Azure to handle the high computational demands, especially for larger models.
License
StableAnimator is released under the Apache-2.0 license, which allows for both commercial and non-commercial use, distribution, and modification, provided that the license terms are met.