HALLO2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Introduction
HALLO2 is a project developed by researchers from Fudan University, Baidu Inc., and Nanjing University. It animates a portrait image from an audio input, with the goal of producing realistic, detailed animations that are both long in duration and high in resolution.
Architecture
The HALLO2 framework combines several components, including a denoising UNet, a face locator, and image and audio projectors. It integrates pretrained models for face analysis, audio processing, and animation.
Training
Training for HALLO2 is split into two parts: long-duration animation and high-resolution animation. For long-duration animation, the training data involves talking-face videos meeting specific face orientation and size criteria. The training process uses distributed computing frameworks like Accelerate for efficient training across multiple nodes. High-resolution animation training uses the VFHQ dataset, with models trained using PyTorch's distributed launch capabilities.
Guide: Running Locally
Basic Steps
- Set Up Environment:
- Use Ubuntu 20.04/22.04 with CUDA 11.8.
- Create a conda environment and install necessary packages:
    conda create -n hallo python=3.10
    conda activate hallo
    pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
    apt-get install ffmpeg
- Download Pretrained Models:
- Clone the models from the HuggingFace repository:
    git lfs install
    git clone https://huggingface.co/fudan-generative-ai/hallo2 pretrained_models
- Prepare Inference Data:
- Ensure source images are square, with the face occupying 50-70% of the image and facing forward.
- Driving audio must be in WAV format and in English.
- Run Inference:
- Execute inference scripts for long-duration or high-resolution animations:
    python scripts/inference_long.py --config ./configs/inference/long.yaml
    python scripts/video_sr.py --input_path [input_video] --output_path [output_dir]
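Two of the input requirements listed under Prepare Inference Data (a square source image and WAV audio) can be checked mechanically before running inference. The following is a minimal sketch and not part of HALLO2: the function names are hypothetical, it assumes the source image is a PNG, and it relies on Python's standard `wave` module, which reads only uncompressed PCM WAV files. It does not check face size, face orientation, or the spoken language.

```python
import struct
import wave
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR header."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnframes() / rate


def check_inputs(image_path, audio_path):
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    width, height = png_size(image_path)
    if width != height:
        problems.append(f"image is {width}x{height}, expected square")
    if Path(audio_path).suffix.lower() != ".wav":
        problems.append("audio file does not have a .wav extension")
    else:
        try:
            wav_info(audio_path)
        except (wave.Error, EOFError) as exc:
            problems.append(f"audio is not a readable WAV file: {exc}")
    return problems
```

For example, `check_inputs("portrait.png", "speech.wav")` (both paths are placeholders) returns an empty list when the image is square and the audio opens as a WAV file, and a list of descriptive messages otherwise.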
Cloud GPUs
For enhanced performance, consider using cloud-based GPU services like AWS, Google Cloud, or Azure to handle computationally intensive tasks.
License
HALLO2 is released under the MIT License. Note that some components, such as the high-resolution animation feature, have specific license requirements (S-Lab License 1.0) that must be respected.