EchoMimic
Introduction
EchoMimic is a model designed for creating lifelike audio-driven portrait animations. It employs editable landmark conditioning to enhance animation realism. The EchoMimic series includes EchoMimicV1 and EchoMimicV2, with the latter aimed at simplified and semi-body human animation.
Architecture
EchoMimic's architecture comprises a denoising UNet, a reference UNet, a motion module, and a face locator. These components work together to process the audio input and drive portrait animation from the provided landmarks.
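As a rough illustration of how these pieces fit together, the toy sketch below wires stand-in modules in the same roles. Every module definition, shape, and name here is an illustrative assumption, not the repository's actual code.

```python
# Toy sketch of how EchoMimic-style components could interact.
# Every module here is an illustrative stand-in, not the real network.
import torch
import torch.nn as nn

class ToyEchoMimic(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.reference_unet = nn.Conv2d(3, dim, 3, padding=1)             # encodes the reference portrait
        self.denoising_unet = nn.Conv2d(2 * dim + 1, dim, 3, padding=1)   # per-frame generator
        self.motion_module = nn.Conv3d(dim, dim, (3, 1, 1), padding=(1, 0, 0))  # temporal coherence
        self.face_locator = nn.Conv2d(1, 1, 3, padding=1)                 # landmark map -> spatial condition
        self.audio_proj = nn.Linear(128, dim)                             # audio features -> conditioning

    def forward(self, ref_img, audio_feats, landmark_maps):
        # ref_img: (1, 3, H, W); audio_feats: (T, 128); landmark_maps: (T, 1, H, W)
        t = audio_feats.shape[0]
        ref = self.reference_unet(ref_img).expand(t, -1, -1, -1)          # share reference features per frame
        audio = self.audio_proj(audio_feats)[:, :, None, None].expand_as(ref)
        mask = self.face_locator(landmark_maps)                           # where landmarks drive motion
        frames = self.denoising_unet(torch.cat([ref, audio, mask], dim=1))
        return self.motion_module(frames.transpose(0, 1).unsqueeze(0))    # (1, C, T, H, W) video latents

video = ToyEchoMimic()(torch.randn(1, 3, 64, 64), torch.randn(8, 128), torch.randn(8, 1, 64, 64))
```

The key design point this mirrors is that the reference portrait is encoded once and reused as conditioning for every frame, while the audio embedding and landmark map vary per frame.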
Training
The EchoMimic models are trained on various datasets to enhance pose control and animation accuracy. Pretrained models are available for both English and Mandarin Chinese, and efforts are ongoing to improve singing performance and develop a high-resolution Chinese-based talking head dataset.
Guide: Running Locally
Basic Steps
- Clone the Repository
  ```bash
  git clone https://github.com/BadToBest/EchoMimic
  cd EchoMimic
  ```
- Set Up the Python Environment
  - Recommended Python versions are 3.8, 3.10, or 3.11.
  - Create and activate a Conda environment:
    ```bash
    conda create -n echomimic python=3.8
    conda activate echomimic
    ```
  - Install the required packages:
    ```bash
    pip install -r requirements.txt
    ```
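  After installation, a quick sanity check can confirm the interpreter version and, assuming PyTorch is among the pinned requirements, whether CUDA is visible:

  ```python
  # Sanity-check the environment; assumes requirements.txt installs PyTorch.
  import sys
  import torch

  print("Python", sys.version.split()[0])             # expect 3.8, 3.10, or 3.11
  print("PyTorch", torch.__version__)
  print("CUDA available:", torch.cuda.is_available())
  ```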
- Download FFmpeg-Static
  - Download a static build from the FFmpeg website.
  - Set the path:
    ```bash
    export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
    ```
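  To verify the variable points at a working binary, a short check can be run (the directory layout is an assumption based on the export above):

  ```python
  # Confirm FFMPEG_PATH resolves to a runnable ffmpeg binary.
  import os
  import subprocess

  ffmpeg = os.path.join(os.environ["FFMPEG_PATH"], "ffmpeg")
  out = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True, check=True)
  print(out.stdout.splitlines()[0])  # e.g. "ffmpeg version 4.4-static ..."
  ```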
- Download Pretrained Weights
  ```bash
  git lfs install
  git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
  ```
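  A quick check can confirm the download is complete. The file names below are assumptions derived from the components listed under Architecture and should be compared against the actual repository contents:

  ```python
  # Verify the expected component weights exist; names are assumptions
  # based on the architecture description, not a guaranteed file list.
  from pathlib import Path

  weights = Path("pretrained_weights")
  for name in ["denoising_unet.pth", "reference_unet.pth",
               "motion_module.pth", "face_locator.pth"]:
      print(name, "ok" if (weights / name).exists() else "MISSING")
  ```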
- Run Inference
  - For audio-driven animation:
    ```bash
    python -u infer_audio2vid.py
    ```
  - For motion alignment:
    ```bash
    python -u demo_motion_sync.py
    ```
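  The scripts can also be launched from Python, which makes it easy to guarantee that FFMPEG_PATH is set for the subprocess. The snippet below is a convenience sketch equivalent to the shell commands above:

  ```python
  # Run the audio-driven inference script with FFMPEG_PATH exported.
  import os
  import subprocess

  env = dict(os.environ, FFMPEG_PATH="/path/to/ffmpeg-4.4-amd64-static")
  subprocess.run(["python", "-u", "infer_audio2vid.py"], env=env, check=True)
  ```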
Cloud GPUs
- Suggested GPUs for optimal performance include the A100 (80 GB), RTX 4090D (24 GB), and V100 (16 GB).
License
EchoMimic is intended for academic research. All users are responsible for ensuring their use of the model complies with ethical and legal standards. The project contributors disclaim any responsibility for user-generated content.