Introduction

EchoMimic is a model for generating lifelike, audio-driven portrait animations. It uses editable landmark conditioning to make the animation more realistic. The series includes EchoMimicV1 and EchoMimicV2; EchoMimicV2 targets simplified, semi-body human animation.

Architecture

EchoMimic's architecture consists of several components, including denoising and reference UNet models, a motion module, and a face locator. These components work together to process audio inputs and drive the animation of portraits based on the provided landmarks.
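
To make the component list concrete, here is a minimal sketch that checks whether a local weights directory holds one checkpoint per component. The file names and the flat directory layout are assumptions for illustration, not something this document confirms; adjust them to match the files you actually have.

```python
from pathlib import Path

# Hypothetical mapping from the components named above to checkpoint files.
# The file names are assumptions for illustration only.
COMPONENT_FILES = {
    "denoising UNet": "denoising_unet.pth",
    "reference UNet": "reference_unet.pth",
    "motion module": "motion_module.pth",
    "face locator": "face_locator.pth",
}


def missing_components(weights_dir):
    """Return the names of components whose checkpoint file is absent."""
    root = Path(weights_dir)
    return [
        name
        for name, filename in COMPONENT_FILES.items()
        if not (root / filename).is_file()
    ]


if __name__ == "__main__":
    missing = missing_components("pretrained_weights")
    if missing:
        print("Missing checkpoints for:", ", ".join(missing))
    else:
        print("All component checkpoints found.")
```

Keeping the mapping in one dictionary means a renamed or relocated checkpoint only needs a one-line change.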

Training

The EchoMimic models are trained on various datasets to enhance pose control and animation accuracy. Pretrained models are available for both English and Mandarin Chinese, and efforts are ongoing to improve singing performance and develop a high-resolution Chinese-based talking head dataset.

Guide: Running Locally

Basic Steps

  1. Clone the Repository

    git clone https://github.com/BadToBest/EchoMimic
    cd EchoMimic
    
  2. Set Up Python Environment

    • Recommended Python versions are 3.8, 3.10, or 3.11.
    • Create a Conda environment:
      conda create -n echomimic python=3.8
      conda activate echomimic
      
    • Install required packages:
      pip install -r requirements.txt
      
  3. Download FFMPEG-Static

    • Download the static build from the FFmpeg website.
    • Set the path:
      export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
      
  4. Download Pretrained Weights

    git lfs install
    git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
    
  5. Run Inference

    • For audio-driven animation:
      python -u infer_audio2vid.py
      
    • For motion alignment:
      python -u demo_motion_sync.py
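
Before running inference, it can help to confirm that the setup steps above took effect. The following is a rough pre-flight sketch, not part of the EchoMimic repository: the `preflight` function is hypothetical, and it assumes that `FFMPEG_PATH` points to a directory and that the weights were cloned into `pretrained_weights` in the current working directory.

```python
import os
import sys
from pathlib import Path

# Python versions listed as recommended in step 2.
RECOMMENDED_PYTHONS = {(3, 8), (3, 10), (3, 11)}


def preflight(env, python_version, weights_dir="pretrained_weights"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []

    major_minor = tuple(python_version[:2])
    if major_minor not in RECOMMENDED_PYTHONS:
        problems.append(
            "Python %d.%d is not one of the recommended versions" % major_minor
        )

    # Assumption: FFMPEG_PATH names the extracted static-build directory.
    ffmpeg_path = env.get("FFMPEG_PATH")
    if not ffmpeg_path:
        problems.append("FFMPEG_PATH is not set")
    elif not Path(ffmpeg_path).is_dir():
        problems.append("FFMPEG_PATH is not a directory: " + ffmpeg_path)

    # Assumption: weights were cloned into this directory (step 4).
    if not Path(weights_dir).is_dir():
        problems.append("weights directory not found: " + str(weights_dir))

    return problems


if __name__ == "__main__":
    for issue in preflight(os.environ, sys.version_info):
        print("WARNING:", issue)
```

Run it from the repository root after activating the Conda environment; no output means none of the three checks failed.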
      

Cloud GPUs

  • Suggested GPUs: A100 (80 GB), RTX 4090D (24 GB), and V100 (16 GB).
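
To see how a machine compares with the GPUs listed above, a small sketch like the following can query `nvidia-smi`. The threshold is an assumption: it treats the smallest listed size (16 GB) as decimal gigabytes, since drivers usually report slightly less than the nominal capacity. This document does not say whether smaller GPUs work, so treat the result as a hint rather than a requirement.

```python
import shutil
import subprocess

# Smallest size among the listed GPUs (16 GB), as decimal GB converted to MiB.
# Assumption: this is only a rough comparison point, not a hard requirement.
MIN_MEMORY_MIB = 16 * 10**9 // 2**20


def parse_gpu_memory(csv_text):
    """Parse 'name, memory_in_MiB' CSV lines into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        name, memory = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.strip())))
    return gpus


def below_suggested(gpus):
    """Return names of GPUs with less memory than the smallest listed GPU."""
    return [name for name, mib in gpus if mib < MIN_MEMORY_MIB]


def query_gpus():
    """Ask nvidia-smi for names and total memory; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    found = query_gpus()
    for name, mib in found:
        print("%s: %d MiB" % (name, mib))
    for name in below_suggested(found):
        print("NOTE: %s has less memory than the smallest listed GPU" % name)
```

Separating the parsing from the `nvidia-smi` call keeps the logic testable on machines without a GPU.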

License

EchoMimic is intended for academic research. All users are responsible for ensuring their use of the model complies with ethical and legal standards. The project contributors disclaim any responsibility for user-generated content.
