EchoMimicV2

BadToBest

Introduction

EchoMimicV2 generates striking, simplified, semi-body human animations driven by audio input. Developed by the Terminal Technology Department at Ant Group, it continues the EchoMimic series and aims to improve the realism and controllability of audio-driven animation.

Architecture

The EchoMimicV2 architecture comprises five key components: a denoising UNet, a reference UNet, a motion module, a pose encoder, and an audio processor. Together, these components convert audio input into realistic human animation, building on pretrained models.
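
As a rough illustration of how these components might map onto checkpoint files, the sketch below checks a weights directory for one entry per component. The component names come from the description above; the file names and the `missing_components` helper are assumptions made for illustration and are not confirmed by this document, so adjust them to match the actual contents of the downloaded weights.

```python
from pathlib import Path

# Component names come from the architecture description above.
# The file names are ASSUMPTIONS for illustration only.
ASSUMED_COMPONENT_FILES = {
    "denoising UNet": "denoising_unet.pth",
    "reference UNet": "reference_unet.pth",
    "motion module": "motion_module.pth",
    "pose encoder": "pose_encoder.pth",
    "audio processor": "audio_processor",
}


def missing_components(weights_dir, expected=ASSUMED_COMPONENT_FILES):
    """Return the names of components whose file or folder is absent."""
    root = Path(weights_dir)
    return [name for name, rel in expected.items() if not (root / rel).exists()]


if __name__ == "__main__":
    # Prints the components that could not be found under ./pretrained_weights.
    print(missing_components("pretrained_weights"))
```

Passing a different `expected` mapping lets the same check be reused if the real layout differs from the assumed one.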

Training

EchoMimicV2 models are trained on datasets in both English and Mandarin Chinese. The project reports using A100, RTX4090D, and V100 GPUs and is compatible with Python 3.8, 3.10, and 3.11. It also provides the EMTD dataset lists and processing scripts to facilitate training and experimentation.
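
A minimal sketch of a guard that warns when the running interpreter is not one of the Python versions listed above. The `is_tested_python` helper is hypothetical and not part of the project; it only compares the major and minor version numbers.

```python
import sys

# Python versions listed in the text above, as (major, minor) pairs.
LISTED_PYTHON_VERSIONS = {(3, 8), (3, 10), (3, 11)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return (version_info[0], version_info[1]) in LISTED_PYTHON_VERSIONS


if __name__ == "__main__":
    if not is_tested_python():
        major, minor = sys.version_info[0], sys.version_info[1]
        print(f"Warning: Python {major}.{minor} is not among the listed versions")
```

A version outside the set may still work; the check only flags that it is not one of those named.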

Guide: Running Locally

  1. Clone the Repository:
    git clone https://github.com/antgroup/echomimic_v2
    cd echomimic_v2
    
  2. Set Up Python Environment:
    • A conda environment is recommended:
      conda create -n echomimic python=3.10
      conda activate echomimic
      
  3. Install Dependencies:
    pip install pip -U
    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu124
    pip install torchao --index-url https://download.pytorch.org/whl/nightly/cu124
    pip install -r requirements.txt
    pip install --no-deps facenet_pytorch==2.6.0
    
  4. Download FFmpeg:
    • Download a static build and set FFMPEG_PATH to its location:
      export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
      
  5. Download Pretrained Weights:
    git lfs install
    git clone https://huggingface.co/BadToBest/EchoMimicV2 pretrained_weights
    
  6. Run Inference:
    • Start Gradio UI:
      python app.py
      
    • Run Python script:
      python infer.py --config='./configs/prompts/infer.yaml'
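
Before running inference, the steps above can be sanity-checked with a small preflight script. This is a sketch under stated assumptions: the `preflight` helper is hypothetical, it treats `FFMPEG_PATH` as a directory (as the example path in step 4 suggests), and it only checks that the paths named in steps 4 to 6 exist, not that their contents are valid.

```python
import os
from pathlib import Path


def preflight(repo_root=".", env=None):
    """Collect human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    root = Path(repo_root)
    problems = []

    # Step 4: FFMPEG_PATH should be set (assumed here to point at a directory).
    ffmpeg_dir = env.get("FFMPEG_PATH")
    if not ffmpeg_dir:
        problems.append("FFMPEG_PATH is not set")
    elif not Path(ffmpeg_dir).is_dir():
        problems.append(f"FFMPEG_PATH is not a directory: {ffmpeg_dir}")

    # Step 5: the weights are cloned into ./pretrained_weights.
    if not (root / "pretrained_weights").is_dir():
        problems.append("pretrained_weights directory not found")

    # Step 6: infer.py is pointed at this config file.
    if not (root / "configs" / "prompts" / "infer.yaml").is_file():
        problems.append("configs/prompts/infer.yaml not found")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Run from the repository root, the script prints nothing when every check passes.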
      

For best performance, a cloud GPU such as an NVIDIA A100 or RTX4090D is recommended.
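
To see which GPU a machine actually exposes, one option is to query `nvidia-smi`. The sketch below assumes the NVIDIA driver tools are installed; the `parse_gpu_list` and `list_gpus` helpers are hypothetical names and are not part of the project.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'name, memory' lines into (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return detected NVIDIA GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for name, memory in list_gpus():
        print(f"{name}: {memory}")
```

An empty result means either that no NVIDIA GPU is visible or that `nvidia-smi` is not on the `PATH`.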

License

This project is intended for academic research only. Users are responsible for their actions while utilizing the generative model and must adhere to ethical and legal standards. The contributors disclaim any responsibility for user-generated content.
