EchoMimicV2
Introduction
EchoMimicV2 is a project focused on creating striking, simplified, and semi-body human animations driven by audio inputs. Developed by the Terminal Technology Department at Ant Group, this project is a continuation of the EchoMimic series, aiming to enhance the realism and control of audio-driven animations.
Architecture
The EchoMimicV2 architecture comprises several key components: a denoising model, a reference UNet, a motion module, a pose encoder, and an audio processing model. Together, these components convert audio input into realistic human animation, building on pretrained models.
Training
EchoMimicV2 models are trained on datasets in both English and Mandarin Chinese. The training process utilizes powerful GPUs, including A100, RTX4090D, and V100, and is compatible with Python versions 3.8, 3.10, and 3.11. The project also provides EMTD dataset lists and processing scripts to facilitate training and experimentation.
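As a minimal sketch (not part of the project), the interpreter can be checked against the Python versions listed above before anything is installed. The helper name `is_supported_python` is hypothetical and used only for illustration.

```python
import sys

# Python versions the text above lists as compatible: 3.8, 3.10, and 3.11.
SUPPORTED = {(3, 8), (3, 10), (3, 11)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED


if not is_supported_python():
    # Only a warning: other versions may still work, they are just not listed.
    print("Warning: Python %d.%d is not among the listed versions"
          % tuple(sys.version_info[:2]))
```

Only the major and minor numbers are compared, so any patch release of a listed version passes.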
Guide: Running Locally
- Clone the Repository:
git clone https://github.com/antgroup/echomimic_v2
cd echomimic_v2
- Set Up Python Environment:
- Recommended to use a conda environment:
conda create -n echomimic python=3.10
conda activate echomimic
- Install Dependencies:
pip install pip -U
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu124
pip install torchao --index-url https://download.pytorch.org/whl/nightly/cu124
pip install -r requirements.txt
pip install --no-deps facenet_pytorch==2.6.0
- Download FFMPEG:
- Download a static FFmpeg build and set FFMPEG_PATH to its directory:
export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
- Download Pretrained Weights:
git lfs install
git clone https://huggingface.co/BadToBest/EchoMimicV2 pretrained_weights
- Run Inference:
- Start Gradio UI:
python app.py
- Run Python script:
python infer.py --config='./configs/prompts/infer.yaml'
For best performance, use a cloud GPU such as an NVIDIA A100 or RTX4090D.
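The setup steps above leave several things that can be verified before running inference: the `FFMPEG_PATH` variable, the `pretrained_weights` directory, the inference config, and the pinned package versions. The following is a rough sketch of such a pre-flight check, not a script shipped with the project. The function names `check_paths` and `check_pins` are hypothetical, and the sketch assumes it is run from the repository root and that `FFMPEG_PATH` points to a directory.

```python
import os
from importlib import metadata
from pathlib import Path

# Versions pinned in the install step above.
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "xformers": "0.0.28.post3",
}


def check_paths(repo_root, env=None):
    """Return a list of problems with FFMPEG_PATH, weights, and config."""
    env = os.environ if env is None else env
    root = Path(repo_root)
    problems = []

    ffmpeg_path = env.get("FFMPEG_PATH")
    if not ffmpeg_path:
        problems.append("FFMPEG_PATH is not set")
    elif not Path(ffmpeg_path).is_dir():
        problems.append("FFMPEG_PATH is not a directory: %s" % ffmpeg_path)

    if not (root / "pretrained_weights").is_dir():
        problems.append("pretrained_weights/ not found")

    if not (root / "configs" / "prompts" / "infer.yaml").is_file():
        problems.append("configs/prompts/infer.yaml not found")

    return problems


def check_pins(pins=PINNED, lookup=metadata.version):
    """Return a list of packages that are missing or differ from the pins."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append("%s is not installed (expected %s)" % (name, wanted))
            continue
        # Wheels from a CUDA-specific index may carry a local suffix such as
        # "+cu124"; strip it before comparing with the pinned version.
        if installed.split("+")[0] != wanted:
            problems.append("%s is %s (expected %s)" % (name, installed, wanted))
    return problems
```

Calling `check_paths(".") + check_pins()` from the cloned repository returns an empty list when everything checked here is in place. Each function returns messages rather than raising, so all problems can be reported in one pass.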
License
This project is intended for academic research only. Users are responsible for their actions while utilizing the generative model and must adhere to ethical and legal standards. The contributors disclaim any responsibility for user-generated content.