I2VGen-XL (ali-vilab)
Introduction
VGen is an open-source video synthesis codebase developed by the Tongyi Lab of Alibaba Group, featuring advanced video generative models. It supports multiple methods for video synthesis, including I2VGen-XL and VideoComposer, and provides a suite of tools for generating high-quality videos from text, images, and other inputs.
Architecture
VGen's architecture is designed for expandability, completeness, and strong performance. It includes powerful pre-trained models for a range of tasks and supports state-of-the-art video generation. The codebase is modular: experiments are assembled from registered components such as ENGINE, MODEL, and DATASETS.
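This kind of modularity is typically realized with a registry pattern, where each component registers itself under a name and is instantiated from the configuration. The following is a minimal sketch of that pattern only; the names used here (Registry, build, the UNetSD placeholder) are illustrative assumptions, not VGen's actual API.

    # Minimal registry sketch (illustrative; not VGen's actual API).
    class Registry:
        def __init__(self, name):
            self.name = name
            self._modules = {}

        def register_class(self, cls):
            # Register a component under its class name so configs can refer to it.
            self._modules[cls.__name__] = cls
            return cls

        def build(self, cfg):
            # cfg is a dict such as {"type": "UNetSD", "in_dim": 4, ...}
            cfg = dict(cfg)
            cls = self._modules[cfg.pop("type")]
            return cls(**cfg)

    # Hypothetical registry mirroring the MODEL component group.
    MODEL = Registry("MODEL")

    @MODEL.register_class
    class UNetSD:  # placeholder model class for illustration only
        def __init__(self, in_dim=4, dim=320):
            self.in_dim, self.dim = in_dim, dim

    model = MODEL.build({"type": "UNetSD", "in_dim": 4, "dim": 320})

With this pattern, a YAML config only needs to name the component type and its arguments, and the training engine can build every piece without hard-coded imports.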
Training
To train a text-to-video model with VGen, launch the distributed training command as documented. Configuration files such as t2v_train.yaml control the data and diffusion settings, pre-trained models can be used for initialization, and results are saved for review. After training, inference can be performed to generate videos.
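As a rough illustration of customizing such a configuration, the snippet below loads a YAML file, overrides a few fields, and writes a local copy to pass to the training script. The keys shown (vid_dataset, resolution, num_steps) are hypothetical placeholders, not the actual schema of t2v_train.yaml.

    # Load a training config and override a few fields (keys are hypothetical).
    import yaml  # pip install pyyaml

    with open("configs/t2v_train.yaml") as f:
        cfg = yaml.safe_load(f)

    # Example overrides; adjust to the real schema of the file.
    cfg["vid_dataset"] = {"data_list": ["data/vid_list.txt"]}
    cfg["resolution"] = [448, 256]
    cfg["num_steps"] = 1000

    with open("configs/t2v_train_local.yaml", "w") as f:
        yaml.safe_dump(cfg, f)

    # Then run: python train_net.py --cfg configs/t2v_train_local.yaml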
Guide: Running Locally
- Clone Repository:
  git clone https://github.com/damo-vilab/i2vgen-xl.git
  cd i2vgen-xl
- Installation:
  conda create -n vgen python=3.8
  conda activate vgen
  pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
  pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
- Dataset: use the provided demo dataset for testing.
- Training:
  python train_net.py --cfg configs/t2v_train.yaml
- Inference:
  python inference.py --cfg configs/t2v_infer.yaml
- Running I2VGen-XL:
  - Download the model and test data:
    !pip install modelscope
    from modelscope.hub.snapshot_download import snapshot_download
    model_dir = snapshot_download('damo/I2VGen-XL', cache_dir='models/', revision='v1.0.0')
  - Execute:
    python inference.py --cfg configs/i2vgen_xl_infer.yaml
Cloud GPUs: For optimal performance, utilizing cloud GPUs such as those from AWS or Google Cloud is recommended.
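Before launching training or inference on any machine, it can help to confirm that PyTorch actually sees a CUDA device. The short check below is a generic sketch, not part of the VGen codebase.

    # Quick environment check before running train_net.py or inference.py
    # (generic sketch; not part of the VGen codebase).
    import torch

    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device detected; these models require a GPU.")

    print(f"PyTorch {torch.__version__}, "
          f"{torch.cuda.device_count()} GPU(s), "
          f"primary device: {torch.cuda.get_device_name(0)}")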
License
The project is licensed under the MIT License. The model is intended for research and non-commercial use only, and was trained on datasets such as WebVid-10M and LAION-400M.