CogVideoX-Fun-V1.5-5b-InP

alibaba-pai

Introduction

CogVideoX-Fun is a pipeline built on a modified CogVideoX architecture for generating AI images and videos. It offers flexible generation conditions and lets users run inference with, and train, their own baseline and LoRA models. The pipeline supports multiple resolutions and video lengths, and it adds features such as reward backpropagation, which aligns generated videos with human preferences.

Architecture

CogVideoX-Fun is built on a Diffusion Transformer (DiT) and can generate videos at varying resolutions and frame counts. It also supports several control models, including Canny, Depth, Pose, and MLSD, which broaden the scenarios it can be applied to.

Training

Training consists of data preprocessing followed by Video DiT training. In preprocessing, users process long video clips, clean the data, and describe (caption) it so that it is ready for model training. Training scripts are provided, and dataset paths can be configured as either relative or absolute. Users can train on their own datasets to build custom video synthesis models.
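To illustrate the relative-versus-absolute path option, here is a minimal sketch of how a dataset entry's path might be resolved before loading. The function name `resolve_media_path` and the `data_root` parameter are hypothetical names chosen for this example; they are not taken from the project's training scripts, which may handle paths differently.

```python
import os


def resolve_media_path(file_path, data_root=None):
    """Return a usable path for one dataset entry.

    Hypothetical helper: absolute paths are returned unchanged, and
    relative paths are joined onto data_root when one is given.
    """
    if os.path.isabs(file_path) or data_root is None:
        return file_path
    return os.path.join(data_root, file_path)


if __name__ == "__main__":
    # Relative entry: resolved against the (assumed) dataset root.
    print(resolve_media_path("train/00000001.mp4", data_root="datasets/internal_datasets"))
    # Absolute entry: used as-is, regardless of the root.
    print(resolve_media_path(os.path.abspath("clip.mp4"), data_root="datasets/internal_datasets"))
```

With relative paths the dataset can be moved by changing only the root, while absolute paths let one metadata file reference media stored in several locations.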

Guide: Running Locally

  1. Environment Check:

    • Windows:
      • OS: Windows 10
      • Python: 3.10 & 3.11
      • PyTorch: 2.2.0
      • CUDA: 11.8 & 12.1
      • CUDNN: 8+
      • GPU: Nvidia-3060 12G & Nvidia-3090 24G
    • Linux:
      • OS: Ubuntu 20.04, CentOS
      • Python: 3.10 & 3.11
      • PyTorch: 2.2.0
      • CUDA: 11.8 & 12.1
      • CUDNN: 8+
      • GPU: Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
  2. Weight Placement:

    • Organize weights under the models directory as specified.
  3. Cloud GPUs:

    • Consider cloud GPUs for compute-intensive tasks. Alibaba Cloud DSW offers free GPU time, which is a quick way to get CogVideoX-Fun running.
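The environment check in step 1 can be partly automated. The sketch below compares the local Python, PyTorch, and CUDA versions against the values listed above and reports mismatches as warnings rather than errors, on the assumption that other versions may still work but are not listed. The function names `check_environment` and `detect_torch` are made up for this example and are not part of CogVideoX-Fun.

```python
import sys

# Versions listed in the environment check above.
LISTED_PYTHON = {(3, 10), (3, 11)}
LISTED_TORCH = "2.2.0"
LISTED_CUDA = {"11.8", "12.1"}


def check_environment(python_version, torch_version, cuda_version):
    """Return a list of mismatches against the listed environment."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) not in LISTED_PYTHON:
        problems.append(f"Python {major}.{minor} is not listed (3.10, 3.11)")
    if torch_version is None:
        problems.append("PyTorch is not installed")
        return problems
    # torch.__version__ may carry a build suffix such as "+cu121".
    if not torch_version.startswith(LISTED_TORCH):
        problems.append(f"PyTorch {torch_version} is not listed ({LISTED_TORCH})")
    if cuda_version is None:
        problems.append("PyTorch was built without CUDA support")
    elif cuda_version not in LISTED_CUDA:
        problems.append(f"CUDA {cuda_version} is not listed (11.8, 12.1)")
    return problems


def detect_torch():
    """Return (torch_version, cuda_version), or (None, None) without PyTorch."""
    try:
        import torch
    except ImportError:
        return None, None
    return torch.__version__, torch.version.cuda


if __name__ == "__main__":
    torch_version, cuda_version = detect_torch()
    problems = check_environment(sys.version_info, torch_version, cuda_version)
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("Environment matches the listed configuration.")
```

The script does not check the cuDNN version or GPU memory; those still need to be verified by hand against the list above.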

License

This project is licensed under the Apache License (Version 2.0). The CogVideoX-2B model, including its corresponding Transformers and VAE modules, is released under the Apache 2.0 license. The CogVideoX-5B model is released under the CogVideoX license.
