Introduction
Ruyi-Mini-7B is an open-source image-to-video generation model developed by CreateAI. Given an input image, it generates a video at resolutions from 360p to 720p, up to 5 seconds long. It supports multiple aspect ratios and provides control over motion and camera movement for creative video generation. It is licensed under Apache 2.0.

Architecture
Ruyi-Mini-7B has approximately 7.1 billion parameters and builds on the EasyAnimate V4 model and the HunyuanDiT transformer module. The architecture includes:

  1. Causal VAE Module: Compresses and decompresses video, reducing spatial resolution to 1/8 and temporal resolution to 1/4.
  2. Diffusion Transformer Module: Uses 3D full attention, with 2D Normalized-RoPE for the spatial dimensions and sin-cos position embedding for the temporal dimension; it is trained with DDPM (Denoising Diffusion Probabilistic Models).
  3. CLIP Model: Extracts semantic features from the input image to guide video generation through cross-attention.
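To make the compression factors concrete, the sketch below computes the latent grid the causal VAE would pass to the Diffusion Transformer and the resulting sequence length for 3D full attention. It assumes, as is common for causal video VAEs, that the first frame is encoded on its own, so a clip of 4k + 1 frames maps to k + 1 latent frames. The patch size of 2 and the helper names are hypothetical and not taken from the Ruyi code.

```python
SPATIAL_FACTOR = 8   # VAE reduces height and width to 1/8
TEMPORAL_FACTOR = 4  # and the number of frames to roughly 1/4


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a pixel-space clip.

    Assumption (not confirmed for Ruyi): the first frame is encoded alone and each
    following group of TEMPORAL_FACTOR frames becomes one latent frame.
    """
    if frames < 1 or (frames - 1) % TEMPORAL_FACTOR != 0:
        raise ValueError("frames must be of the form 4k + 1 under this assumption")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        (frames - 1) // TEMPORAL_FACTOR + 1,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


def attention_tokens(frames: int, height: int, width: int, patch_size: int = 2) -> int:
    """Sequence length for 3D full attention over every latent frame.

    Hypothetical: each latent frame is cut into patch_size x patch_size patches,
    and every patch of every frame becomes one token.
    """
    t, h, w = latent_shape(frames, height, width)
    return t * (h // patch_size) * (w // patch_size)


if __name__ == "__main__":
    # A hypothetical 121-frame clip at 1280x720.
    print(latent_shape(121, 720, 1280))
    print(attention_tokens(121, 720, 1280))
```

For that hypothetical clip the latent grid is (31, 90, 160) and the sequence is 111,600 tokens. Because full attention compares every token with every other token, its cost grows quadratically with this count, which is why higher resolutions and longer clips are much more expensive.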

Training
The training process consists of four phases:

  • Phase 1: Pre-training with ~200M video clips and ~30M images at a resolution of 256, using a batch size of 4096 for 350,000 iterations.
  • Phase 2: Fine-tuning with ~60M video clips for multi-scale resolutions (384–512), using a batch size of 1024 for 60,000 iterations.
  • Phase 3: High-quality fine-tuning with ~20M video clips and ~8M images at resolutions from 384 to 1024, using dynamic batch sizes for 10,000 iterations.
  • Phase 4: Image-to-video training with ~10M curated high-quality video clips, using dynamic batch sizes for ~10,000 iterations.
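For the two phases with fixed batch sizes, the number of training samples processed is batch size × iterations. The sketch below does that arithmetic, assuming each batch element is one clip or image; Phases 3 and 4 are left out because their batch sizes are dynamic.

```python
# Fixed batch sizes and iteration counts from the training description.
# Phases 3 and 4 use dynamic batch sizes, so they cannot be computed this way.
FIXED_BATCH_PHASES = {
    "Phase 1": {"batch_size": 4096, "iterations": 350_000},
    "Phase 2": {"batch_size": 1024, "iterations": 60_000},
}


def samples_seen(batch_size: int, iterations: int) -> int:
    """Number of training samples processed: batch size times iterations."""
    return batch_size * iterations


if __name__ == "__main__":
    for phase, cfg in FIXED_BATCH_PHASES.items():
        print(f"{phase}: {samples_seen(**cfg):,} samples")
```

This gives 1,433,600,000 samples for Phase 1 and 61,440,000 for Phase 2. Under the one-sample-per-item assumption, Phase 1 processes roughly six times the ~230M clips and images listed for it, so the data is revisited several times.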

Model Stats

  • Parameters: ~7.1 billion
  • Video Resolutions: 360p to 720p
  • Maximum Video Duration: 5 seconds
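The resolutions are given only as "p" values, while the model supports multiple aspect ratios. The sketch below converts a "p" value and an aspect ratio into frame dimensions. It assumes the "p" value is the shorter side (as in 720p meaning 1280x720 for 16:9) and rounds both sides to a multiple of 8 so the VAE's 8x spatial downsampling divides evenly; `frame_size` is a hypothetical helper, and the exact sizes used by the Ruyi scripts may differ.

```python
def round_to_multiple(x: float, multiple: int) -> int:
    """Round x to the nearest multiple, with halves rounded up."""
    return int(x / multiple + 0.5) * multiple


def frame_size(p: int, aspect_w: int, aspect_h: int, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) for a resolution such as 360p or 720p.

    Assumption: `p` is the shorter side of the frame.
    """
    ratio = max(aspect_w, aspect_h) / min(aspect_w, aspect_h)
    short_side = round_to_multiple(p, multiple)
    long_side = round_to_multiple(p * ratio, multiple)
    if aspect_w >= aspect_h:  # landscape or square
        return long_side, short_side
    return short_side, long_side  # portrait


if __name__ == "__main__":
    for p in (360, 720):
        for aspect in ((16, 9), (9, 16), (1, 1)):
            print(p, aspect, frame_size(p, *aspect))
```

Rounding half up explicitly avoids Python's built-in `round`, which rounds halves to the nearest even number.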

Guide: Running Locally

  1. Clone the repository and install dependencies:
    git clone https://github.com/IamCreateAI/Ruyi-Models
    cd Ruyi-Models
    pip install -r requirements.txt
    
  2. Run the model using Python:
    python3 predict_i2v.py
    
  3. Alternatively, use the ComfyUI wrapper available in the GitHub repository.

For best performance, especially when generating high-resolution video, consider a high-memory GPU such as an NVIDIA A100 or RTX 4090, either locally or in the cloud.
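A rough way to see why a high-memory GPU helps is to estimate the memory taken by the weights alone. The sketch below multiplies the ~7.1 billion parameters by the bytes per parameter for a given precision; which precision the Ruyi scripts actually load is an assumption here, and activations, attention buffers, and other intermediate tensors come on top of this figure.

```python
PARAMS = 7.1e9  # approximate parameter count from the model card

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

This comes to about 28.4 GB in fp32 and 14.2 GB in 16-bit precision before any activations, compared with 24 GB on an RTX 4090 and 40 or 80 GB on an A100.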

License
Ruyi-Mini-7B is released under the Apache 2.0 license, allowing for both personal and commercial use with minimal restrictions.
