Ruyi-Mini-7B
IamCreateAI
Introduction
Ruyi-Mini-7B is an open-source image-to-video generation model developed by CreateAI. It produces video frames from an input image, offering resolutions from 360p to 720p and a maximum duration of 5 seconds. The model supports various aspect ratios and provides enhanced motion and camera control for creative video generation. It is licensed under Apache 2.0.
Architecture
Ruyi-Mini-7B has approximately 7.1 billion parameters and is derived from the EasyAnimate V4 model and the HunyuanDiT transformer module. It consists of three components:
- Causal VAE Module: Compresses and decompresses video, reducing spatial resolution to 1/8 and temporal resolution to 1/4.
- Diffusion Transformer Module: Uses 3D full attention, with 2D Normalized-RoPE for the spatial dimensions and sin-cos position embedding for the temporal dimension, and is trained with DDPM.
- CLIP Model: Extracts semantic features from the input image to guide video generation through cross-attention.
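To make the VAE compression ratios concrete, here is a minimal sketch of the latent grid size implied by the 1/8 spatial and 1/4 temporal factors. The function name `latent_shape`, the use of ceiling division, and the 120-frame example clip are assumptions for illustration; the actual Causal VAE may handle the first frame or padding differently, so exact frame counts can differ.

```python
import math

def latent_shape(frames, height, width, spatial_factor=8, temporal_factor=4):
    """Return (latent_frames, latent_height, latent_width) after compression.

    Hypothetical helper: rounds up when a dimension is not evenly divisible,
    which is an assumption and not taken from the Ruyi code base.
    """
    return (
        math.ceil(frames / temporal_factor),
        math.ceil(height / spatial_factor),
        math.ceil(width / spatial_factor),
    )

# Illustrative 1280x720 clip with 120 frames (frame count chosen arbitrarily).
print(latent_shape(120, 720, 1280))  # (30, 90, 160)
```

Because the Diffusion Transformer applies full 3D attention over this grid, the latent size gives a rough sense of how cost grows with resolution and clip length.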
Training
The training process consists of four phases:
- Phase 1: Pre-training with ~200M video clips and ~30M images at 256-resolution, using a batch size of 4096 for 350,000 iterations.
- Phase 2: Fine-tuning with ~60M video clips for multi-scale resolutions (384–512), using a batch size of 1024 for 60,000 iterations.
- Phase 3: High-quality fine-tuning with ~20M video clips and ~8M images at resolutions from 384 to 1024, with dynamic batch sizes for 10,000 iterations.
- Phase 4: Image-to-video training with ~10M curated high-quality video clips, using dynamic batch sizes for ~10,000 iterations.
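The scale of the two fixed-batch phases can be estimated with simple arithmetic. The sketch below assumes that "batch size" counts samples per iteration, which the source does not state explicitly; Phases 3 and 4 use dynamic batch sizes and are left out because their totals cannot be derived from the figures given.

```python
# (phase name, batch size, iterations) as listed for the fixed-batch phases
fixed_batch_phases = [
    ("Phase 1", 4096, 350_000),
    ("Phase 2", 1024, 60_000),
]

def samples_seen(batch_size, iterations):
    """Total samples processed, assuming one batch of samples per iteration."""
    return batch_size * iterations

for name, batch_size, iterations in fixed_batch_phases:
    print(f"{name}: {samples_seen(batch_size, iterations):,} samples")
# Phase 1: 1,433,600,000 samples
# Phase 2: 61,440,000 samples
```

Under that assumption, Phase 1 corresponds to roughly six passes over its ~230M clips and images, and Phase 2 to about one pass over its ~60M clips.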
Model Stats
- Parameters: ~7.1 billion
- Video Resolutions: 360p to 720p
- Maximum Video Duration: 5 seconds
Guide: Running Locally
- Clone the repository and install dependencies:
git clone https://github.com/IamCreateAI/Ruyi-Models
cd Ruyi-Models
pip install -r requirements.txt
- Run the model using Python:
python3 predict_i2v.py
- Alternatively, use the ComfyUI wrapper available in the GitHub repository.
For optimal performance, consider using GPUs such as the NVIDIA A100 or RTX 4090, especially for high-resolution video generation.
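As a rough guide to the GPU recommendation above, the memory needed just to hold the weights can be estimated from the parameter count. This is a back-of-the-envelope sketch: it assumes the ~7.1 billion figure covers all loaded weights, and it ignores activations, latents, and framework overhead, which add to the total. The precision actually used by the repository's scripts is not stated in the source.

```python
PARAMS = 7.1e9  # approximate parameter count from the model stats

# Bytes per parameter for common floating-point formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(num_params, dtype):
    """Estimated weight storage in gigabytes (10**9 bytes) for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
# float32: ~28.4 GB
# float16: ~14.2 GB
# bfloat16: ~14.2 GB
```

At 16-bit precision the weights alone come to about 14 GB, so the remaining memory on a 24 GB card has to cover everything else needed during generation.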
License
Ruyi-Mini-7B is released under the Apache 2.0 license, which permits both personal and commercial use, subject to the license's attribution and notice requirements.