Pyramid Flow SD3 Video Generation Model

Introduction

Pyramid Flow is an efficient autoregressive video generation model based on Flow Matching, capable of producing high-quality 10-second videos at 768p resolution and 24 FPS. It also supports image-to-video generation and is trained solely on open-source datasets.
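
The clip length and frame rate above set the size of each generation job. The sketch below works through that arithmetic. It is a nominal count: the real pipeline may add or drop a frame depending on how its video autoencoder groups frames. The 1280-pixel width paired with the 768-pixel height is an illustrative assumption, not a value stated in this card.

```python
# Nominal frame budget for the clip lengths quoted above.
# Assumption: "768p" means a 768-pixel-high frame; the 1280-pixel width used
# in the example is an illustrative 5:3 landscape choice, not from this card.


def frame_count(duration_s: float, fps: int) -> int:
    """Number of output frames for a clip of the given length."""
    return round(duration_s * fps)


def pixels_per_clip(duration_s: float, fps: int, width: int, height: int) -> int:
    """Total pixels that must be decoded for one clip."""
    return frame_count(duration_s, fps) * width * height


if __name__ == "__main__":
    print(frame_count(10, 24))                 # 240
    print(pixels_per_clip(10, 24, 1280, 768))  # 235929600
```

A 10-second clip at 24 FPS therefore involves about 240 frames, which helps explain why the 768p setting is the most demanding one.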

Architecture

This checkpoint builds on the SD3-based framework. The project has since moved to a miniFLUX structure, adopted to fix problems with depicting human structure. Supported resolutions differ by task: 1024p for images, and 384p or 768p for videos.
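
The resolution tiers listed above can be encoded as a small lookup table so that a request is checked before an expensive generation run starts. This is a minimal sketch; the mode names ("image", "video") and the `check_resolution` helper are hypothetical and not part of the Pyramid Flow codebase.

```python
# Resolution tiers from the paragraph above, as a lookup table.
# Hypothetical: these names and this helper are illustrative only.

SUPPORTED_HEIGHTS = {
    "image": (1024,),
    "video": (384, 768),
}


def check_resolution(mode: str, height: int) -> None:
    """Raise ValueError if `height` is not a listed tier for `mode`."""
    if mode not in SUPPORTED_HEIGHTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SUPPORTED_HEIGHTS)}")
    allowed = SUPPORTED_HEIGHTS[mode]
    if height not in allowed:
        raise ValueError(f"{height}p is not a listed {mode} resolution; choose from {allowed}")


if __name__ == "__main__":
    check_resolution("video", 768)  # passes silently
    try:
        check_resolution("video", 1024)
    except ValueError as err:
        print(err)
```

Failing fast on an unsupported height is cheaper than discovering the problem after model weights have been loaded onto the GPU.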

Training

The training code and model checkpoints are publicly available. Training uses only open-source datasets, such as WebVid-10M and OpenVid-1M, and the newer checkpoints adopt the FLUX structure to improve performance. The training process is designed to produce detailed video sequences with substantial motion.

Guide: Running Locally

Installation:

Clone the repository and set up the environment:

git clone https://github.com/jy0205/Pyramid-Flow
cd Pyramid-Flow
conda create -n pyramid python=3.8.10
conda activate pyramid
pip install -r requirements.txt
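
After the commands above, a quick check can confirm that the active interpreter matches the conda environment and that key packages import. This is a sketch under assumptions: Python 3.8 is taken from the conda command, and "torch" is assumed to be among the packages that requirements.txt installs. Adjust the `packages` tuple to match the actual file.

```python
# Sanity check for the environment created by the commands above.
# Assumptions: Python 3.8 (from the conda command) and that "torch" is one of
# the packages requirements.txt installs.
import importlib.util
import sys


def environment_problems(expected=(3, 8), packages=("torch",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] != expected:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"expected Python {expected[0]}.{expected[1]}, found {found}")
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package {name!r} is not importable")
    return problems


if __name__ == "__main__":
    issues = environment_problems()
    print("environment OK" if not issues else "\n".join(issues))
```

Running it inside the activated `pyramid` environment should print "environment OK" if the setup succeeded.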

Model Download:

Use Hugging Face to download the desired model variant:

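from huggingface_hub import snapshot_download
# Replace 'PATH' with the directory where the checkpoint should be stored.
model_path = 'PATH'
# local_dir_use_symlinks is deprecated (and ignored) in newer huggingface_hub releases.
snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')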
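
When choosing between the SD3-based checkpoint and the newer miniFLUX variant, a small lookup keeps the repository id in one place. This is a sketch: the helper names are hypothetical, and "rain1011/pyramid-flow-miniflux" is an assumed repository id for the miniFLUX variant that should be verified on Hugging Face before use.

```python
# Map a variant name to its Hugging Face repository id, then download it.
# Hypothetical helpers; the miniflux repo id is an assumption to verify.

REPO_IDS = {
    "sd3": "rain1011/pyramid-flow-sd3",
    "miniflux": "rain1011/pyramid-flow-miniflux",
}


def repo_id_for(variant: str) -> str:
    """Return the repository id for a variant name, or raise ValueError."""
    try:
        return REPO_IDS[variant]
    except KeyError:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(REPO_IDS)}") from None


def download_variant(variant: str, local_dir: str) -> str:
    """Download the chosen checkpoint and return the local folder path."""
    # Imported lazily so the lookup above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id_for(variant), local_dir=local_dir, repo_type="model")
```

For example, `download_variant("sd3", "PATH")` performs the same download as the snippet above.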

Usage:
- Load the model and generate videos with the inference examples in the Pyramid-Flow repository. Adjust guidance_scale to refine visual quality and video_guidance_scale to control motion.
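
A "guidance scale" usually refers to classifier-free guidance, which pushes the model's prediction away from an unconditional (prompt-free) prediction. The sketch below shows that standard formula. It assumes Pyramid Flow applies guidance in this usual way and, as a simplification, that guidance_scale governs the first frame while video_guidance_scale governs later frames; `scale_for_frame` is hypothetical, and the 9.0 / 5.0 values are illustrative only. Check the repository code before relying on either assumption.

```python
# Standard classifier-free guidance on flat lists of floats.
# Assumptions: Pyramid Flow combines predictions this way, and the per-frame
# split between the two scales below is a simplification (hypothetical helper).
from typing import List, Sequence


def apply_guidance(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """uncond + scale * (cond - uncond); scale 1.0 returns cond unchanged."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def scale_for_frame(frame_index: int, guidance_scale: float, video_guidance_scale: float) -> float:
    """Hypothetical choice: first frame uses guidance_scale, later ones video_guidance_scale."""
    return guidance_scale if frame_index == 0 else video_guidance_scale


if __name__ == "__main__":
    uncond, cond = [0.0, 1.0], [1.0, 3.0]
    for i in range(3):
        s = scale_for_frame(i, guidance_scale=9.0, video_guidance_scale=5.0)
        print(i, s, apply_guidance(uncond, cond, s))
```

Larger scales follow the prompt more strongly but can over-saturate or destabilize the output, which is why both values are worth tuning.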

Suggested Cloud GPUs:
- For high-resolution (768p) output in particular, consider a GPU instance from a cloud provider such as AWS or GCP to keep generation times manageable.

License

The Pyramid Flow SD3 model is distributed under the Stability AI Community License (stabilityai-ai-community). See the LICENSE.md file for details on usage and redistribution rights.