pyramid flow miniflux
rain1011
Introduction
Pyramid Flow MiniFLUX is an autoregressive video generation model trained with flow matching, which keeps training efficient. It can generate high-quality 10-second videos at 768p resolution and 24 FPS, and it supports both text-to-video and image-to-video generation.
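As a rough illustration of what flow matching trains on, the sketch below builds a straight-line path from a noise sample to a data sample and regresses a velocity along it. This is the generic flow matching objective, not the pyramidal variant used by the model; the function names and the toy list-based "samples" are assumptions made for illustration.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Return (x_t, target_velocity) on the straight path from noise x0 to data x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return x_t, velocity


def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between a predicted and the target velocity.

    predict_velocity is a hypothetical stand-in for the network: it takes
    (x_t, t) and returns a list with the same length as x_t.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                        # time drawn uniformly from [0, 1)
    x_t, target = flow_matching_pair(x0, x1, t)
    pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    # A trivial "model" that always predicts zero velocity.
    print(flow_matching_loss(lambda x, t: [0.0] * len(x), [1.0, -1.0, 0.5], rng))
```

In practice the predictor would be the video transformer and the samples would be latent tensors; the list arithmetic here only makes the interpolation and regression target explicit.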
Architecture
The model employs a MiniFLUX architecture, which improves human structure and motion stability compared to the previous SD3-based version. It supports video generation at 384p and 768p, and image generation at 1024p. The architecture allows for efficient training and inference, and inference can use sequential CPU offloading and other VRAM-saving features.
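One way to keep the resolution-dependent settings together is a small preset table, sketched below. The 768p entry mirrors the values in this guide's usage example; the 384p variant name and its 384x640 frame size are assumptions and should be checked against the repository before use. `select_preset` is a hypothetical helper, not part of the library.

```python
# Hypothetical preset table. Only the 768p values appear in this guide;
# the 384p variant name and frame size are assumptions.
PRESETS = {
    "384p": {"model_variant": "diffusion_transformer_384p", "height": 384, "width": 640},
    "768p": {"model_variant": "diffusion_transformer_768p", "height": 768, "width": 1280},
}


def select_preset(resolution):
    """Return the settings for a supported video resolution."""
    try:
        return PRESETS[resolution]
    except KeyError:
        supported = ", ".join(sorted(PRESETS))
        raise ValueError(f"unsupported resolution {resolution!r}; choose from {supported}") from None


if __name__ == "__main__":
    print(select_preset("768p"))
```

The `height` and `width` values would be passed to the generation call, and `model_variant` to the model constructor.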
Training
The model is trained from scratch on open-source datasets using a FLUX structure. The training code and model checkpoints are available for use and further experimentation. The latest architectural changes improve human structure accuracy and motion stability.
Guide: Running Locally
- Installation:
- Set up the environment using Conda:
    git clone https://github.com/jy0205/Pyramid-Flow
    cd Pyramid-Flow
    conda create -n pyramid python==3.8.10
    conda activate pyramid
    pip install -r requirements.txt
- Download the model from Hugging Face:
    from huggingface_hub import snapshot_download

    model_path = 'PATH'  # Local directory to save the checkpoint
    snapshot_download("rain1011/pyramid-flow-miniflux", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
- Usage:
- For inference, load the model and run text-to-video generation:
    import torch
    from diffusers.utils import export_to_video
    from pyramid_dit import PyramidDiTForVideoGeneration

    torch.cuda.set_device(0)
    model_dtype, torch_dtype = 'bf16', torch.bfloat16

    # 'PATH' is the local checkpoint directory used in the download step
    model = PyramidDiTForVideoGeneration(
        'PATH',
        model_name="pyramid_flux",
        model_dtype=model_dtype,
        model_variant='diffusion_transformer_768p',
    )
    # Offload submodules to the CPU between uses to reduce VRAM usage
    model.enable_sequential_cpu_offload()

    prompt = "A movie trailer featuring the adventures of the 30 year old space man..."

    # Run generation without gradients and under bf16 autocast
    with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
        frames = model.generate(
            prompt=prompt,
            num_inference_steps=[20, 20, 20],
            height=768,
            width=1280,
            temp=16,
            guidance_scale=7.0,
            video_guidance_scale=5.0,
            output_type="pil",
            save_memory=True,
        )

    # Write the generated frames to an MP4 file at 24 FPS
    export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
- Cloud GPUs: For efficient video generation, cloud GPU instances from providers such as AWS, Google Cloud, or Azure are recommended.
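The usage example passes `temp=16`, while the introduction mentions 10-second clips at 24 FPS. The sketch below estimates clip length from `temp`, assuming that each latent step after the first decodes to 8 video frames (a temporal compression factor of 8). That factor and the helper name `estimate_clip` are assumptions, not values stated in this guide, so the result is an approximation.

```python
def estimate_clip(temp, fps=24, temporal_stride=8):
    """Estimate (frame_count, seconds) for a given `temp` value.

    Assumes the first latent step maps to one frame and every later step
    maps to `temporal_stride` frames; this mapping is an assumption.
    """
    if temp < 1:
        raise ValueError("temp must be at least 1")
    frames = (temp - 1) * temporal_stride + 1
    return frames, frames / fps


if __name__ == "__main__":
    for temp in (16, 31):
        frames, seconds = estimate_clip(temp)
        print(f"temp={temp}: {frames} frames, about {seconds:.1f} s")
```

Under this assumption, `temp=16` gives roughly a 5-second clip and `temp=31` roughly a 10-second clip at 24 FPS.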
License
The Pyramid Flow MiniFLUX model is licensed under the Apache 2.0 License, allowing for wide usage and modification within the terms of the license.