Pyramid Flow MiniFLUX Model

Introduction

Pyramid Flow MiniFLUX is an autoregressive video generation model trained with flow matching for efficiency. It generates high-quality videos of up to 10 seconds at 768p resolution and 24 FPS, and supports both text-to-video and image-to-video generation.
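
As a rough illustration of the flow matching objective mentioned above, the following sketch uses the common straight-line formulation: a sample is interpolated between noise and data, and the network regresses the constant velocity of that path. This is a generic textbook form in pure Python, not the pyramid-specific variant used by the model; the function names and the stand-in "oracle" predictor are hypothetical.

```python
import random

def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def target_velocity(noise, data):
    """Constant velocity of the straight path; this is the regression target."""
    return [d - n for n, d in zip(noise, data)]

def flow_matching_loss(predict_velocity, noise, data, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, data, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(noise, data)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

# Toy usage: a stand-in predictor that already returns the true velocity.
rng = random.Random(0)
data = [1.0, -2.0, 0.5]
noise = [rng.gauss(0.0, 1.0) for _ in data]
oracle = lambda x_t, t: target_velocity(noise, data)
loss = flow_matching_loss(oracle, noise, data, t=0.3)
```

In a real training loop the predictor would be the neural network, and the loss would be averaged over random times and noise samples.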

Architecture

The model employs a MiniFLUX architecture, which improves human structure and motion stability compared with the earlier SD3-based version. It supports video generation at 384p and 768p, and image generation at 1024p. Inference can use sequential CPU offloading and other memory-saving features to reduce VRAM usage.
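
One way to keep the resolution-dependent settings together is a small lookup table, sketched below. The 768p entry mirrors the variant name and frame size used in the usage example later in this guide; the 384p entry (variant name and 640x384 frame size) is an assumption and should be checked against the repository. The helper name settings_for is hypothetical.

```python
# Generation settings per video resolution.
# The 768p values come from this guide; the 384p values are assumed.
VIDEO_VARIANTS = {
    "768p": {"model_variant": "diffusion_transformer_768p", "width": 1280, "height": 768},
    "384p": {"model_variant": "diffusion_transformer_384p", "width": 640, "height": 384},  # assumed
}

def settings_for(resolution):
    """Return the settings dict for a resolution, or raise ValueError."""
    try:
        return VIDEO_VARIANTS[resolution]
    except KeyError:
        raise ValueError(f"unsupported video resolution: {resolution!r}") from None
```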

Training

The model is trained from scratch on open-source datasets using a FLUX structure. The training code and model checkpoints are available for use and further experimentation. Relative to the earlier release, the updated architecture improves human structure accuracy and motion stability.

Guide: Running Locally

Installation:

Set up the environment using Conda:

git clone https://github.com/jy0205/Pyramid-Flow
cd Pyramid-Flow
conda create -n pyramid python==3.8.10
conda activate pyramid
pip install -r requirements.txt

Download the model from Hugging Face:

from huggingface_hub import snapshot_download

model_path = 'PATH'  # Local directory to save the checkpoint
snapshot_download("rain1011/pyramid-flow-miniflux", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')

Usage:

For inference, load the model and run text-to-video generation:

import torch
from diffusers.utils import export_to_video
from pyramid_dit import PyramidDiTForVideoGeneration

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16

# 'PATH' is the downloaded checkpoint directory; model_dtype must precede the keyword arguments
model = PyramidDiTForVideoGeneration(
    'PATH', model_dtype, model_name="pyramid_flux",
    model_variant='diffusion_transformer_768p',
)
model.enable_sequential_cpu_offload()  # lowers VRAM usage at the cost of speed

prompt = "A movie trailer featuring the adventures of the 30 year old space man..."
with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt, num_inference_steps=[20, 20, 20],
        height=768, width=1280, temp=16,
        guidance_scale=7.0, video_guidance_scale=5.0,
        output_type="pil", save_memory=True,
    )
export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
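
The temp argument controls clip length. The sketch below estimates the frame count and duration for a given temp at 24 FPS. It assumes a video VAE with 8x temporal compression that encodes the first frame on its own, so frames = (temp - 1) * 8 + 1; this formula, and temp=31 as the setting for roughly 10 seconds, are assumptions not stated in this guide. The helper names are hypothetical.

```python
def num_frames(temp, temporal_compression=8):
    """Frames decoded from `temp` latent steps (assumed formula)."""
    return (temp - 1) * temporal_compression + 1

def duration_seconds(temp, fps=24, temporal_compression=8):
    """Approximate clip length in seconds at the given frame rate."""
    return num_frames(temp, temporal_compression) / fps

for temp in (16, 31):
    print(temp, num_frames(temp), round(duration_seconds(temp), 2))
```

Under these assumptions, temp=16 as used above gives a clip of about 5 seconds, and temp=31 gives about 10 seconds.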

Cloud GPUs: For efficient video generation, a GPU instance from a cloud provider such as AWS, Google Cloud, or Azure is recommended.

License

The Pyramid Flow MiniFLUX model is licensed under the Apache 2.0 License, allowing for wide usage and modification within the terms of the license.