Allegro
Introduction
Allegro, by Rhymes AI, is a text-to-video generation model designed to produce high-quality video content. It is open source and released under the Apache 2.0 license.
Architecture
Allegro incorporates a 175M parameter VideoVAE and a 2.8B parameter VideoDiT model. It supports multiple precisions (FP32, BF16, FP16) and operates efficiently with 9.3 GB of GPU memory in BF16 mode using CPU offloading. The model can generate 6-second videos at 15 FPS with a resolution of 720x1280, which can be interpolated to 30 FPS using EMA-VFI.
Training
Allegro is trained to handle a variety of content, including dynamic scenes and close-ups of humans and animals. The model uses a large context length of 79.2K tokens, equivalent to 88 video frames, which allows it to produce detailed and versatile video outputs.
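As a rough sanity check on that figure, the 79.2K token count is consistent with an assumed 8x8 spatial and 4x temporal VAE compression followed by a 2x2 DiT patch size (these factors are illustrative assumptions, not stated above):

    # Back-of-the-envelope token count for 88 frames at 720x1280.
    # The compression and patch factors below are assumptions for illustration.
    frames, height, width = 88, 720, 1280
    vae_spatial, vae_temporal, patch = 8, 4, 2

    latent_frames = frames // vae_temporal                     # 22
    tokens_per_frame = (height // (vae_spatial * patch)) * (width // (vae_spatial * patch))  # 45 * 80 = 3600
    total_tokens = latent_frames * tokens_per_frame            # 79,200, i.e. ~79.2K
    print(total_tokens)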
Guide: Running Locally
- Install Requirements:
  Ensure Python >= 3.10, PyTorch >= 2.4, and CUDA >= 12.4 are installed. Use Anaconda to create a new environment:

      conda create -n allegro python=3.10 -y

  Install the necessary packages:

      pip install git+https://github.com/huggingface/diffusers.git torch==2.4.1 transformers==4.40.1 accelerate sentencepiece imageio imageio-ffmpeg beautifulsoup4
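  To confirm that the PyTorch and CUDA setup meets these requirements, a minimal sanity check using only standard PyTorch introspection (no Allegro-specific code) is:

      import torch

      # Report the installed PyTorch version, CUDA toolkit version, and visible GPU.
      print("torch:", torch.__version__)          # expect >= 2.4
      print("cuda available:", torch.cuda.is_available())
      print("cuda version:", torch.version.cuda)  # expect >= 12.4
      if torch.cuda.is_available():
          print("device:", torch.cuda.get_device_name(0))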
- Run Inference:
  Import the necessary modules and load the model:

      import torch
      from diffusers import AutoencoderKLAllegro, AllegroPipeline
      from diffusers.utils import export_to_video

      vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
      pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16)
      pipe.to("cuda")
      pipe.vae.enable_tiling()

      prompt = "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water."
      video = pipe(prompt, guidance_scale=7.5, max_sequence_length=512, num_inference_steps=100).frames[0]
      export_to_video(video, "output.mp4", fps=15)
  For reduced GPU memory usage, call pipe.enable_sequential_cpu_offload(), though this increases inference time; a minimal sketch of this variant follows.
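  The sketch below assumes the same imports and vae object as in the example above; with sequential offload, the explicit pipe.to("cuda") call is typically omitted:

      # Lower-memory variant of the pipeline setup above.
      # enable_sequential_cpu_offload() keeps weights on the CPU and streams
      # them to the GPU on demand, trading speed for memory.
      pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16)
      pipe.enable_sequential_cpu_offload()  # ~9.3 GB of GPU memory in BF16, slower inference
      pipe.vae.enable_tiling()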
- Interpolate Video:
  Use EMA-VFI to interpolate the generated 15 FPS videos to 30 FPS for smoother playback.
- Faster Inference:
  Explore options such as Context Parallel and PAB, documented in the Allegro GitHub repository, for faster processing.
Cloud GPUs: If local hardware is insufficient, consider cloud GPU services such as AWS, Google Cloud, or Azure.
License
This project is licensed under the Apache 2.0 License.