Allegro-TI2V
Introduction
Allegro-TI2V is an open-source model developed by RhymesAI that generates videos from a text prompt combined with one or more conditioning frames. It is designed to produce high-quality video content and is released for community use under the Apache 2.0 license.
Architecture
Allegro-TI2V consists of two main components:
- VideoVAE: A 175M-parameter variational autoencoder that compresses video into a compact latent space and decodes generated latents back into frames.
- VideoDiT: A 2.8B-parameter diffusion transformer that supports multiple precisions (FP32, BF16, FP16) and can reduce GPU memory usage through CPU offloading.
The model generates 6-second videos at 15 FPS and 720x1280 resolution, with the option to interpolate the output to 30 FPS.
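As a concrete sketch of the precision and offloading options: recent diffusers releases ship an AllegroPipeline for the base text-to-video Allegro model. Whether the TI2V checkpoint plugs into the same API is not confirmed here, so treat this as illustrative; the repository scripts in the guide below are the documented path.

```python
# Sketch: loading Allegro in BF16 with CPU offloading via diffusers.
# Assumes the diffusers AllegroPipeline (base text-to-video model);
# the TI2V checkpoint may require the repo's own scripts instead.
import torch
from diffusers import AllegroPipeline
from diffusers.utils import export_to_video

pipe = AllegroPipeline.from_pretrained(
    "rhymes-ai/Allegro", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep idle submodules on CPU to save VRAM
pipe.vae.enable_tiling()         # decode the video latents in tiles

video = pipe(
    prompt="The car drives along the road.",
    guidance_scale=7.5,
    num_inference_steps=100,
).frames[0]
export_to_video(video, "output.mp4", fps=15)
```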
Capabilities
The model handles versatile conditioning modes, such as generating a video from a single first frame or interpolating between a specified first and last frame, while emphasizing high-quality output, a comparatively small model size, and efficient memory use.
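For illustration, interpolating between two given frames might look like the invocation below. The --first_frame flag matches the repository's inference script (full command in the guide); the --last_frame flag name is a hypothetical stand-in, so check `python single_inference_ti2v.py --help` for the actual argument names.

```bash
# Hypothetical sketch of first+last frame conditioning;
# --last_frame is an assumed flag name -- verify with the script's --help.
# Model-path flags (--vae, --dit, ...) omitted here; see the guide below.
python single_inference_ti2v.py \
    --user_prompt 'The car drives along the road.' \
    --first_frame your/path/to/first_frame_image.png \
    --last_frame your/path/to/last_frame_image.png \
    --guidance_scale 8 \
    --num_sampling_steps 100
```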
Guide: Running Locally
- Download Code: Clone the Allegro GitHub repository.

  ```bash
  git clone https://github.com/rhymes-ai/Allegro
  ```
- Install Requirements: Use Anaconda to set up a new environment.

  ```bash
  conda create -n allegro python=3.10
  conda activate allegro
  pip install -r requirements.txt
  ```
- Download Model Weights: Obtain the weights from https://huggingface.co/rhymes-ai/Allegro-TI2V (see the download sketch just after these steps).
- Run Inference:

  ```bash
  python single_inference_ti2v.py \
      --user_prompt 'The car drives along the road.' \
      --first_frame your/path/to/first_frame_image.png \
      --vae your/path/to/vae \
      --dit your/path/to/transformer \
      --text_encoder your/path/to/text_encoder \
      --tokenizer your/path/to/tokenizer \
      --guidance_scale 8 \
      --num_sampling_steps 100 \
      --seed 1427329220
  ```
  Adjust parameters for your use case: higher --guidance_scale values generally follow the prompt more strictly, and more --num_sampling_steps tends to improve quality at the cost of runtime.
- Interpolate Video (Optional): Use EMA-VFI to interpolate the 15 FPS output to 30 FPS; refer to https://github.com/MCG-NJU/EMA-VFI for instructions.
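For step 3, the weights can also be fetched programmatically. Here is a minimal sketch using huggingface_hub; the local directory name is arbitrary, and the --vae/--dit/--text_encoder/--tokenizer flags should then point at the matching subfolders of the snapshot:

```python
# Sketch: download the Allegro-TI2V weights with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="rhymes-ai/Allegro-TI2V",
    local_dir="./Allegro-TI2V",  # arbitrary target directory
)
```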
Cloud GPUs such as NVIDIA H100 are recommended for efficient processing.
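On step 5: if setting up EMA-VFI is not an option, ffmpeg's built-in motion-interpolation filter is a quick, lower-quality substitute (a generic ffmpeg feature, not part of the Allegro toolchain):

```bash
# Quick alternative to EMA-VFI: ffmpeg motion interpolation to 30 FPS.
# Replace allegro_output.mp4 with the path to the generated video.
ffmpeg -i allegro_output.mp4 -vf "minterpolate=fps=30" allegro_30fps.mp4
```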
License
This project is available under the Apache 2.0 License.