CogVideoX1.5-5B-Prompt-Camera-Motion

NimVideo

Introduction

CogVideoX1.5-5B-Prompt-Camera-Motion is a LoRA (Low-Rank Adaptation) model that extends CogVideoX1.5 with control over camera movement in generated videos. It supports six directions: left, right, up, down, zoom_in, and zoom_out, enabling smooth camera motion in video creation.

Architecture

The model is based on CogVideoX1.5 and uses LoRA to adapt it for camera-motion control in video generation tasks. It integrates with the Diffusers library through its image-to-video pipeline (CogVideoXImageToVideoPipeline).

Training

The LoRA was trained specifically to control camera movement in the six supported directions. Training prompts were formatted to indicate the desired motion, such as "Camera moves to the {}..." or "{} camera turn...".
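The exact training templates are only partially documented, so the sketch below merely illustrates composing prompts in the "Camera moves to the {}" style used above; the helper name and the per-direction phrasings are illustrative assumptions, not the model's verified training strings.

```python
# Illustrative helper for composing camera-motion prompts.
# The phrasings below are assumptions modeled on the documented
# templates, not verified training strings.
DIRECTIONS = ["left", "right", "up", "down", "zoom_in", "zoom_out"]

def camera_prompt(direction: str, scene: str) -> str:
    """Prefix a scene description with a camera-motion instruction."""
    if direction not in DIRECTIONS:
        raise ValueError(f"Unsupported direction: {direction}")
    if direction in ("left", "right"):
        motion = f"Camera is moving to the {direction}."
    elif direction in ("up", "down"):
        motion = f"Camera is moving {direction}."
    else:  # zoom_in / zoom_out
        motion = "Camera " + direction.replace("_", "s ") + "."
    return f"{motion} {scene}"

print(camera_prompt("left", "A red sports car driving on a winding road."))
# prints "Camera is moving to the left. A red sports car driving on a winding road."
```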

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch and the Diffusers library installed.
  2. Load the Model: Use the CogVideoXImageToVideoPipeline from the Diffusers library.
  3. Load Weights: Load the LoRA weights for camera motion.
  4. Configure Settings: Set up the pipeline with appropriate adapters and enable settings for sequential CPU offload, slicing, and tiling.
  5. Prepare Inputs: Load and resize the input image.
  6. Generate Video: Use the pipeline to generate video frames based on the given prompt and settings.
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V", torch_dtype=torch.bfloat16
)

# Load the camera-motion LoRA and activate it at full strength
pipe.load_lora_weights("NimVideo/cogvideox1.5-5b-prompt-camera-motion", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

# Reduce VRAM usage: offload modules to CPU and decode the VAE in slices/tiles
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

height = 768
width = 1360
image = load_image("resources/car.jpg").resize((width, height))
prompt = "Camera is moving to the left. A red sports car driving on a winding road."

video_generate = pipe(
    image=image,
    prompt=prompt,
    height=height, 
    width=width, 
    num_inference_steps=50,  
    num_frames=81,  
    guidance_scale=6.0,
    generator=torch.Generator().manual_seed(42), 
).frames[0]

output_path = "output.mp4"  # destination file for the rendered clip
export_to_video(video_generate, output_path, fps=8)
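As a quick sanity check on the settings above, the exported clip length follows directly from num_frames and the export fps:

```python
# Clip duration implied by the generation settings above
num_frames = 81
fps = 8
duration_s = num_frames / fps
print(f"{num_frames} frames at {fps} fps -> {duration_s:.3f} s")
# prints "81 frames at 8 fps -> 10.125 s"
```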

For optimal performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.

License

The model is licensed under the Apache 2.0 License, allowing for both personal and commercial use, modification, and distribution.
