ComfyUI: Transform Images/Text into Lengthy Videos with Pyramid Flow

Generating longer videos that stay consistent from frame to frame is a challenging task, but Pyramid Flow now makes it possible. It is an open-source text-to-video model based on Stable Diffusion 3 Medium, CogVideoX, Flux 1.0, WebVid-10M, OpenVid-1M, Diffusion Forcing, GameNGen, Open-Sora Plan, and VideoLLaMA2.

The entire framework is optimized end to end with a single unified Diffusion Transformer (DiT). Extensive experiments show that the method supports generating high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS, within 20.7k A100 GPU training hours. See the research paper for an in-depth explanation.

Pyramid Flow is now supported in ComfyUI. Let's dive into the installation section.

Installation:

1. Install ComfyUI on your machine.

2. Update it if it is already installed: select “Update all” from ComfyUI Manager.

3. Move to the “ComfyUI/custom_nodes” folder. Click the folder address bar, type “cmd”, and press Enter to open a command prompt in that folder. Then clone the repository with the following command:

git clone https://github.com/kijai/ComfyUI-PyramidFlowWrapper.git

4. All the required models are downloaded automatically from Pyramid Flow’s Hugging Face repository. These are raw, unoptimized variants, so you will need to wait for GPU-optimized versions.
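To confirm that the clone from step 3 landed in the right place, a short Python check like the one below can help. This is only a sketch: `check_wrapper_install` is a hypothetical helper, not part of ComfyUI or the wrapper, and it assumes the example workflows are shipped as .json files in an “examples” subfolder.

```python
from pathlib import Path

WRAPPER_DIR = "ComfyUI-PyramidFlowWrapper"  # folder created by the git clone in step 3


def check_wrapper_install(comfyui_root):
    """Return (installed, workflow_names) for a ComfyUI folder.

    Hypothetical helper for illustration only.
    """
    wrapper = Path(comfyui_root) / "custom_nodes" / WRAPPER_DIR
    if not wrapper.is_dir():
        return False, []
    examples = wrapper / "examples"
    # Assumption: example workflows are stored as .json files.
    if examples.is_dir():
        workflows = sorted(p.name for p in examples.glob("*.json"))
    else:
        workflows = []
    return True, workflows


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install folder.
    installed, workflows = check_wrapper_install("ComfyUI")
    print("wrapper installed:", installed)
    for name in workflows:
        print("example workflow:", name)
```

Run it from the folder that contains your ComfyUI directory, or pass the full path to your install instead of the placeholder.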

Workflow:

1. The workflows can be found inside your “ComfyUI/custom_nodes/ComfyUI-PyramidFlowWrapper/examples” folder.

There are two workflows you can choose from:

(a) Text to Video generation

(b) Image to Video generation

2. There are two checkpoints, which differ in resolution, maximum video length, and VRAM requirements:

(a) 384p checkpoint: supports up to 5 seconds of video at 24 FPS and runs in under 10 GB of VRAM

(b) 768p checkpoint: supports up to 10 seconds of video at 24 FPS and needs 10-12 GB of VRAM
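The choice between the two checkpoints can be expressed as a small rule of thumb. The sketch below is illustrative only: `Checkpoint` and `pick_checkpoint` are hypothetical names, and the 10 GB cut-off is an assumption read off the figures above rather than a limit enforced by the wrapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str
    max_seconds: int
    fps: int


# Figures copied from the checkpoint list above.
CKPT_384P = Checkpoint("384p", max_seconds=5, fps=24)
CKPT_768P = Checkpoint("768p", max_seconds=10, fps=24)


def pick_checkpoint(vram_gb: float) -> Checkpoint:
    """Hypothetical helper: suggest a checkpoint for the available VRAM.

    Assumption: 10 GB is the cut-off, because the 384p checkpoint is
    described as running under 10 GB and the 768p one as needing 10-12 GB.
    """
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    return CKPT_768P if vram_gb >= 10 else CKPT_384P


if __name__ == "__main__":
    for vram in (8, 10, 12):
        ckpt = pick_checkpoint(vram)
        print(f"{vram} GB VRAM: {ckpt.name} checkpoint, "
              f"up to {ckpt.max_seconds} s at {ckpt.fps} FPS")
```

Treat the result as a starting point; actual memory use depends on your settings and on the rest of your workflow.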

Recommended settings:

(a) Text to Video generation 

num_inference_steps=[20, 20, 20]

video_num_inference_steps=[10, 10, 10]

height=768, width=1280

guidance_scale=9.0

video_guidance_scale=5.0

temp=16

(b) Image to Video generation

num_inference_steps=[10, 10, 10]

temp=16

video_guidance_scale=4.0
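If you script your experiments, it can be handy to keep these recommended values in one place. The sketch below stores them as plain Python dictionaries. The values are copied from the lists above, while `RECOMMENDED` and `recommended_settings` are hypothetical names for illustration and not part of Pyramid Flow or the ComfyUI wrapper.

```python
import copy

# Values copied from the recommended settings above.
RECOMMENDED = {
    "text_to_video": {
        "num_inference_steps": [20, 20, 20],
        "video_num_inference_steps": [10, 10, 10],
        "height": 768,
        "width": 1280,
        "guidance_scale": 9.0,
        "video_guidance_scale": 5.0,
        "temp": 16,
    },
    "image_to_video": {
        "num_inference_steps": [10, 10, 10],
        "temp": 16,
        "video_guidance_scale": 4.0,
    },
}


def recommended_settings(mode, **overrides):
    """Hypothetical helper: copy the preset for `mode` and apply overrides."""
    if mode not in RECOMMENDED:
        raise KeyError(f"unknown mode {mode!r}; expected one of {sorted(RECOMMENDED)}")
    settings = copy.deepcopy(RECOMMENDED[mode])
    unknown = set(overrides) - set(settings)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    settings.update(overrides)
    return settings


if __name__ == "__main__":
    print(recommended_settings("text_to_video"))
    print(recommended_settings("image_to_video", video_guidance_scale=4.5))
```

Returning a deep copy means you can tweak one run's settings without altering the stored presets.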

Published by
admage
