Categories: Tutorials

ComfyUI: Transform Images/Text into Lengthy Videos with Pyramid Flow

Generating longer videos with consistent quality is one of the more challenging tasks in AI video generation. Pyramid Flow now makes it feasible: an open-source text-to-video model built on Stable Diffusion 3 Medium and drawing on CogVideoX, Flux 1.0, WebVid-10M, OpenVid-1M, Diffusion Forcing, GameNGen, Open-Sora Plan, and VideoLLAMA2.

The entire framework is optimized end-to-end with a single unified Diffusion Transformer (DiT). Extensive experiments show that the method can generate high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS within 20.7k A100 GPU training hours. Interested readers can consult the research paper for an in-depth understanding.

It is now supported in ComfyUI. Let's dive into the installation section.


Installation:

1. Install ComfyUI on your machine.

2. Update it if already installed by selecting “Update all” from ComfyUI Manager.

3. Move to the “ComfyUI/custom_nodes” folder. Click the folder’s address bar, type “cmd”, and press Enter to open a command prompt. Then clone the repository with the following command:

git clone https://github.com/kijai/ComfyUI-PyramidFlowWrapper.git

4. All required models are downloaded automatically from Pyramid Flow’s Hugging Face repository. Note that these are the raw, unoptimized variants; you will need to wait for GPU-optimized versions.

Workflow:

1. The workflow can be found inside your “ComfyUI/custom_nodes/ComfyUI-PyramidFlowWrapper/examples” folder.

There are two workflows you can choose from:

(a) Text to Video generation

(b) Image to Video generation

2. There are two model checkpoints for different video generation lengths:

(a) 384p checkpoint – supports up to 5 seconds of 24 FPS video generation and runs under 10 GB VRAM.

(b) 768p checkpoint – supports up to 10 seconds of 24 FPS video generation and needs 10–12 GB VRAM.
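The VRAM guidance above can be expressed as a small selection helper. This is a hypothetical sketch (the function name and threshold are not from the wrapper's API, just the two bullet points above):

```python
def pick_checkpoint(vram_gb: float) -> str:
    """Choose a Pyramid Flow checkpoint based on available VRAM.
    Thresholds follow the guidance above (hypothetical helper)."""
    if vram_gb >= 10:
        return "768p"  # up to 10 s at 24 FPS, ~10-12 GB VRAM
    return "384p"      # up to 5 s at 24 FPS, runs under 10 GB VRAM

print(pick_checkpoint(8))   # 384p
print(pick_checkpoint(12))  # 768p
```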

Recommended settings:

(a) Text to Video generation 

num_inference_steps=[20, 20, 20]

video_num_inference_steps=[10, 10, 10]

height=768, width=1280

guidance_scale=9.0

video_guidance_scale=5.0

temp=16

(b) Image to Video generation

num_inference_steps=[10, 10, 10]

temp=16

video_guidance_scale=4.0
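As a rough sanity check on these settings: `temp` sets the number of temporal latent frames, and assuming the 8x temporal VAE compression described in the Pyramid Flow paper, temp=16 decodes to 121 output frames, i.e. about 5 seconds at 24 FPS. A minimal sketch (the helper names are illustrative, not part of the wrapper's API):

```python
def output_frames(temp: int, temporal_compression: int = 8) -> int:
    """Approximate decoded frame count: the first latent frame maps to one
    image frame, each additional latent frame to `temporal_compression`
    frames (assumption based on the paper's 8x temporal VAE compression)."""
    return (temp - 1) * temporal_compression + 1

def duration_seconds(temp: int, fps: int = 24) -> float:
    """Approximate clip length in seconds at the given frame rate."""
    return output_frames(temp) / fps

# temp=16 -> 121 frames -> ~5 s at 24 FPS
print(output_frames(16), round(duration_seconds(16), 1))
```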

Published by admage
