Generating longer videos with consistent results is a challenging task, and Pyramid Flow is an attempt to make it practical. It is an open-source text-to-video model that builds on Stable Diffusion 3 Medium, CogVideoX, Flux 1.0, WebVid-10M, OpenVid-1M, Diffusion Forcing, GameNGen, Open-Sora Plan, and VideoLLaMA2.
The entire framework is optimized end to end with a single unified Diffusion Transformer (DiT). According to the authors' experiments, the method can generate high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS, with training taking 20.7k A100 GPU hours. The research paper covers the method in depth.
Pyramid Flow is now supported in ComfyUI. Let's dive into the installation.
1. Install ComfyUI on your machine.
2. Update it if it is already installed by selecting “Update all” in ComfyUI Manager.
3. Go to the “ComfyUI/custom_nodes” folder, click the folder address bar, type “cmd” and press Enter to open a command prompt in that folder. Then clone the repository with the following command:
git clone https://github.com/kijai/ComfyUI-PyramidFlowWrapper.git
4. All required models are downloaded automatically from Pyramid Flow's Hugging Face repository. These are the raw, unoptimized variants, so you will have to wait for GPU-optimized versions.
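If you prefer to script step 3, the following is a minimal Python sketch of the same clone. The helper name `clone_wrapper` and the `dry_run` flag are illustrative assumptions, not part of ComfyUI or the wrapper. It only assumes that `git` is on your PATH and that you pass the path to your own ComfyUI folder.

```python
from pathlib import Path
import subprocess

# Repository URL taken from the installation step above.
REPO_URL = "https://github.com/kijai/ComfyUI-PyramidFlowWrapper.git"


def clone_wrapper(comfy_root, dry_run=True):
    """Build the git clone command for the wrapper (hypothetical helper).

    Returns None if the target folder already exists, otherwise the
    command as a list. The command is only executed when dry_run is False.
    """
    target = Path(comfy_root) / "custom_nodes" / "ComfyUI-PyramidFlowWrapper"
    if target.exists():
        return None  # already cloned, nothing to do
    cmd = ["git", "clone", REPO_URL, str(target)]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if git fails
    return cmd
```

Calling `clone_wrapper("ComfyUI")` only returns the command so you can inspect it. Pass `dry_run=False` to actually run the clone.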
1. The workflow can be found inside your “ComfyUI/custom_nodes/ComfyUI-PyramidFlowWrapper/examples” folder.
There are two workflows to choose from:
(a) Text to Video generation
(b) Image to Video generation
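To confirm that the example workflows are where you expect them, a short Python sketch like the one below can list them. It assumes the workflows are stored as `.json` files, which is the usual ComfyUI format but is not stated in this guide. The function name `list_workflows` is hypothetical.

```python
from pathlib import Path


def list_workflows(comfy_root):
    """Return the sorted names of workflow files in the wrapper's examples folder.

    Assumption: workflows are saved as .json files.
    Returns an empty list if the folder does not exist.
    """
    examples = (
        Path(comfy_root)
        / "custom_nodes"
        / "ComfyUI-PyramidFlowWrapper"
        / "examples"
    )
    if not examples.is_dir():
        return []
    return sorted(p.name for p in examples.glob("*.json"))
```

An empty list usually means the wrapper has not been cloned yet or the ComfyUI path is wrong.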
2. There are two models for different video generation length:
(a) 384p checkpoint – supports up to 5 seconds of video at 24 FPS and runs in under 10 GB of VRAM
(b) 768p checkpoint – supports up to 10 seconds of video at 24 FPS and needs 10-12 GB of VRAM
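The choice between the two checkpoints can be sketched as a simple rule on available VRAM. This is only an illustration of the figures listed above. The 10 GB cutoff is an assumption read off those figures, and `pick_checkpoint` is a hypothetical helper, not something the wrapper provides.

```python
def pick_checkpoint(vram_gb):
    """Suggest a checkpoint from available VRAM in GB (illustrative only).

    Assumption: 10 GB is used as the cutoff, based on the figures above
    (384p runs under 10 GB, 768p needs 10-12 GB).
    """
    if vram_gb >= 10:
        return {"name": "768p", "max_seconds": 10, "fps": 24}
    return {"name": "384p", "max_seconds": 5, "fps": 24}
```

For example, `pick_checkpoint(8)` suggests the 384p checkpoint, while `pick_checkpoint(12)` suggests the 768p one. Actual memory use depends on your settings, so treat the result as a starting point.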
(a) Text to Video generation
num_inference_steps=[20, 20, 20]
video_num_inference_steps=[10, 10, 10]
height=768, width=1280
guidance_scale=9.0
video_guidance_scale=5.0
temp=16
(b) Image to Video generation
num_inference_steps=[10, 10, 10]
temp=16
video_guidance_scale=4.0
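For reference, the settings above can be kept as plain Python dictionaries, which makes it easy to compare the two modes. The sketch below also estimates clip length from `temp`. The frame formula (8 frames per temporal step, plus one) is an assumption about how Pyramid Flow maps `temp` to frames and is not stated in this guide, so check it against the project's documentation.

```python
FPS = 24  # frame rate mentioned above

# Values copied from the settings listed above.
TEXT_TO_VIDEO = {
    "num_inference_steps": [20, 20, 20],
    "video_num_inference_steps": [10, 10, 10],
    "height": 768,
    "width": 1280,
    "guidance_scale": 9.0,
    "video_guidance_scale": 5.0,
    "temp": 16,
}

IMAGE_TO_VIDEO = {
    "num_inference_steps": [10, 10, 10],
    "temp": 16,
    "video_guidance_scale": 4.0,
}


def estimated_frames(temp, temporal_factor=8):
    """Estimate frame count from temp.

    Assumption: each temporal step after the first adds
    `temporal_factor` frames.
    """
    return (temp - 1) * temporal_factor + 1


def estimated_seconds(temp, fps=FPS):
    """Estimate clip duration in seconds at the given frame rate."""
    return estimated_frames(temp) / fps
```

Under that assumption, `temp=16` corresponds to 121 frames, or about 5 seconds at 24 FPS, which matches the 5-second clips described earlier.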