Categories: Tutorials

ComfyUI: Transform Images/Text into Lengthy Videos with Pyramid Flow

Generating longer videos with consistent quality is one of the most challenging tasks in video synthesis. Pyramid Flow now makes it possible: an open-source text-to-video model whose design and training draw on Stable Diffusion 3 Medium, CogVideoX, Flux 1.0, WebVid-10M, OpenVid-1M, Diffusion Forcing, GameNGen, Open-Sora Plan, and VideoLLaMA2.

The entire framework is optimized end-to-end with a single unified Diffusion Transformer (DiT). Extensive experiments show that it can generate high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS within 20.7k A100 GPU training hours. Interested readers can consult the research paper for an in-depth understanding.

Pyramid Flow is now supported in ComfyUI. Let's dive into the installation section.


Installation:

1. Install ComfyUI on your machine.

2. Update it if already installed by selecting “Update all” from ComfyUI Manager.

3. Move to the “ComfyUI/custom_nodes” folder. Click the folder address bar, type “cmd”, and press Enter to open a command prompt. Then clone the repository with the following command:

git clone https://github.com/kijai/ComfyUI-PyramidFlowWrapper.git

4. All required models are downloaded automatically from Pyramid Flow’s Hugging Face repository. Note that these are the raw, unoptimized variants, so you may need to wait further for GPU-optimized versions.

Workflow:

1. The workflow can be found inside your “ComfyUI/custom_nodes/ComfyUI-PyramidFlowWrapper/examples” folder.

There are two workflows you can choose from:

(a) Text to Video generation

(b) Image to Video generation

2. There are two models for different video generation length:

(a) 384p checkpoint – supports up to 5 seconds of 24 FPS video generation and runs under 10 GB VRAM

(b) 768p checkpoint – supports up to 10 seconds of 24 FPS video generation and needs roughly 10–12 GB VRAM.
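As an illustration, the VRAM guidance above can be turned into a small helper for picking a checkpoint automatically. The function name and the 10 GB threshold are my own, taken directly from the numbers listed here; this is a sketch, not part of the wrapper’s API:

```python
def pick_checkpoint(vram_gb: float) -> str:
    """Pick a Pyramid Flow checkpoint from available VRAM (in GB).

    Thresholds follow the guidance above: the 384p checkpoint runs
    under 10 GB, while the 768p checkpoint needs roughly 10-12 GB.
    """
    if vram_gb >= 10:
        return "768p"  # up to 10 s at 24 FPS
    return "384p"      # up to 5 s at 24 FPS
```

For example, a card with 8 GB would be routed to the 384p checkpoint, while a 12 GB card can use 768p.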

Recommended settings:

(a) Text to Video generation 

num_inference_steps=[20, 20, 20]

video_num_inference_steps=[10, 10, 10]

height=768, width=1280

guidance_scale=9.0

video_guidance_scale=5.0

temp=16

(b) Image to Video generation

num_inference_steps=[10, 10, 10]

temp=16

video_guidance_scale=4.0
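To see what the `temp` setting implies for clip length: `temp` counts temporal latent units, and under the commonly cited assumption of 8 video frames per latent unit plus one initial frame, the frame count and duration can be estimated as below. The mapping is an assumption on my part; treat these numbers as a sanity check against the 5-second figure above, not as a specification:

```python
# Sketch: estimate clip duration from the `temp` setting.
# Assumes 8 frames per temporal latent unit plus one initial frame;
# verify against the Pyramid Flow repository before relying on it.
FPS = 24

def estimated_frames(temp: int) -> int:
    return (temp - 1) * 8 + 1

def estimated_seconds(temp: int) -> float:
    return estimated_frames(temp) / FPS
```

With `temp=16` this gives 121 frames, about 5 seconds at 24 FPS, which matches the 384p checkpoint’s stated limit.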
