Alibaba Cloud has released another diffusion-based video generation model: Wan2.1, an open-source suite of video foundation models licensed under Apache 2.0. It delivers state-of-the-art performance while remaining accessible on consumer hardware. You can read more in their research paper.
It outperforms existing open-source models and rivals commercial solutions on the market. The text-to-video model generates a 5-second 480P video on an RTX 4090 in about 4 minutes, using 8.19 GB of VRAM without optimization, from both Chinese and English text prompts.
Model | Resolution | Features
--- | --- | ---
T2V-14B | 480P & 720P | Best overall quality
I2V-14B-720P | 720P | Higher-resolution image-to-video
I2V-14B-480P | 480P | Standard-resolution image-to-video
T2V-1.3B | 480P | Lightweight for consumer hardware
Installation
Whichever workflow you want to use, start by installing ComfyUI if you are new to it. Existing users should update ComfyUI from the Manager section by selecting “Update ComfyUI“.
Type A: Native Support
1. Download the models (text-to-video or image-to-video) from Hugging Face and save them into your “ComfyUI/models/diffusion_models” directory.
2. Download the text encoder (umt5_xxl_fp8_e4m3fn_scaled.safetensors) and save it into the “ComfyUI/models/text_encoders” folder.
3. Download the CLIP vision model and put it into the “ComfyUI/models/clip_vision” folder.
4. Finally, download the VAE model and put it into your “ComfyUI/models/vae” folder.
Workflow
Drag and drop the workflow into ComfyUI, then configure the nodes:
(a) Load the Wan model (text-to-video or image-to-video) in the UNet loader node.
(b) Load the text encoder in the CLIP node.
(c) Select the VAE model.
(d) Enter your positive/negative prompts.
(e) Set the KSampler settings.
(f) Click “Queue” to start generation (or queue the workflow from a script; see the sketch below).
[Images: Shift test and CFG test comparison results]
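Step (f) can also be done from a script: ComfyUI exposes a small HTTP API on its local port, so a workflow exported in API format (via the “Save (API Format)” option, which may require enabling dev mode) can be queued without touching the UI. Below is a minimal sketch, assuming ComfyUI is running on the default port 8188 and the exported file is named wan_workflow_api.json.

```python
# Minimal sketch: queue an exported (API-format) ComfyUI workflow over HTTP.
# Assumes ComfyUI is running locally on the default port 8188 and that
# "wan_workflow_api.json" was exported via "Save (API Format)".
import json
import uuid
import urllib.request

with open("wan_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

payload = json.dumps({
    "prompt": workflow,              # the node graph to execute
    "client_id": str(uuid.uuid4()),  # identifies this client in the queue
}).encode("utf-8")

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response includes the prompt_id of the queued job
```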
Type B: Wan Wrapper (Quantized by Kijai)
1. Clone the Wan Wrapper repository into your “custom_nodes” folder by typing the following command into the command prompt.
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
2. Move inside the “ComfyUI_windows_portable” folder, open a command prompt, and install the required dependencies with the following commands.
For normal ComfyUI users (run from inside the ComfyUI-WanVideoWrapper folder):
pip install -r requirements.txt
For portable ComfyUI users:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\requirements.txt
3. Download the models (text-to-video or image-to-video) from Hugging Face and put them into the “ComfyUI/models/diffusion_models” folder.
Here there are two precision options (BF16 and FP8) to choose from, each available for different video resolutions (480p and 720p). Select the one that matches your machine and use case: BF16 is for higher-VRAM cards (more than 12 GB) and FP8 for lower-VRAM cards (12 GB or less); see the VRAM check sketch after this list.
4. Download the relevant text encoder and save it into the “ComfyUI/models/text_encoders” folder. Select the bf16 or fp32 variant.
5. You also need to download the relevant VAE model and place it into your “ComfyUI/models/vae” directory. Select the bf16 or fp32 variant.
6. Restart ComfyUI.
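The BF16-versus-FP8 choice above comes down to how much VRAM your card has. Below is a minimal sketch, assuming PyTorch is installed (it already is in any working ComfyUI environment), that reads the GPU's memory and suggests a variant along the 12 GB line from step 3; the threshold is only the guideline from this guide, not a hard rule.

```python
# Minimal sketch: suggest a Wan 2.1 precision variant from available VRAM.
# The 12 GB threshold follows the guideline in step 3 above; adjust to taste.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; these workflows need an NVIDIA card.")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / (1024 ** 3)

suggestion = "BF16" if vram_gb > 12 else "FP8"
print(f"{props.name}: {vram_gb:.1f} GB VRAM -> suggested variant: {suggestion}")
```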
Workflow
1. You can get the workflow inside your “ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/example_workflows” folder.
2. Drag and drop it into ComfyUI.
We tested this with image-to-video on an RTX 3080 (10 GB VRAM) with sage attention enabled, and the generation time was around 467 seconds.
Note: This workflow uses Triton and SageAttention in the background to speed up inference, but they are optional. You can disable them if you do not need them.
If you want to install these, make sure you have the Visual C++ redistributable, CUDA 12.x, and Visual Studio installed on your system. There is a lot of confusion around setting up Triton on Windows machines; you can get a detailed explanation from the Triton-windows GitHub repository.
Install the Triton-windows .whl file for your Python version. To check your Python version, run “python --version” (without quotes) in a command prompt. We have Python 3.10 installed. For other Python versions, check the Triton-windows releases section.
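Before downloading wheels, it helps to confirm what your environment already has. The following minimal sketch prints the Python, PyTorch, and CUDA versions (which determine the correct Triton-windows wheel) and reports whether triton and sageattention can already be imported. Portable ComfyUI users should run it with the bundled interpreter (python_embeded\python.exe) so the check reflects the environment ComfyUI actually uses.

```python
# Minimal sketch: report the versions that decide which Triton wheel you need,
# and check whether triton / sageattention are already importable.
import importlib
import platform

import torch

print("Python :", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)  # None means a CPU-only PyTorch build

for pkg in ("triton", "sageattention"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: installed ({getattr(mod, '__version__', 'unknown version')})")
    except ImportError:
        print(f"{pkg}: not installed")
```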
Type C: GGUF variant (By City96)
1. Install the GGUF custom nodes from the Manager section by selecting the “Custom nodes manager” option. Now search for “ComfyUI-GGUF” (by City96) and hit install.
Users who have already used the Flux GGUF, Stable Diffusion 3.5 GGUF, or HunyuanVideo GGUF variants only need to update this custom node from the Manager by selecting the “Update” option.
2. Download any of the relevant models from the Hugging Face repository:
Save the Img2Vid model into the “ComfyUI/models/unet” folder, the CLIP vision model into “ComfyUI/models/clip_vision”, the text encoder into “ComfyUI/models/text_encoders”, and the VAE into “ComfyUI/models/vae”.
Download the rest of the model files (text encoder, VAE, etc.) from the Kijai Wan repository explained above.
Save the Txt2Vid model into the “ComfyUI/models/unet” folder, the CLIP vision model into “ComfyUI/models/clip_vision”, the text encoder into “ComfyUI/models/text_encoders”, and the VAE into “ComfyUI/models/vae”.
Here you have various quantization levels, from Q3 (very lightweight, faster, with lower-quality generation) to Q8 (very heavyweight, slower, with higher precision). Choose according to your system VRAM and use case; see the sketch after this list for comparing file sizes before downloading.
3. Restart ComfyUI for the changes to take effect.
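If you are unsure which quantization will fit, you can list the GGUF files in the Hugging Face repo together with their sizes before downloading anything. The sketch below uses the huggingface_hub library; the repo id is a placeholder, so substitute the actual City96 Wan 2.1 repository you picked in step 2. Roughly speaking, a file that is clearly smaller than your free VRAM leaves room for the text encoder and activations.

```python
# Minimal sketch: list the GGUF quantizations in a Hugging Face repo with their
# file sizes, so you can judge which Q-level fits your VRAM.
# The repo id is a placeholder; replace it with the City96 Wan repo you chose.
from huggingface_hub import HfApi

repo_id = "<city96-wan2.1-gguf-repo>"  # placeholder, not a real repo id

info = HfApi().model_info(repo_id, files_metadata=True)
for sibling in info.siblings:
    if sibling.rfilename.endswith(".gguf"):
        size_gb = (sibling.size or 0) / (1024 ** 3)
        print(f"{sibling.rfilename}  {size_gb:.1f} GB")
```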
Workflow
1. Use the same native ComfyUI workflow described in the Type A workflow section above.
2. Everything else stays the same; just replace the “Load Diffusion Model” node with the “UNet Loader (GGUF)” node.