
Enhancing AI with AnimateDiff and SDXL

This workflow uses Stable Diffusion XL (SDXL) models, but you can swap in any SDXL fine-tuned checkpoint that suits your requirements. To run the workflow, you should use an NVIDIA GPU with at least 12GB of VRAM (more is better).

Installation Process:

1. This workflow depends only on ComfyUI, so you need to install that WebUI on your machine first.

2. Update ComfyUI using ComfyUI Manager by selecting “Update All”. Next, install AnimateDiff: in ComfyUI Manager, search for the “AnimateDiff Evolved” node pack, make sure the author is Kosinkadink, and click the “Install” button.
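
If you prefer installing the node pack from the command line instead of the Manager, here is a minimal Python sketch, assuming ComfyUI lives in a local “ComfyUI” folder and git is available on your PATH:

```python
import subprocess
from pathlib import Path

# Assumed install location -- adjust if your ComfyUI folder lives elsewhere.
custom_nodes = Path("ComfyUI/custom_nodes")

# Clone Kosinkadink's AnimateDiff Evolved node pack (the same package
# ComfyUI Manager installs for you).
subprocess.run(
    ["git", "clone", "https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git"],
    cwd=custom_nodes,
    check=True,
)
```

Restart ComfyUI afterwards so the new nodes are picked up.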

Download the “IP adapter batch unfold for SDXL” workflow from the CivitAI article by Inner Reflections, then drag and drop it directly into ComfyUI. This is the basic ComfyUI workflow.

3. If you are opening this workflow for the first time, you will get a bunch of missing-node errors.

4. Install the missing nodes using ComfyUI Manager. To do this, open ComfyUI Manager from the “Manager” tab and select “Install Missing Custom Nodes”.

Then install each custom node from the list by clicking its “Install” button.

5. Now, just restart ComfyUI by hitting the “Restart” button.

6. Next, download the model checkpoints needed for this workflow. There are tons of them available on CivitAI. For illustration, we are downloading ProtoVision XL.

You can choose whatever model you want, but make sure it is based on Stable Diffusion XL (SDXL).

As usual, save it inside the “ComfyUI/models/checkpoints” folder. Keep in mind that your output will depend on the checkpoint model you use.
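
If you would rather script the checkpoint download, here is a rough sketch using CivitAI’s download API. The model version ID and output filename below are placeholders; replace them with the values from the ProtoVision XL model page:

```python
import requests

# Placeholder: look up the model version ID on the ProtoVision XL CivitAI page.
MODEL_VERSION_ID = "000000"
url = f"https://civitai.com/api/download/models/{MODEL_VERSION_ID}"

# Hypothetical filename; name it whatever you like, but keep the .safetensors extension.
dest = "ComfyUI/models/checkpoints/protovisionXL.safetensors"

with requests.get(url, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```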

7. Since we are using an SDXL fine-tuned model, we also need the SDXL variational autoencoder (VAE), “sdxl_vae.safetensors”, as shown above. Download it from the Hugging Face repository and place it inside the “ComfyUI/models/vae” folder.
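
For reference, a minimal sketch of downloading the VAE with the huggingface_hub library, assuming the file comes from the stabilityai/sdxl-vae repository:

```python
from huggingface_hub import hf_hub_download

# Assumed source repository for sdxl_vae.safetensors.
hf_hub_download(
    repo_id="stabilityai/sdxl-vae",
    filename="sdxl_vae.safetensors",
    local_dir="ComfyUI/models/vae",  # ComfyUI's VAE folder
)
```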

8. Next, download the IP Adapter Plus model (Version 2). Here, we need the “ip-adapter-plus_sdxl_vit-h.safetensors” model for SDXL checkpoints, listed under the model name column as shown above.

After downloading, put it into the “ComfyUI/models/ipadapter” folder. Users who have already upgraded their IP Adapter to V2 (Plus) can skip this step. If you prefer to script these downloads, see the sketch after step 9.

9. Download the image encoder for SDXL from the Hugging Face repository. Rename the downloaded file to “image_encoder.safetensors” and save it inside the “ComfyUI/models/clip_vision” folder.
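
Steps 8 and 9 can also be scripted. The sketch below assumes both files come from the h94/IP-Adapter repository on Hugging Face (the SDXL “vit-h” adapters pair with the ViT-H image encoder shipped under models/image_encoder there); adjust the repository and paths if your source differs:

```python
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

ipadapter_dir = Path("ComfyUI/models/ipadapter")
clip_vision_dir = Path("ComfyUI/models/clip_vision")
ipadapter_dir.mkdir(parents=True, exist_ok=True)   # may not exist on a fresh install
clip_vision_dir.mkdir(parents=True, exist_ok=True)

# Step 8: IP Adapter Plus model for SDXL (assumed to live in h94/IP-Adapter).
ip_adapter = hf_hub_download(
    repo_id="h94/IP-Adapter",
    filename="sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors",
)
shutil.copy(ip_adapter, ipadapter_dir / "ip-adapter-plus_sdxl_vit-h.safetensors")

# Step 9: ViT-H image encoder, renamed to what the workflow expects.
encoder = hf_hub_download(
    repo_id="h94/IP-Adapter",
    filename="models/image_encoder/model.safetensors",
)
shutil.copy(encoder, clip_vision_dir / "image_encoder.safetensors")
```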

10. Download the Text2Image ControlNet model from TencentARC’s Hugging Face repository. There are two versions of the same model: “diffusion_pytorch_model.fp16.safetensors” (faster rendering, lower quality) and “diffusion_pytorch_model.safetensors” (higher quality, slower rendering).

Download both of them and put them inside the “ComfyUI/models/controlnet” folder.

11. Finally, download both models, “hsxl_temporal_layers.f16.safetensors” and “hsxl_temporal_layers.safetensors”, from Hotshotco’s Hugging Face repository. Save them inside the “ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models” folder.
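
A similar sketch for the Hotshot-XL temporal layers, assuming they come from the hotshotco/Hotshot-XL repository:

```python
from huggingface_hub import hf_hub_download

# Assumed source repository and the AnimateDiff Evolved models folder.
target = "ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models"

for name in ("hsxl_temporal_layers.f16.safetensors", "hsxl_temporal_layers.safetensors"):
    hf_hub_download(repo_id="hotshotco/Hotshot-XL", filename=name, local_dir=target)
```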

12. Now restart ComfyUI for the changes to take effect.

Workflow Explanation:

The workflow is quite simple, and we have broken down everything you need to know as clearly as possible. Let’s dive straight into it.
1. First, load your reference video clip, which should be around 10-15 seconds long. Always use shorter clips, because rendering is time-consuming and you rarely get the desired result on the first attempt. Add the path of your reference video to the “Load Video” node, removing any quotation marks.
2. Set the dimensions to match your reference video. You can also scale these values down to speed up rendering.

3. Load your model in the “Load Checkpoint” node. If a checkpoint you downloaded does not appear in the list, simply select “Refresh” in ComfyUI Manager. Then load the SDXL VAE (variational autoencoder) model.

4. Next, load the IP Adapter Plus model and the image encoder model.
5. Now configure the IP Adapter settings in the “Apply IP Adapter” node. You can play around with the weight between 0.20 and 0.80, and a noise value between 0.2 and 0.4 will noticeably affect your output. Just make sure not to stray too far from these ranges.
6. Load the ControlNet models. Here, we have two options:
(a) diffusion_pytorch_model.fp16.safetensors – faster video generation but lower quality.
(b) diffusion_pytorch_model.safetensors – heavier on the GPU and slower to render, but with higher-quality output.
Users with less than 12GB of VRAM should choose the “fp16” version; others can select the latter.
Then set the ControlNet settings. A strength value of 1.0 is the usual choice for the ControlNet influence.

7. Next, in the “AnimateDiff Loader” node, load the HotshotXL model from the dropdown list.
8. Now comes the “KSampler” node, where you need to play with consistency, quality, and prompt influence to get better output. Recommended settings and ranges are listed below:
- Steps: 25-30
- Control After Generate: randomize
- CFG: 3-8
- Sampler: Euler or DPMPP_3M_GPU
- Scheduler: Karras
- Start_Step: 5-13
Leave the rest as it is.
9. Now put your positive and negative prompts into the green and red “CLIPTextEncodeSDXL” nodes. Each node has two text boxes; simply putting the same prompt in both is the easiest way to work.
You can learn more about how to write prompts for Stable Diffusion models. If you want to explore more prompt ideas, you can also try our Stable Diffusion Prompt Generator.
10. Next, we have the “Video Combine” node, where you need to set the frame rate to match the original video.
11. Finally, there is an Upscaling group made up of several nodes: Upscale Image, VAE Encode, KSampler Advanced, VAE Decode, Save Image, and Video Combine.
Leave all of these settings as they are.
In its “Video Combine” node, just change the frame rate to match your original video. You can also rename the generated video and change its format (e.g. mp4).
12. Finally, click the “Queue Prompt” button to start rendering your video. After generation, you can find the output in the “ComfyUI/output” folder.
The new version of ComfyUI embeds metadata into every image or video you generate, so if you want to load the same workflow again, simply drag and drop the file onto your ComfyUI canvas. This also makes it easier to work in a collaborative team environment.
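
As an alternative to clicking “Queue Prompt”, you can submit the workflow to a running ComfyUI instance over its HTTP API. The sketch below assumes you exported the workflow in API format (“Save (API Format)”, available once dev mode options are enabled) to a hypothetical file named animatediff_sdxl_workflow_api.json, and that ComfyUI is running on its default local address:

```python
import json
import urllib.request

# Hypothetical filename for the API-format export of this workflow.
with open("animatediff_sdxl_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address and queue endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # contains the queued prompt_id
```
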
Keep in mind that you will need to play around with the settings and render multiple times to get satisfying results.