
ComfyUI: A Comprehensive Guide from Novice to Expert

Are you confused by other, more complicated Stable Diffusion WebUIs? No problem: try ComfyUI.

It is a node-based Stable Diffusion web user interface that helps AI artists generate incredible art. You can add the nodes you need to your workflow with a simple drag-and-drop technique.

The best part of ComfyUI is that you don't get the maze of settings found in Automatic1111, which can sometimes be confusing.

Nowadays, ComfyUI is usually the first tool to get updated whenever a workflow appears for a new diffusion research project. Yes, at first you will be confused by its many interconnected nodes, but it becomes easy to understand once you get used to it.

ComfyUI Basic Nodes:

We will cover all the basic nodes that are widely used by the community for art generation, in detail and with examples. Before moving on to the basics, first install ComfyUI on your machine.


1. CheckpointLoader: This is one of the most common nodes. Add it by right-clicking on the canvas, then
Add Node > Loaders > Load Checkpoint
Alternatively, you can use the search option: double left-click on the canvas, search for "checkpoint", and select the "Load Checkpoint" option.
This node takes a checkpoint as its input. Model checkpoints are stored inside the "ComfyUI_windows_portable\models\checkpoints" folder, so whenever you want to use
any Stable Diffusion base model, you should always store it in that directory.
The node has three outputs, i.e. "CLIP", "VAE", and "MODEL", which we discuss below.
Note that sometimes a checkpoint is listed as "deprecated", which means it is no longer used by ComfyUI.
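If you ever want to drive ComfyUI from a script instead of the canvas, this is roughly how the Load Checkpoint node looks in ComfyUI's API-format workflow JSON. This is a hedged sketch: the node ID "4" and the checkpoint file name are placeholder assumptions.

```python
# A minimal sketch of the Load Checkpoint node in ComfyUI's API-format
# workflow JSON. The file name is hypothetical; use any checkpoint you
# have placed in models/checkpoints.
checkpoint_node = {
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_v1-5.safetensors"},
    }
}
# Other nodes reference its three outputs by (node id, output index):
# MODEL -> ["4", 0], CLIP -> ["4", 1], VAE -> ["4", 2]
```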

2. Primitive Node: To create this node, right-click on the checkpoint loader > click "Convert ckpt_name to input" > drag the new input out onto the canvas > Add Node > utils > Primitive.
The primitive node now controls the value of the checkpoint selection it is connected to.
The "control_after_generate" option (fixed/increment/decrement/randomize) changes the checkpoint automatically every time you generate an image. However, it's experimental and not used in normal use cases.
3. CheckpointLoader (MODEL output): Generally, this goes to the KSampler node. To connect it to the KSampler, just click and drag the output onto the canvas area to open a dropdown menu where you can select "KSampler".
4. CheckpointLoader (CLIP): CLIP (Contrastive Language-Image Pre-Training) connects to the CLIP Text Encode (Prompt) box, where you put your positive/negative prompt.
CLIP takes the positive/negative prompt and, using a tokenization technique, breaks it into multiple tokens, which are then converted into numbers (in machine learning this output is called conditioning), because machines cannot understand words and only process numbers.
For example, if you input "a man", it gets split into the tokens "a" and "man", and each token is mapped to a numeric ID. This is what tokenization is.
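To see tokenization in action outside ComfyUI, here is a minimal sketch using the Hugging Face transformers library. This is an illustration under assumptions: ComfyUI uses its own CLIP implementation internally, and the model ID below is just one publicly available CLIP tokenizer.

```python
# A rough illustration of CLIP tokenization (assumes: pip install transformers).
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer.tokenize("a man")           # e.g. ['a</w>', 'man</w>']
ids = tokenizer.convert_tokens_to_ids(tokens)  # the numeric IDs the model reads
print(tokens, ids)
```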

5. CheckpointLoader (VAE): The VAE (Variational Auto-Encoder) output connects to the VAE Decode box.
Now, this part is optional: you can also load individual nodes, by double left-clicking on the canvas, for Load VAE, Load CLIP, and UNET Loader, which together make up "Load Checkpoint".
So, whenever you load one of your desired Stable Diffusion models with the ".safetensors" or ".ckpt" extension, it should be loaded with the "Load Checkpoint" node, and these models usually comprise a UNet, CLIP, and VAE.

This means the Load Checkpoint node is responsible for splitting the checkpoint into all three components (UNet, CLIP, VAE) for the workflow pipeline.
Let's say we have an image in pixel form, with RGB values such as (255, 255, 6). The catch is that machines can't understand images the way human eyes do, so the pixel values get converted into numbers the model can process. Now, in the machine learning world, when we train a model there is a difference between what we expect and what the result is: a human can judge an image by its gesture, quality, etc., but the model uses a loss value.
This is the usual way to train any machine learning model. However, for image models, working directly in pixel space is not optimal. Stable Diffusion therefore uses an approach called latent diffusion, where an image in pixel form gets converted into latent form, or vice versa, using the VAE (encoder/decoder).
One thing worth mentioning is that each conversion from pixel to latent, or vice versa, loses a minute amount of image data. So if, for example, you are working on a big project with lots of interconnected nodes, you can lose a lot of information across these conversions.
You have to balance your requirements against how many conversions you add in order to get the optimal result.
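As a rough illustration of the pixel-to-latent round trip, and the small loss it introduces, here is a sketch using the diffusers library. The VAE model ID is an assumption on our part; any Stable Diffusion VAE would behave similarly.

```python
# A sketch of the pixel <-> latent round trip (assumes: pip install
# diffusers torch). The model ID below is a commonly used SD 1.x VAE
# and is an assumption, not part of this guide.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = torch.rand(1, 3, 512, 512) * 2 - 1  # dummy RGB image scaled to [-1, 1]
with torch.no_grad():
    latent = vae.encode(image).latent_dist.sample()  # 1x4x64x64 latent
    decoded = vae.decode(latent).sample              # back to 1x3x512x512

# The reconstruction is close but not identical: each round trip loses a
# little detail, which is why unnecessary conversions should be minimized.
print((image - decoded).abs().mean())
```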
6. KSampler: The KSampler has various inputs and a single output. It is responsible for the sampling process, which runs over a number of steps. With each step the image is denoised a little further, and each time we get a better image than the one before.
The KSampler has a lot of options. Each is explained below in a simple, detailed manner (a complete workflow sketch follows the list):
  • Seed: This is the starting point from which the random noise for a particular image is generated. After the first generation, if you set the seed control to fixed, the model will generate the same style of image; this is one technique for achieving consistency. The default value is 0, the minimum is 0, and the maximum is 0xffffffffffffffff in hexadecimal (18,446,744,073,709,551,615 in decimal). You must set your seed value within this range.
  • Steps: This is the number of inference steps, i.e. the number of steps the diffusion mechanism takes while refining the intermediate latent sample in latent space. With each step the image is denoised further toward the final output. The default is 20, the minimum is 1, and the maximum is 10000.
  • CFG: The CFG (Classifier-Free Guidance) scale takes float (decimal) values. Its default value is 8.0, the minimum is 0.0, and the maximum is 100.0. It controls how strongly the diffusion model follows your prompt. If you go to much higher values, you will get a totally different image, so values around 7-8 are usually best for getting the desired results.
  • Sampler Name: The type of algorithm used for noising and denoising the latent image to achieve optimal performance. Popular algorithms include DPM++, Euler, Euler a, etc. (see the table below for relative speeds).
| Sampler Type | Relative Speed |
| --- | --- |
| Euler | Fast |
| Euler a | Fast |
| Heun | Medium |
| LMS | Fast |
| LMS Karras | Fast |
| DDIM | Fast |
| PLMS | Fast |
| DPM2 | Medium |
| DPM2 a | Medium |
| DPM2 Karras | Medium |
| DPM2 a Karras | Medium |
| DPM++ 2S a | Medium |
| DPM++ 2S a Karras | Medium |
| DPM++ 2M | Fast |
| DPM++ 2M Karras | Fast |
| DPM++ SDE | Medium |
| DPM++ SDE Karras | Medium |
| DPM fast | Fast |
| DPM adaptive | Slow |
| UniPC | Fast |
  • Scheduler: The KSampler's scheduler, which controls how the noise levels are distributed across the sampling steps (e.g. normal, karras, exponential).
  • Positive conditioning: The positive prompt used to generate the AI art.
  • Negative conditioning: The negative prompt describing what we don't want in the generated image.
  • Noise Scheduler: This generally controls how much noise the image should contain at each step.
  • Denoise factor: This is mainly used in image-to-image workflows. It determines what percentage of the input image to change versus keep during the latent-to-pixel conversion and back. A denoise factor of 1.0 (100%) means we don't keep any of the input image, while 0.5 (50%) means we change half of the image and keep the other half as it is.
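Putting the options together, here is a hedged sketch of a complete text-to-image workflow in ComfyUI's API JSON format, submitted to a locally running instance on the default port 8188. The checkpoint file name, prompts, and node IDs are placeholder assumptions.

```python
# A minimal text-to-image workflow in ComfyUI's API JSON format, wiring
# together the nodes covered above. Requires a running local ComfyUI.
import json
import urllib.request

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_v1-5.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a man standing on a beach", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {
              "model": ["1", 0],
              "positive": ["2", 0],
              "negative": ["3", 0],
              "latent_image": ["4", 0],
              "seed": 42,               # 0 .. 0xffffffffffffffff
              "steps": 20,
              "cfg": 8.0,
              "sampler_name": "euler",
              "scheduler": "normal",
              "denoise": 1.0}},         # 1.0 = start from pure noise
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "tutorial"}},
}

# Queue the graph on the local ComfyUI server (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)
```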
7. KSampler Advanced: This node is used in deeper workflows where the sampling steps must be split across multiple stages of the latent diffusion process. For example, with the SDXL 1.0 base model we use the refiner model as well.
Here, we use the first advanced KSampler with the SDXL 1.0 base model and the later one with the refiner model, as sketched below.
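As a sketch of the idea (not a full workflow), the two advanced samplers might split 25 steps like this in API JSON form. The node IDs, referenced loader/prompt nodes, and the 20/5 step split are illustrative assumptions.

```python
# Base model denoises steps 0-20, then hands the still-noisy latent to
# the refiner, which finishes the remaining steps.
base_sampler = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["base_loader", 0],
        "positive": ["pos", 0], "negative": ["neg", 0],
        "latent_image": ["latent", 0],
        "add_noise": "enable", "noise_seed": 42,
        "steps": 25, "cfg": 8.0,
        "sampler_name": "euler", "scheduler": "normal",
        "start_at_step": 0, "end_at_step": 20,
        "return_with_leftover_noise": "enable"},  # pass noisy latent onward
}
refiner_sampler = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["refiner_loader", 0],
        "positive": ["pos_r", 0], "negative": ["neg_r", 0],
        "latent_image": ["base_sampler_id", 0],   # continue from base output
        "add_noise": "disable", "noise_seed": 42, # don't add fresh noise
        "steps": 25, "cfg": 8.0,
        "sampler_name": "euler", "scheduler": "normal",
        "start_at_step": 20, "end_at_step": 10000,
        "return_with_leftover_noise": "disable"},
}
```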

Keep in mind that every diffusion model behaves differently with respect to its sampler method, steps, CFG scale, positive/negative conditioning, etc.

So, you need to take this into consideration when downloading any diffusion model from Hugging Face or CivitAI. To gather more relevant information, check the model's description section so you can leverage its maximum potential.
We have explained the basic workflow and the nodes that are normally used to generate images in the text-to-image process. You can look further and download extra workflows from our Hugging Face repository, where we share multiple workflows with different nodes enabled.
8. Webcam Node: You can use the Webcam node to test any real-time capture project, as we did in Live Portrait for capturing and switching frames in real time.

To get this node, right-click on a clear canvas and select "Add Node > image > Webcam Capture".

ComfyUI Shortcuts:

Trust us, if you work intensively with your workflows, these shortcuts will work wonders. At first you may struggle to recall them, but with regular use they prove to be a real game changer. All the shortcuts are listed below:
| Sl. No | Shortcut Key | Description |
| --- | --- | --- |
| 1 | Ctrl + Enter | Queue up current graph for generation |
| 2 | Ctrl + Shift + Enter | Queue up current graph as first for generation |
| 3 | Ctrl + Z / Ctrl + Y | Undo/Redo |
| 4 | Ctrl + S | Save workflow |
| 5 | Ctrl + O | Load workflow |
| 6 | Ctrl + A | Select all nodes |
| 7 | Alt + C | Collapse/uncollapse selected nodes |
| 8 | Ctrl + M | Mute/unmute selected nodes |
| 9 | Ctrl + B | Bypass selected nodes (acts like the node was removed from the graph and the wires reconnected through) |
| 10 | Delete / Backspace | Delete selected nodes |
| 11 | Ctrl + Backspace | Delete the current graph |
| 12 | Space | Move the canvas around when held while moving the cursor |
| 13 | Ctrl/Shift + Click | Add clicked node to selection |
| 14 | Ctrl + C / Ctrl + Shift + V | Copy and paste selected nodes (maintaining connections from outputs of unselected nodes to inputs of pasted nodes) |
| 15 | Shift + Drag | Move multiple selected nodes at the same time |
| 16 | Alt + + (plus) | Canvas zoom in |
| 17 | Alt + - (minus) | Canvas zoom out |
| 18 | Ctrl + Shift + LMB + vertical drag | Canvas zoom in/out |
| 19 | Q | Toggle visibility of the queue |
| 20 | H | Toggle visibility of the history |
| 21 | R | Refresh graph |
| 22 | Double-click LMB | Open node quick-search palette |
| 23 | Ctrl + D | Load default graph workflow |

If you are a Mac user, use the Command key in place of Ctrl.

Important points:

1. While installing new nodes, if you get an error, always update all the nodes and ComfyUI first from the ComfyUI Manager section.
2. Whenever you load a workflow in ComfyUI and get a "missing node" error highlighted in red, search for the specific node through the ComfyUI Manager and install it from the list.
3. Want to download a custom node but don't know how? There are basically two options you can choose from:
 (a) The first is the search bar of the ComfyUI Manager.
 (b) The second is the Hugging Face/GitHub repository.
The latter is used when the node is new and ComfyUI Manager hasn't updated its list yet.
4. ComfyUI embeds workflow metadata inside every generated image. This helps you work with the workflow in a team-based collaborative environment or share your workflow with the community; you can read the metadata back out as shown in the sketch below.
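For example, here is a small sketch that reads the metadata ComfyUI embeds in a generated PNG, stored as text chunks named "prompt" and "workflow". The file name is a placeholder; it assumes Pillow is installed.

```python
# Read the workflow ComfyUI embeds in a generated PNG
# (assumes: pip install pillow).
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")    # any ComfyUI output image
workflow_json = img.info.get("workflow")  # full editor-format workflow
prompt_json = img.info.get("prompt")      # API-format prompt graph

if workflow_json:
    workflow = json.loads(workflow_json)
    print(f"{len(workflow.get('nodes', []))} nodes in embedded workflow")
```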

Conclusion:

Once you get used to it, ComfyUI is easier to use than any other Stable Diffusion WebUI, and its workflow updates are more frequent than those of other WebUIs.