
Using ControlNet: A Detailed Guide

For AI image art, ControlNet is one of the game-changing models, and it offers many different functions. This guide walks through its options in easy steps. In short, ControlNet gives you an effective way to generate an image of a person in any pose, or art in any style, without training a model from scratch.

This tutorial assumes the ControlNet extension for Automatic1111 is already installed. If you haven’t downloaded it yet, install the extension first so you can follow the full workflow.

Functionalities of ControlNet:

ControlNet has a variety of options, which can be a little confusing at first. To keep things simple, we explain how to use each option, what kind of results it gives, and where it is useful in image generation.

1. Enable: The first checkbox, “Enable”, turns ControlNet on. Unless it is checked, none of the other settings take effect.

2. Low VRAM: Check this if your GPU has less VRAM than recommended.

3. Pixel Perfect: This option picks the preprocessor resolution automatically from the image dimensions, so you do not have to set it by hand.

4. Allow Preview: This shows a preview of the preprocessor output, so you can check what structure will be taken from the image before you generate.

Control Type:

All the options in the “Control Type” section are tied to a preprocessor. To see the preprocessor output for any of them, check the “Allow Preview” option first.

1. All: This lists all the options below. Each of them uses a preprocessor, which helps to show how the picture will look when we generate an image with a particular prompt.

So, after selecting an option, drag and drop your image, enter the prompt, and click “Generate” to create a new image based on the one you uploaded.

2. Canny: Detects the edges of the added image and shows them as a sketch-like outline.

To use it, first check the “Allow Preview” checkbox, then select “Canny”. After that, click the flash, star-like button to run the preprocessor.

The matching model for the preprocessor is selected automatically. To generate a new image with the same composition, keep these options selected, enter the image prompt, and click the “Generate” button.

3. Depth: Depth estimates a depth map of the image, showing which parts are near and which are far. The procedure is the same as for “Canny”: check the “Allow Preview” checkbox, select “Depth”, and then click the flash, star-like button to run the preprocessor.

4. Normal: This option generates a normal map, a 3D-like image that captures the geometry of the scene. It detects the curves and bumps of the surfaces along the Z axis.

5. OpenPose: As the “pose” in its name suggests, it is used to generate images of people in specific poses. It is widely used for generating animations and videos.

It doesn’t affect the skin tone, hairstyle, hair color, or body shape; it only changes the pose and angle of the body. This makes it work like a charm for photoshoot-style images from multiple angles.

It has multiple preprocessor variants for targeting specific parts, such as the face and hands. Again, to generate a new image with the same pose, keep these options selected, enter the image prompt, and click the “Generate” button.

Open Pose editor: It’s an extension that can be downloaded from Hugging Face. The OpenPose editor lets you change the body pose of any image: just click and drag the joints of the generated skeleton.

Alternatively, you can download ready-made body poses from platforms like CivitAI.

6. MLSD: It detects straight-line edges. Since houses and other building structures are made up of straight lines, this option is mostly used in the architectural field.

Curved edges, such as those of a round face, are not detected by MLSD.

7. Lineart: It turns the image into a hand-drawn-style sketch and, unlike MLSD, also captures rounded edges.

8. Scribble: This works the same way as Lineart, but it produces broader lines.

9. Soft Edge: Again, this works the same way as Lineart, but it gives softer edges for the uploaded image.

10. Seg: This is short for segmentation. It splits the image into objects and fills each type of object with its own color.

For example, in a picture of a cat, the background and the cat are marked with different colors.

11. Shuffle: Shuffle generates new images by shuffling and randomizing the pixels of the input image and mixing the result into the output.

12. Tile: It changes the pixel-level detail of the image. This helps when generating video frames with the “img2img” option, because video frames don’t need much fine detail; with too much detail you get a weird flickering effect.

In addition, if you want to reduce the detail, use “Downsampling Rate”. The minimum value is 1 and the maximum value is 8.

13. Inpaint: Inpaint is widely used for changing a specific part of an image, such as swapping a face or removing an object. To use it, just paint a mask over that part of the dropped image. It can also be used to extend an image beyond its original borders, which is known as outpainting.

14. IP2P (InstructPix2Pix): This option restyles the image according to an instruction in the prompt. For example, starting from a picture of a house in spring, we entered a prompt and used this option to convert it into a winter scene.

15. Reference: It copies the style of the dropped image over to the newly generated image.
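To make the “Downsampling Rate” of the Tile option more concrete, here is a minimal Python sketch of what lowering the detail of an image means. It is an illustration only, not the extension's code: the function name `downsample` is made up for this example, the image is a tiny grid of grayscale values, and only whole-number rates are handled for simplicity.

```python
def downsample(pixels, rate):
    """Shrink a 2D grid of grayscale values by averaging rate x rate blocks.

    Hypothetical helper for illustration; only integer rates from 1 to 8.
    """
    if not 1 <= rate <= 8:
        raise ValueError("rate must be between 1 and 8")
    height, width = len(pixels), len(pixels[0])
    out_h, out_w = height // rate, width // rate
    result = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Collect every pixel that falls into this output cell.
            block = [
                pixels[row * rate + dy][col * rate + dx]
                for dy in range(rate)
                for dx in range(rate)
            ]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result


# An 8x8 checkerboard of 0 and 255: as much fine detail as a grid can hold.
checkerboard = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]

for rate in (1, 2, 4):
    small = downsample(checkerboard, rate)
    print(f"rate {rate}: {len(small[0])}x{len(small)} grid, first value {small[0][0]}")
```

At rate 1 the grid is unchanged. At higher rates the grid gets smaller and the sharp black and white pattern averages out to a flat gray, which is the loss of fine detail the slider controls.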

Multiple ControlNets:

If you want to load and work with multiple ControlNet units, such as “ControlNet 0”, “ControlNet 1”, and “ControlNet 2”, go to the “Settings” tab at the top. Then click the ControlNet option in the left panel and move the slider to the number of ControlNet units you want.

Then click “Apply settings” and “Reload UI” for the change to take effect.

Multiple ControlNets help to generate a new image using the structure or art style of several images. For example, say we want an image of Iron Man with city buildings in the background. We dropped the city image on ControlNet 0 and the Iron Man image on ControlNet 1, and it generated a new image that combines both.

Batch: This option is used to generate several images in a single run, but a large number takes much longer to generate. A batch of 3 to 4 is typical. This option can also be used to generate a person in multiple poses.

Control Weight: The higher the value, the stronger the ControlNet effect on the generated image, and vice versa. The default value is 1.

Starting Control Step: This sets the point in the sampling process at which ControlNet starts to take effect. Its default value is 0.

Ending Control Step: This sets the point at which ControlNet stops taking effect. Its default value is 1. Both values range from 0 to 1 and are fractions of the total sampling steps: 0.5 means 50% of the way through, and 1 means 100%.
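The arithmetic behind the starting and ending steps can be sketched in a few lines of Python. This is a simplified model, not the extension's code: the function name `controlled_steps` is made up, and it assumes a step counts as controlled when its progress (step index divided by total steps) falls between the two values. The exact rounding inside the extension may differ.

```python
def controlled_steps(total_steps, start=0.0, end=1.0):
    """Return the 0-based sampling steps during which ControlNet is applied.

    Assumption for illustration: step i is controlled when
    start <= i / total_steps <= end.
    """
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("need 0 <= start <= end <= 1")
    return [i for i in range(total_steps) if start <= i / total_steps <= end]


# With 20 sampling steps and the defaults, every step is controlled.
print(controlled_steps(20))

# Ending at 0.5 frees the second half of the steps from ControlNet.
print(controlled_steps(20, start=0.0, end=0.5))

# Starting at 0.5 lets the prompt alone shape the first half.
print(controlled_steps(20, start=0.5, end=1.0))
```

Ending early is a common way to lock in the composition during the first steps and then let the prompt refine the details on its own.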

Preprocessor Resolution: This is the resolution at which the preprocessor works on the loaded image. For example, if we load a 512 by 512 image, it shows a value of “512”.

Canny Low Threshold: Edges weaker than this value are discarded.

Canny High Threshold: Edges stronger than this value are always kept. Edges between the two thresholds are kept only if they connect to a strong edge. Lowering the sliders detects more edges, and raising them detects fewer.
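How the two thresholds work together is easier to see on a toy example. The Python sketch below implements only the double-threshold (hysteresis) stage of Canny on a small grid of made-up edge strengths. A real Canny detector also blurs the image, computes gradients, and thins the edges first, so treat this as an illustration; the function names and the sample grid are invented for this example.

```python
from collections import deque


def hysteresis(gradient, low, high):
    """Mark edge pixels with Canny-style double thresholding.

    A pixel is an edge if its strength is >= high, or if it is >= low and
    touches (including diagonally) a pixel that is already an edge.
    """
    rows, cols = len(gradient), len(gradient[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels are edges straight away and seed the search.
    for r in range(rows):
        for c in range(cols):
            if gradient[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow outward from strong pixels through weak-but-connected ones.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and not edges[nr][nc] and gradient[nr][nc] >= low:
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges


def count_edges(gradient, low, high):
    """Count how many pixels end up marked as edges."""
    return sum(cell for row in hysteresis(gradient, low, high) for cell in row)


# Made-up edge strengths: one line that fades from strong (250) to weak (90).
gradient = [
    [10, 10, 10, 10, 10],
    [250, 150, 120, 60, 90],
    [10, 10, 10, 10, 10],
]

print(count_edges(gradient, low=100, high=200))  # 3
print(count_edges(gradient, low=50, high=200))   # 5
print(count_edges(gradient, low=100, high=255))  # 0
```

Lowering the low threshold lets the faint end of the line join the edge, while raising the high threshold above the strongest pixel removes the line entirely.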

These thresholds matter most when you need a precise outline of a subject, much like making a selection with the pen tool in Photoshop (if you are familiar with it).

In the Control Mode section, you will see three radio buttons:

Balanced / My prompt is more important / ControlNet is more important: These set the priority between the given prompt and ControlNet.
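If you drive Automatic1111 through its web API instead of the UI, the same settings travel as a JSON payload. The Python sketch below only builds such a payload and prints it; it does not contact any server. The field names (`module`, `model`, `weight`, `guidance_start`, `guidance_end`, `control_mode`) and the `alwayson_scripts` layout reflect our understanding of the ControlNet extension's API and are assumptions that may differ between versions, so check them against your installation. The model names are placeholders, and the input image field is left out.

```python
import json

CONTROL_MODES = (
    "Balanced",
    "My prompt is more important",
    "ControlNet is more important",
)


def controlnet_unit(module, model, weight=1.0, start=0.0, end=1.0,
                    control_mode="Balanced"):
    """Build the settings for one ControlNet unit (assumed field names)."""
    if control_mode not in CONTROL_MODES:
        raise ValueError(f"unknown control mode: {control_mode}")
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("need 0 <= start <= end <= 1")
    return {
        "enabled": True,
        "module": module,          # preprocessor, e.g. canny or openpose
        "model": model,            # placeholder name, not a real model file
        "weight": weight,          # Control Weight
        "guidance_start": start,   # Starting Control Step
        "guidance_end": end,       # Ending Control Step
        "control_mode": control_mode,
    }


def txt2img_payload(prompt, units):
    """Wrap a prompt and a list of ControlNet units into one request body."""
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": list(units)}},
    }


# Two units, mirroring the city and Iron Man example with ControlNet 0 and 1.
payload = txt2img_payload(
    "Iron Man standing in front of city buildings",
    [
        controlnet_unit("depth", "hypothetical_depth_model"),
        controlnet_unit("openpose", "hypothetical_openpose_model",
                        weight=0.8, end=0.5,
                        control_mode="My prompt is more important"),
    ],
)
print(json.dumps(payload, indent=2))
```

Keeping each unit in its own dictionary mirrors the separate ControlNet tabs in the UI, so every unit can have its own weight, step range, and control mode.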

In the Resize Mode section, you get:

Just Resize / Crop and Resize / Resize and Fill: This option matters when the input image and the output have different width and height proportions. Choose the mode that brings the two to matching dimensions; otherwise, you get bad results.
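The difference between the three resize modes comes down to how the scale factor is chosen. The Python sketch below shows that arithmetic under the usual meaning of these terms: stretch, cover then crop, and fit then fill. It is a simplified model with a made-up function name, not the extension's code, and it only computes the size of the image after scaling.

```python
def scaled_size(src_w, src_h, dst_w, dst_h, mode):
    """Size of the control image after scaling, before cropping or filling."""
    if mode == "Just Resize":
        # Stretch to the target; the aspect ratio may get distorted.
        return dst_w, dst_h
    scale_w, scale_h = dst_w / src_w, dst_h / src_h
    if mode == "Crop and Resize":
        # Cover the whole target; whatever overflows is cropped away.
        scale = max(scale_w, scale_h)
    elif mode == "Resize and Fill":
        # Fit inside the target; the leftover area is filled in.
        scale = min(scale_w, scale_h)
    else:
        raise ValueError(f"unknown resize mode: {mode}")
    return round(src_w * scale), round(src_h * scale)


# A wide 1024x512 control image going into a square 512x512 generation.
for mode in ("Just Resize", "Crop and Resize", "Resize and Fill"):
    print(mode, scaled_size(1024, 512, 512, 512, mode))
```

For this wide input, “Just Resize” squeezes it to 512 by 512, “Crop and Resize” keeps it at 1024 by 512 and cuts off the sides, and “Resize and Fill” shrinks it to 512 by 256 and fills the empty space above and below.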

Conclusion:

ControlNet is a powerful model for Stable Diffusion that you can install and run in WebUIs such as Automatic1111 or ComfyUI. Using it, we can generate images in multiple poses and create new images by combining several images with different poses.

It also helps in generating multiple frames for AI video generation without any model training.
