
Enhanced Diffusion Model 3: Stability Upgrade

Stable Diffusion 3 was announced on February 22nd, 2024 and released on June 12th, 2024, and StabilityAI reports it as their most improved text-to-image model to date. It can render text inside images and follows prompts better than earlier diffusion models. Under the hood it combines a diffusion transformer architecture with flow matching, both of which are described in their research paper.
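
To give a rough feel for the flow-matching idea, here is a minimal PyTorch-style sketch of a rectified-flow training step, the straight-line interpolation variant that the SD3 paper builds on. The `model` call signature and tensor shapes are illustrative assumptions, not StabilityAI's actual training code.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0):
    """One flow-matching (rectified flow) training step.

    x0:    batch of clean latents, shape (B, C, H, W)
    model: network predicting the velocity field v(x_t, t)
           (hypothetical signature, for illustration only)
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)        # random timestep in [0, 1)
    noise = torch.randn_like(x0)               # sample from the noise distribution
    t_ = t.view(b, 1, 1, 1)
    x_t = (1.0 - t_) * x0 + t_ * noise         # straight-line path from data to noise
    v_target = noise - x0                      # constant velocity along that path
    v_pred = model(x_t, t)                     # model predicts the velocity
    return F.mse_loss(v_pred, v_target)
```

The network learns the velocity that carries a sample along the straight line between data and noise; at inference time, integrating that velocity backwards turns noise into an image.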

The StabilityAI team has confirmed that training is being done within responsible boundaries in terms of testing, evaluation, and deployment, so that the model does not get misused by bad actors.

Credit: StabilityAI’s Hugging Face

Another major change is that they have collaborated with AMD and NVIDIA, which will help users get better optimization when working with these models. Remember those early days when nobody knew how to tackle all those installation errors?

The chart above shows how different image generation models score on Visual Aesthetics, Prompt Following, and Typography.

Credit: StabilityAI’s repository

The model has not been released to the open community yet, but it can be accessed through the API (for members). However, we have already made a tutorial on how to download and install Stable Diffusion 3 on a PC, so you should check that out as well.
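
For readers with API access, the sketch below shows one way to request an image over the REST API using Python's `requests` library. The endpoint path, header names, and form fields follow StabilityAI's public v2beta documentation at the time of writing; treat them as assumptions and verify against the official docs before use.

```python
import requests

API_KEY = "sk-..."  # placeholder: your StabilityAI API key

# Endpoint per StabilityAI's v2beta docs at the time of writing;
# verify against the official documentation before relying on it.
response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={
        "authorization": f"Bearer {API_KEY}",
        "accept": "image/*",
    },
    files={"none": ""},  # the endpoint expects multipart form data
    data={
        "prompt": "epic anime artwork of a wizard casting a cosmic spell",
        "model": "sd3",
        "output_format": "png",
    },
)

if response.status_code == 200:
    with open("sd3_output.png", "wb") as f:
        f.write(response.content)  # save the returned image bytes
else:
    raise RuntimeError(response.text)
```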

Now, let's dive into the testing, compare it with other popular image generation models, and see what we have.

Features:

  • Improved performance on multi-subject prompts
  • Enhanced image quality
  • Better prompt understanding
  • Great at rendering text
  • Parameter counts ranging from 800 million to 8 billion
  • Better safeguards against misuse by bad actors
  • Suitable for both commercial and non-commercial use

Comparison with multiple image generation models

We used results from several models to compare against Stable Diffusion 3. Here, we use images generated with Midjourney V6, Dalle3, Stable Cascade, and Stable Diffusion XL (SDXL).

For Stable Diffusion 3 itself, we took a set of images and prompts from StabilityAI's platform and ran the comparisons.

Example 1: Stable Diffusion 3 vs Dalle3 vs Midjourney v6

Here, we compare it with Dalle3 and Midjourney v6.

Prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion”
This art was generated using Dalle3. The model got confused by the prompt: the pig ends up wearing a hat even though the prompt gives the top hat to the robin, and the text is not cleanly rendered into the image; the word is misspelled.
This one was created using Midjourney V6. The model seems to follow the prompt a little better, but it drew two robin birds where the prompt mentions only one. The text is also incorrect, and it was rendered as graffiti-style art.

The result generated by Stable Diffusion 3 is much more impressive than the other two image generation models'. The key aspect is the text, which is clearly legible and intelligently integrated into the art. This shows the model can handle multiple subjects along with their descriptions.

Example 2: Stable Diffusion 3 vs Stable Cascade vs SDXL

This time we will compare it with the popular Stable Diffusion base models and see how much Stable Diffusion 3 has improved.
Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made out of colorful energy


We generated this art using Stable Cascade. The detailing is very good, and prompt understanding is better than in older Stable Diffusion models. But the model was unable to render the requested text into the image; you can see it tried to add something in the corner, but it clearly did not understand or interpret that part of the prompt.
We used Stable Diffusion XL (SDXL) to generate art with the same prompt. SDXL feels more creative, likely thanks to the large dataset it was trained on, which gives it more versatility and colorful effects. But when it comes to rendering text into an image, it struggles a lot; this model did not even attempt a single word.

This is the art generated using Stable Diffusion 3, and you can see that prompt understanding is very good and the text is much better than that of the other diffusion models. The detailing and color effects are quite impressive too. That said, the letter “A” in the word “STABLE” looks odd, and the official page does not explain why.
Still, this model has clearly improved a lot; no other model currently renders text with this level of accuracy, which makes it more effective than its predecessors.

Conclusion:

When it comes to prompt understanding and text rendering, the new Stable Diffusion 3 proved to be the most powerful and improved model in our tests. StabilityAI is doing amazing work in the field of open source, which is a real game changer for individuals, engineers, and organizations.