New Stable Diffusion 3 release excels at AI-generated body horror

An AI-generated image created using Stable Diffusion 3 of a girl lying on the grass.

On Wednesday, Stability AI released the weights for Stable Diffusion 3 Medium, an AI image synthesis model that turns text prompts into AI-generated images. Its arrival has been mocked online, however, because it renders people in a way that seems like a step backward from other state-of-the-art image synthesis models such as Midjourney or DALL-E 3. As a result, it can easily churn out wild, anatomically incorrect visual abominations.

A Reddit thread titled “Is this post supposed to be a joke? [SD3-2B]” details SD3 Medium’s spectacular failures at rendering people, especially human limbs like hands and feet. Another thread, titled “Why is SD3 so bad at generating girls lying on grass?”, shows similar issues, but for entire human bodies.

Hands have traditionally been a challenge for AI image generators because early training datasets lacked good examples, but more recently, several image synthesis models seemed to have overcome the problem. In that sense, SD3 appears to be a big step backward for the image synthesis enthusiasts who gather on Reddit, especially compared to recent Stability releases like SDXL Turbo in November.

“It wasn’t that long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” one Reddit user wrote.

An AI-generated image created using Stable Diffusion 3 Medium.
An AI-generated image created using Stable Diffusion 3 of a woman lying on the grass.
An AI-generated image created using Stable Diffusion 3 showing tangled hands.
An AI-generated SD3 Medium image created by a Reddit user with the prompt “woman wearing a beach dress.”
An AI-generated SD3 Medium image created by a Reddit user with the prompt “photo of a person taking a nap in the living room.”

AI imaging enthusiasts are so far blaming Stable Diffusion 3’s anatomy failures on Stability’s insistence on filtering out adult content (often called “NSFW” content) from the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also gets rid of the human anatomy, so… here’s what happened,” one Reddit user wrote in the thread.

Basically, whenever a user’s prompt asks for a concept that is not well represented in the AI model’s training dataset, the image synthesis model will improvise its best interpretation of what the user is asking for. And sometimes the result can be downright terrifying.

The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in rendering humans well, and AI researchers quickly discovered that censoring adult content containing nudity could severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SDXL, regaining some of the capabilities that had been lost to aggressive filtering of NSFW content.

Another issue that can occur during pre-training of the model is that the NSFW filter researchers use to remove adult images from the dataset is sometimes too aggressive, accidentally removing images that may not be offensive and depriving the model of depictions of people in certain situations. “[SD3] works fine as long as there are no people in the picture, I think their improved nsfw filter for training data filtering decided that anything humanoid is nsfw,” one Redditor wrote on the topic.
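To make the over-filtering idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the captions, the scores, and the thresholds are invented for illustration and say nothing about how Stability AI actually filtered the SD3 training data. It only shows how lowering a classifier threshold can throw out harmless images of people along with the adult content.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    caption: str
    nsfw_score: float       # hypothetical classifier output in [0, 1]
    is_actually_nsfw: bool  # ground truth, known here only because the data is made up


def filter_dataset(samples, threshold):
    """Keep only samples whose NSFW score is below the threshold."""
    return [s for s in samples if s.nsfw_score < threshold]


def wrongly_removed(samples, threshold):
    """Harmless samples that the filter would discard at this threshold."""
    return [
        s for s in samples
        if s.nsfw_score >= threshold and not s.is_actually_nsfw
    ]


# Invented scores: images of people tend to score higher than landscapes,
# even when nothing about them is offensive.
SAMPLES = [
    Sample("mountain landscape at dusk", 0.02, False),
    Sample("cat sleeping on a sofa", 0.03, False),
    Sample("man showing his hands", 0.22, False),
    Sample("woman lying on the grass", 0.35, False),
    Sample("swimmer at the beach", 0.48, False),
    Sample("explicit adult image", 0.97, True),
]

if __name__ == "__main__":
    for threshold in (0.9, 0.2):
        kept = filter_dataset(SAMPLES, threshold)
        lost = wrongly_removed(SAMPLES, threshold)
        print(f"threshold={threshold}: kept {len(kept)}, "
              f"wrongly removed {len(lost)}")
```

With the lenient threshold, only the explicit image is dropped. With the strict one, every image containing a person disappears, leaving a toy dataset of landscapes and cats, which is roughly the situation the Redditor above describes.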

Using a free online demo of SD3 on Hugging Face, we ran some prompts of our own and saw results similar to those reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.

An SD3 Medium example we created with the prompt “A woman lying on the beach.”
An SD3 Medium example we created with the prompt “A man showing his hands.”

Stability AI
An SD3 Medium example we created with the prompt “A woman showing hands.”

An SD3 Medium example we created with the prompt “a muscular barbarian with a gun next to a CRT TV, cinematic, 8K, studio lighting.”
An SD3 Medium example we created with the prompt “A cat in a car holding a beer can.”

Stability’s troubles run deep

Stability announced Stable Diffusion 3 in February, and the company plans to make it available in several model sizes. Today’s release is the “Medium” version, a model with 2 billion parameters. In addition to being available on Hugging Face, the weights can be tried out through the company’s Stability Platform. The weights are free to download and use, but only under a non-commercial license.

Shortly after its announcement in February, delays in the launch of the SD3 model weights inspired rumors that the release was being held back due to technical problems or mismanagement. Stability AI as a company took a turn for the worse recently with the resignation of its founder and CEO, Emad Mostaque, in March, followed by a series of layoffs. Shortly before that, three key engineers (Robin Rombach, Andreas Blattmann, and Dominik Lorenz) left the company. And its troubles go back even further, with reports of the company’s dire financial position dating back to 2023.

For some Stable Diffusion fans, the failures of Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I guess now they can go bankrupt in a safe and ethically [sic] way, after all.”
