On Wednesday, Stability AI released the weights for Stable Diffusion 3 Medium, an AI image synthesis model that turns text prompts into AI-generated images. However, its arrival has been mocked online because it generates images of people in a way that seems like a step backward from other modern image synthesis models like Midjourney or DALL-E 3. Ask it for a picture of a person, and it can easily produce wild, anatomically incorrect visual abominations.
A Reddit thread titled "Is this post supposed to be a joke? [SD3-2B]" details SD3 Medium's spectacular failures at rendering people, especially human limbs like hands and feet. Another thread, titled "Why is SD3 so bad at generating girls lying on grass?", shows similar issues, but for entire human bodies.
Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training datasets, but more recently, some image synthesis models seemed to have overcome the problem. In that sense, SD3 looks like a big step backward for the image synthesis enthusiasts who gather on Reddit, especially compared to recent Stability releases like SDXL Turbo in November.
“It wasn’t that long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” one Reddit user wrote.
So far, AI image enthusiasts are blaming Stable Diffusion 3's anatomy failures on Stability's insistence on filtering adult content (often called "NSFW" content) out of the SD3 training data, the data that teaches the model how to generate images. "Believe it or not, heavily censoring a model also gets rid of human anatomy, so… that's what happened," one Reddit user wrote in the thread.
Basically, whenever a user's prompt targets a concept that isn't well represented in the AI model's training dataset, the image synthesis model will fill in its best guess at what the user is asking for. And sometimes the result can be downright frightening.
The release of Stable Diffusion 2.0 in 2022 suffered from similar problems with rendering humans, and AI researchers quickly discovered that censoring adult content containing nudity could severely hamper an AI model's ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SDXL, regaining some of the capabilities that had been lost to aggressive NSFW filtering.
Another issue that can arise during model pre-training is that the NSFW filter researchers use to remove adult images from the dataset is sometimes too aggressive, accidentally removing images that may not be offensive and depriving the model of depictions of people in certain situations. "[SD3] works fine as long as there are no people in the picture. I think their improved NSFW filter for training data decided that anything humanoid is NSFW," one Redditor wrote on the topic.
Using a free online demo of SD3 on Hugging Face, we ran several prompts and saw results similar to those others reported. For example, the prompt "a man showing his hands" returned an image of a man holding up two giant, backward hands, although each hand had at least five fingers.
Stability's troubles run deep
Stability announced Stable Diffusion 3 in February, and the company plans to make it available in various model sizes. Today's release is the "Medium" version, a model with 2 billion parameters. In addition to being available on Hugging Face, the weights can also be tried out through the company's Stability Platform. The weights are free to download and use, but only under a non-commercial license.
Shortly after the February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back by technical problems or mismanagement. Stability AI as a company has been in a tailspin recently, with the resignation of its founder and CEO, Emad Mostaque, in March, followed by a series of layoffs. Shortly before that, three key engineers (Robin Rombach, Andreas Blattmann, and Dominik Lorenz) left the company. And its troubles go back even further, with reports of the company's dire financial position dating back to 2023.
For some Stable Diffusion fans, the failures of Stable Diffusion 3 Medium are a visual manifestation of the company's mismanagement and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:
“I think now they can go bankrupt in a safe and ethical way [sic] after all.”