Stable Diffusion 3.5 is the newest update from Stability AI. While SD 1.5 has 983 million parameters and SDXL has 3.5 billion, SD 3.5 Large has a whopping 8 billion parameters. SD 3.5 Large boasts market-leading prompt adherence, which means it's more likely to create what you ask for, even if you ask for something complex and imaginative.
Stable Diffusion 3.5 Large CivitAI Link
Stable Diffusion 3.5 Medium CivitAI Link
Stable Diffusion 3.5 Prompt Guide from Stability AI
Why do renders take so long with SD 3.5?
Stable Diffusion 3.5 is a large and powerful model, but it is not fast. Whereas most models render artwork in just a few seconds, an SD 3.5 render can take a minute or more.
What is the best use for Stable Diffusion 3.5?
SD 3.5 is great for complicated and imaginative creations that other AI models are unable to render. Check out the section on Complex Prompts below for an example of a prompt SD 3.5 Large could create, but the best SDXL models could not.
How is SD 3.5 Large different from SD 3.5 Medium?
SD 3.5 Medium is a smaller version of SD 3.5 Large, but in some cases it seems to do better with portraits and people. The examples here use SD 3.5 Large unless otherwise noted.
SD 1.5 models worked better with keyword prompting. Any extraneous text was likely to confuse the older AI models.
SDXL allowed for greater prompt understanding. Eventually, some SDXL models, like RealVisXL v5, moved toward natural language prompts. Flux Schnell uses natural language prompting as well. Stable Diffusion 3.5 works with both.
So how do these prompting techniques differ? Here's an example:
![]() Keyword Prompt: beautiful woman wearing a full-length dress made of pink petals, halter neckline, avant-garde styling, high-fashion editorial photograph, studio lighting with rim lights, shot against soft dappled shadows |
![]() Natural Language Prompt: Create a beautiful woman wearing a full-length dress made of pink petals. The dress should have a halter neckline and avant-garde styling. The image should be a high-fashion editorial photograph, shot with studio lighting using rim lights. The background should include soft dappled shadows. Negative Prompt: deformed, disfigured, ugly, blurry, worst quality, low quality, bad quality, mutation |
Both images use the same seed and render settings. Only the wording of the prompt changes.
Natural language prompts are worded as if you were talking to a human, whereas keyword prompts use keywords and short phrases separated by commas.
Both techniques work well with Stable Diffusion 3.5.
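As a rough sketch of the difference, the same scene details can be assembled into either style. The `Scene` class, its field names, and the sentence templates below are hypothetical and exist only for illustration; they are not part of any SD 3.5 tool.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical container for the pieces of a prompt."""
    subject: str
    details: list[str] = field(default_factory=list)
    medium: str = ""
    lighting: str = ""
    background: str = ""


def keyword_prompt(scene: Scene) -> str:
    """Keyword style: short phrases separated by commas."""
    parts = [scene.subject, *scene.details, scene.medium, scene.lighting, scene.background]
    return ", ".join(p for p in parts if p)


def natural_prompt(scene: Scene) -> str:
    """Natural language style: full sentences, as if talking to a person."""
    sentences = [f"Create {scene.subject}."]
    if scene.details:
        sentences.append(f"It should have {' and '.join(scene.details)}.")
    if scene.medium:
        sentences.append(f"The image should be {scene.medium}.")
    if scene.lighting:
        sentences.append(f"It should be shot with {scene.lighting}.")
    if scene.background:
        sentences.append(f"The background should include {scene.background}.")
    return " ".join(sentences)


scene = Scene(
    subject="a beautiful woman wearing a full-length dress made of pink petals",
    details=["a halter neckline", "avant-garde styling"],
    medium="a high-fashion editorial photograph",
    lighting="studio lighting with rim lights",
    background="soft dappled shadows",
)
print(keyword_prompt(scene))
print(natural_prompt(scene))
```

Both functions describe the same scene, so rendering each output with the same seed and settings gives a fair comparison of the two wording styles.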
When you are trying to describe a scene, it helps to move in a logical way throughout the scene, especially when the elements are placed relative to each other. In this image, we first describe the woman, then the tree, then the petals as they fall from the tree and float upon the sea, then the background, and finally, the artistic style of the image.
![]() Prompt: A woman stands at the edge of a stormy ocean cliff, beneath a cherry blossom tree. The tree drops countless pink petals into the sea below. The petals float on the waves and sink into the sea. In the background, dark gray clouds gather. Oil painting, Expressionism Negative Prompt: deformed, distorted, disfigured, ugly, blurry, lowres, mutation, worst quality, low quality, bad quality |
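One way to keep a prompt moving logically through a scene is to store each part under a label and always join the parts in the same order. This is a minimal sketch; the labels in `SCENE_ORDER` are hypothetical names chosen for this example, not terms SD 3.5 knows about.

```python
# Hypothetical labels, listed in the order the prompt should walk through the scene.
SCENE_ORDER = ("subject", "nearby", "motion", "background", "style")


def compose_scene(parts: dict[str, str], order: tuple[str, ...] = SCENE_ORDER) -> str:
    """Join prompt parts in a fixed reading order, skipping any that are missing."""
    unknown = set(parts) - set(order)
    if unknown:
        raise KeyError(f"unknown prompt parts: {sorted(unknown)}")
    return " ".join(parts[key].strip() for key in order if parts.get(key))


# The parts can be written down in any order; compose_scene() sorts them out.
parts = {
    "style": "Oil painting, Expressionism",
    "background": "In the background, dark gray clouds gather.",
    "subject": "A woman stands at the edge of a stormy ocean cliff, beneath a cherry blossom tree.",
    "motion": "The petals float on the waves and sink into the sea.",
    "nearby": "The tree drops countless pink petals into the sea below.",
}
print(compose_scene(parts))
```

The printed result matches the prompt above: subject first, then the tree, the petals, the background, and finally the artistic style.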
Consider this artwork, inspired by the song “A Whiter Shade of Pale” by Procol Harum. Again, the prompt moves logically from the subject (the woman) to the setting, to the sea waves, to the objects in the waves, and finally, to the style and mood of the image.
Prompt: Beautiful pale woman, close up, drowning in an old tavern that has been flooded by sea water. The waves are crashing against the walls. The neon bar light reads "And so it was" in red, its light reflect in the water. Some playing cards and a martini glass are floating in the waves. The scene is mostly white, bright, and faded, the mood is ominous, bleak and hauntingly beautiful. Negative Prompt: - ![]() |
![]() |
![]() |
The three artworks above use the same prompt with different seeds. Some show more of the tavern interior than others, and the text sometimes has flaws. But SD 3.5 Large consistently rendered a woman with wet hair and pale, wet skin, a realistic neon sign, and playing cards and a glass on the waves, all inside a tavern. Stable Diffusion 3.5 Large can render imaginative ideas that were out of reach for earlier AI art models.
![]() |
![]() |
Let's take one more look at this watery tavern. In my first attempt to create the artwork, I used the prompt below on the left. I've often referenced artists Beksiński and Gammell in SDXL to create a grim mood, but I didn't see the effect I was looking for in this case. I was also using some of the usual SDXL negatives.
When in doubt, prune it out. When elements in your prompt don't seem to be making an impact on your artwork, you can either rework those elements to be more prominent* or remove them altogether.
* How can you make parts of your prompt more prominent? Move them to the beginning or end. Items at the beginning and end of a prompt carry more weight than those in the middle. SDXL and SD 1.5 also offer prompt weighting, but it is not currently available in SD 3.5.
In this case, the artists' names were already at the end of the prompt. Artist references don't work quite the same way in SD 3.5 as in SDXL (more on that later). I suspected that removing them would not hurt the style and might fix the errors.
Also, removing the negative prompts helped. The woman's eyes are weird in the first image, but they look fine in the second image. Most notably, the text is more accurate in the second image, and the background looks more like a tavern and less like a mysterious jumble of imagery.
Sometimes, it helps to prune out parts of the prompt that don't seem to be helping the image.
Prompt: beautiful pale woman, close up, drowning in an old tavern that has been flooded by sea water. The waves are crashing against the walls. The neon bar light reads "And So It Was" reflect in the water. Some playing cards and a martini glass are floating in the waves. The scene is mostly white and faded, the mood is ominous, haunting, and bleak. Style of Zdzisław Beksiński and Stephen Gammell Negative Prompt: deformed, distorted, disfigured, ugly, blurry, worst quality, low quality, bad quality ![]() |
Prompt: beautiful pale woman, close up, drowning in an old tavern that has been flooded by sea water. The waves are crashing against the walls. The neon bar light reads "And So It Was" reflect in the water. Some playing cards and a martini glass are floating in the waves. The scene is mostly white and faded, the mood is ominous, haunting, and bleak. Negative Prompt: - ![]() |
Stable Diffusion 3.5 reacts differently to artists' names than SDXL does. In this example, Mark Rothko and Clyfford Still are referenced. Both are Abstract Expressionist artists whose works involved color fields and abstract shapes. Biomorphic means something that suggests the forms of living things. So, with this prompt, I am trying to create an abstract cat-like shape in specific colors with an artist's eye for texture and form.
SD 3.5 Large creates a beautiful artwork with stunning textures. It is a step above RealVisXL v5, a top SDXL model, as well as Flux Schnell (shown center bottom). SD 3.5 Medium does a nice job with the textures but the outcome has none of the depth of 3.5 Large.
In row 2, the first image shows the same prompt EXCEPT I've removed the artist names. The differences are subtle, but significant. In the second image, I kept the artist names but removed the negative prompts. Each image is captioned in the chart below, and all images share the same seed.
Prompt: Biomorphic abstract painting that resembles a cat, in warm and cool purple tones with hints of orange and aqua, with paint textures, Mark Rothko, Clyfford Still Negative Prompt: deformed, distorted, disfigured, ugly, blurry, glitch, lowres, mutation, worst quality, low quality, bad quality |
![]() |
![]() |
![]() |
![]() SD 3.5 Large |
![]() SD 3.5 Large |
![]() SD 3.5 Medium |
![]() SD 3.5 Large |
![]() Flux Schnell |
![]() SD 3.5 Medium |
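The comparison above (full prompt, no artist names, no negative prompt) can be organized as a small set of variants that all render with the same seed. This is only a sketch; `ablation_variants` and its variant names are hypothetical helpers, not part of any rendering tool.

```python
def ablation_variants(
    base: str, artists: list[str], negatives: list[str]
) -> dict[str, tuple[str, str]]:
    """Return named (prompt, negative_prompt) pairs for a same-seed comparison."""
    with_artists = ", ".join([base, *artists])
    negative = ", ".join(negatives)
    return {
        "full": (with_artists, negative),       # artists and negatives
        "no_artists": (base, negative),         # artist names removed
        "no_negatives": (with_artists, ""),     # negative prompt removed
    }


variants = ablation_variants(
    base=(
        "Biomorphic abstract painting that resembles a cat, in warm and cool purple "
        "tones with hints of orange and aqua, with paint textures"
    ),
    artists=["Mark Rothko", "Clyfford Still"],
    negatives=[
        "deformed", "distorted", "disfigured", "ugly", "blurry", "glitch",
        "lowres", "mutation", "worst quality", "low quality", "bad quality",
    ],
)
for name, (prompt, negative) in variants.items():
    print(f"{name}: {prompt} | negative: {negative or '-'}")
```

Changing one thing at a time while holding the seed constant makes it easier to tell which part of the prompt is responsible for a difference.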
Here's another example. In this one, adding a single negative prompt, “ugly”, has changed the artwork, maybe for the better. It could be argued that the Flux Schnell version represents Joan Miró's style more accurately.
Prompt: 3D abstract cat sculpture made of glazed ceramic, in the style of Joan Miró | ||
![]() |
![]() |
![]() |
It's easier to see the artists' influence with these cats because there's not much else going on in the prompt. We'll look at some more complicated prompts in Part 2.
When it comes to negative prompts with Stable Diffusion 3.5, less is better.
In the first row below, I wanted to create a watercolor image, so I added the names of two watercolor artists. I then tried the same render but removed the artists' names. The differences are subtle.
In the second row, I used the “Haunting” style in MonAI, which added this to the prompt: “(haunting, moody, bleak), surreal, Sidney Nolan, (Karel Thole:1.2), (Kati Horna), Virgil Finlay, (Paul Wunderlich:1.2)”. I adjusted the Guidance Scale back down to 4.4 and ran it. Then, I removed the artists and ran it again. Note that although the weighting syntax remains in the prompt, SD 3.5 simply ignores it. In this case, the overall mood of the image seems darker and more complex in the version with the artists, while the other version looks more photorealistic.
Ultimately, while you can control the artistic style of an image to some degree with Stable Diffusion 3.5, it's often more helpful to describe the look you want in words rather than referencing artists by name.
Prompt: beautiful jellyfish floating in a downtown city street at dusk, painted in a style that combines ink wash and watercolor, in the style of artist Dorothy Lathrop and Ivan Bilibin ![]() |
Prompt: beautiful jellyfish floating in a downtown city street at dusk, painted in a style that combines ink wash and watercolor ![]() |
Prompt: beautiful jellyfish floating in a downtown city street at dusk, (haunting, moody, bleak), surreal, Sidney Nolan, (Karel Thole:1.2), (Kati Horna), Virgil Finlay, (Paul Wunderlich:1.2) ![]() |
Prompt: beautiful jellyfish floating in a downtown city street at dusk, (haunting, moody, bleak), surreal, ![]() |
Prompt: beautiful jellyfish floating in a downtown city street at dusk, simple illustration in a vector graphic style with the texture of artist Rufino Tamayo ![]() |
Prompt: beautiful jellyfish floating in a downtown city street at dusk, simple illustration in a comic book style and the style of artist Richard Corben ![]() |
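Since SD 3.5 ignores prompt weighting, a prompt carried over from SDXL can be tidied by removing the weighting syntax. The sketch below assumes the common `(term)` and `(term:1.2)` emphasis style seen in the “Haunting” prompt above; `strip_weights` is a hypothetical helper, and other tools may use different syntax.

```python
import re

# Matches "(text)" or "(text:1.2)" with no nested parentheses inside.
_WEIGHTED = re.compile(r"\(([^()]*?)(?::\s*\d+(?:\.\d+)?)?\)")


def strip_weights(prompt: str) -> str:
    """Remove emphasis parentheses and numeric weights, keeping the words."""
    previous = None
    while previous != prompt:  # repeat so nested parentheses are unwrapped too
        previous = prompt
        prompt = _WEIGHTED.sub(r"\1", prompt)
    return prompt


haunting = (
    "beautiful jellyfish floating in a downtown city street at dusk, "
    "(haunting, moody, bleak), surreal, Sidney Nolan, (Karel Thole:1.2), "
    "(Kati Horna), Virgil Finlay, (Paul Wunderlich:1.2)"
)
print(strip_weights(haunting))
```

One caveat: this also removes ordinary parentheses, so a prompt such as “cute blue tang fish (Dory)” would lose them as well. Review the result before rendering.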
In the forest scene below, adding the phrase “In the Vienna Secession style, an Art Nouveau drawing” to the end of the prompt created a beautiful illustration style. However, adding the same phrase to the more complicated prompt on the right resulted in an image that still looked like a photo.
Prompt: a fox in a forest, next to a squirrel, with ferns, leafy plants, and mushrooms. In the Vienna Secession style, an Art Nouveau drawing ![]() |
Prompt: beautiful redhead with short punky hair, smiling warmly, close up, wearing a cute sweater and jeans, sitting at a table at an outdoor bar in the city. There is a neon sign behind her that reads "Notifications" in red. In the Vienna Secession style, an Art Nouveau drawing ![]() |
So I tried putting the artistic style IN FRONT OF the prompt instead of at the end, and that worked. However, the word “Notifications” was misspelled in that render, so I dropped the “Vienna Secession style” from the prompt, and that improved the spelling.
Prompt: Vienna Secession style, an Art Nouveau drawing of a beautiful redhead with short punky hair, smiling warmly, close up, wearing a cute sweater and jeans, sitting at a table at an outdoor bar in the city. There is a neon sign behind her that reads "Notifications" in red. ![]() |
Prompt: Art Nouveau drawing of a beautiful redhead with short punky hair, smiling warmly, close up, wearing a cute sweater and jeans, sitting at a table at an outdoor bar in the city. There is a neon sign behind her that reads "Notifications" in red. ![]() |
The style information is more likely to be “heard” by the AI if you put it in front of the prompt.
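A small helper can make it easy to try the same style at the front and at the end of a prompt. This is a minimal sketch; `place_style` is a hypothetical function written for this example.

```python
def place_style(description: str, style: str, position: str = "front") -> str:
    """Attach a style phrase at the front (the default) or the end of a prompt."""
    if position == "front":
        # Reads as "<style> of <description>".
        return f"{style} of {description}"
    if position == "end":
        return f"{description} {style}"
    raise ValueError(f"position must be 'front' or 'end', got {position!r}")


description = (
    "a beautiful redhead with short punky hair, smiling warmly, close up, wearing a "
    "cute sweater and jeans, sitting at a table at an outdoor bar in the city. "
    'There is a neon sign behind her that reads "Notifications" in red.'
)
print(place_style(description, "Art Nouveau drawing"))         # style first
print(place_style(description, "Art Nouveau drawing", "end"))  # style last
```

Rendering both versions with the same seed shows whether the style is being “heard” in each position.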
However, there are some instances where a style will simply not work with a subject. In this example, the comic book style that references Richard Corben, a famous illustrator, works beautifully with a caped feline superhero on a downtown roof, but it doesn't work at all on a fish in a coral reef, even when the style information is placed at the beginning of the prompt. By comparison, the Art Nouveau style creates a beautiful illustrated artwork using the same seed.
Some artistic styles are difficult to apply to certain subjects. This is usually because there are not as many examples that combine that style and subject. There are many comic books with caped heroes in urban settings, but not as many with coral reefs.
Prompt: A cat-human superhero wearing a full-length red-orange gradient cape and a yellow-gold suit, side view from above, close portrait, standing at the top of a downtown skyscraper, looking down at the city below, at dawn. by Richard Corben. Simple illustration in a comic book style. ![]() |
Prompt: cute blue tang fish (Dory) in a coral reef. by Richard Corben. Simple illustration in a comic book style. ![]() |
Prompt: Simple illustration in a comic book style, artist Richard Corben, cute blue tang fish (Dory) in a coral reef ![]() |
Prompt: cute blue tang fish (Dory) in a coral reef, Vienna Secession style Art Nouveau drawing ![]() |
Most artworks in this tutorial use a guidance scale setting of 4.4. This was a popular setting on CivitAI, and it appears to be a good default for many artwork styles using Stable Diffusion 3.5.
However, the best guidance scale setting depends on the subject matter and art style. In this example, you can see a lot of differences between one guidance scale setting and the next. Note the changes to her fingers, her sleeves, her crown, and the castle behind her.
At the lower settings, the details are less well-defined, but the scene makes more sense. Her fingers are not clearly visible, but the pose looks more realistic. The towers of the castle behind her rise to a point. Her necklace is hanging at the center of its chain.
At the higher settings, the details are clearer but less logical. The towers of the castle have multiple points. At 5.9 and 6.4, her necklace is lopsided. The fleurs-de-lis on the crown develop nicely as the guidance scale goes up, but at 7.1, the most prominent fleur-de-lis is missing.
Prompt: A beautiful queen wearing an ornate silver crown with diamonds and fleur d'lis. The queen is crying with her head in her hand, side view, tears running down her cheeks. She is lit by a candle with a castle wall in the background. A hint of Gothic in an Art Nouveau drawing. Negative Prompt: deformed, disfigured, ugly, photography Steps: 25 |
Guidance Scale: 4.4 ![]() |
Guidance Scale: 4.7 ![]() |
Guidance Scale: 5.5 ![]() |
Guidance Scale: 5.9 ![]() |
Guidance Scale: 6.4 ![]() |
Guidance Scale: 7.1 ![]() |
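A guidance scale comparison like the one above amounts to a list of render jobs that differ only in one setting. The sketch below builds that list in plain Python. `RenderJob` and `guidance_sweep` are hypothetical names, and the seed value is a placeholder because the original seed is not given. The field names are chosen to resemble the keyword arguments of the Hugging Face diffusers Stable Diffusion 3 pipeline, which is an assumption about your toolchain; adapt them to whatever app or library you render with.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RenderJob:
    """Hypothetical bundle of settings for one render."""
    prompt: str
    negative_prompt: str
    seed: int
    guidance_scale: float
    num_inference_steps: int


def guidance_sweep(prompt, negative_prompt, seed, scales, steps=25):
    """Build one job per guidance scale, keeping every other setting fixed."""
    return [RenderJob(prompt, negative_prompt, seed, scale, steps) for scale in scales]


jobs = guidance_sweep(
    prompt=(
        "A beautiful queen wearing an ornate silver crown with diamonds and fleur d'lis. "
        "The queen is crying with her head in her hand, side view, tears running down her "
        "cheeks. She is lit by a candle with a castle wall in the background. "
        "A hint of Gothic in an Art Nouveau drawing."
    ),
    negative_prompt="deformed, disfigured, ugly, photography",
    seed=1234,  # placeholder: any fixed seed works, as long as it stays the same
    scales=[4.4, 4.7, 5.5, 5.9, 6.4, 7.1],
)
for job in jobs:
    settings = asdict(job)
    print(settings["guidance_scale"], settings["num_inference_steps"], settings["seed"])
```

The same pattern works for a steps comparison: hold the guidance scale fixed and vary `num_inference_steps` instead.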
It appears that there is a yellow area in the center left of the original seed image. At 4.4, the yellow appears like a warm light reflected on the castle wall, possibly the first light of dawn. By 5.5, the yellow area starts taking shape, but the shape is malformed behind the candle. At 5.9 and 6.4, the yellow appears to be a reflection of the candle's light, although the candle is too far and too dim to cast light on that castle wall. By 7.1, the candle itself has grown shorter to become the source of the yellow light.
This is how the AI works with seed images. The seed image is random noise, like TV static, but that noise includes areas that tend to develop into certain colors. This is also why the queen's hand is either covered by a white sleeve or has a white tint.
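The reason a seed behaves this way is that it fixes the starting noise: the same seed always produces the same noise, no matter what guidance scale or step count is used afterward. The sketch below illustrates this with Python's standard `random` module. It is a simplified stand-in, not how SD 3.5 itself generates noise, and `seed_noise` is a hypothetical function.

```python
import random


def seed_noise(seed: int, width: int = 8, height: int = 8) -> list[list[float]]:
    """Generate a small grid of Gaussian noise that depends only on the seed."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


first = seed_noise(42)
again = seed_noise(42)
other = seed_noise(43)

print(first == again)  # same seed, same starting noise
print(first == other)  # different seed, different starting noise
```

Because every render in a comparison starts from the same noise, features such as the yellow area tend to show up in the same place and are simply interpreted differently as the settings change.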
You can get great results using SD 3.5 Large with 20 to 25 steps. In this example, the differences between 15 and 30 steps are subtle. Her hands aren't well formed at 15 steps, but by 20, the image looks pretty much the same as it does at 30. Remember, fewer steps mean a faster render.
Prompt: A woman gardener with dirt and flowers in her hair, holding a cardboard box full of spring flowers, wearing denim overalls, suburban wall background. Piercing blue eyes and an exhausted expression. Hilarious caricature in a 3D cartoon style. Negative Prompt: deformed, ugly, photo, photography |
15 Steps ![]() |
20 Steps ![]() |
25 Steps ![]() |
30 Steps ![]() |
![]() Prompt: Eye level shot of a rustic, hand-crafted wooden table covered with roasted coffee beans, a burlap sack spilling beans in the foreground. Hot streaming cup of espresso sits beside a sack of coffee beans, with whisps of steam curling into the air. Negative Prompt: - |
![]() Prompt: cute skunk in a wildflower field, whimsical style with small details and organic patterns, stylized, hand-drawn, vibrant, warm colors, #handcrafted #handmade Negative Prompt: blurry, ugly |
![]() Prompt: A close-up of the Sorceress' enigmatic eyes, swirling with intricate celestial patterns of galaxies, stars, and nebulas, their glowing light casting delicate highlights on her brows and lashes. The chiaroscuro style heightens the drama, with intense shadows framing her face and a soft celestial glow illuminating her features. The color palette of rich blues, purples, and shimmering golds contrasts beautifully with the deep shadows, creating a visually arresting image of infinite mystery and allure. Negative Prompt: - |
![]() Prompt: A tiny snail approaching a house built into a mushroom. Art Nouveau drawing in a Vienna Secession style Negative Prompt: deformed, ugly, photo, photorealism |
![]() Prompt: A highly detailed and vibrant digital artwork combining elements of realism and fantasy. The composition features a portrait of a snake-human hybrid, with the subject’s pale, smooth skin subtly textured to resemble scales. The snake-human’s head is intricately detailed, facing the viewer with large, expressive eyes that shimmer with vivid green irises. The human-snake’s scales appear smooth and glossy, reflecting light to add depth and texture. Layers of iridescent and metallic textures contrast against darker, moody tones of black, grey, and hints of gold. The artwork merges hyper-realism with fantasy, rich in vivid colors and intricate details that emphasize the magical and extraterrestrial qualities of the scene. Negative Prompt: - |
![]() Prompt: Dynamic action shot of a cyberpunk flying car traveling over the grimy neon-lit streets of a dystopian futuristic city below |
![]() Prompt: The words "Me + You" written into the sand on a beach. A cute seagull is looking into the eyes of a cute sea turtle. Waves and water on the shore in the background, Hilarious caricature in a 3D cartoon style. Negative Prompt: deformed, ugly, photo, photography |
![]() Prompt: beautiful woman with freckles, pink lipstick, and red hair, rising from a deep milk bath. She is surrounded by pink flowers floating on the milk, cinematic photo Negative Prompt: deformed, ugly |
![]() Prompt: Portrait, collage of black block letters on a white background, letters of different sizes draw a dream portrait, light and shadow, low angle, fog, monochrome, kinetic art, Victor Vasarely, hyper detailed Negative Prompt: deformed, disfigured, ugly, photography |
![]() Prompt: Cybernetic woman running down a dark alley with steam vents and glowing graffiti covering the walls, cinematic film still Negative Prompt: - |
![]() Prompt: sketch of a red rose with a long stem and leaves. The drawing is sketched with firm strokes using charcoal and colored pencils in a simple and minimalist style on light sepia drawing paper. Negative Prompt: deformed, ugly, photography |