By Eric Picard
This article discusses the current state of generative image AI software: simple prompts can lead to beautiful images, but getting exactly what you’re looking for, and controlling for it, is quite involved and requires some tricks to manage the AI’s interpretations.
Note: For this article, I’m using MidJourney Version 5.1. You will see some commands in the text prompts that are unique to MidJourney, the most obvious being --v 5.1, which tells MidJourney which model version to use. Version 5.1 is the newest model MidJourney supports as of this writing (Friday, May 19th, 2023). There are lots of other generative AI tools out there, but personally I find that MidJourney gives me results I prefer over the others. Also, some of the prompts have been edited for clarity and consistency, but not in a way that affects the output. None of the images have been edited.
The idea I have for this project is to create an idealized image of a couple in their forties, standing in a New England field, arm in arm, with a colonial house in the background. The vision I have is reminiscent of an Irish Spring or Old Spice commercial from the 1970s, but set today. So I’ll include things in the prompt that steer the AI toward recreating some of the feel of that era. Here is what I would write as a starting prompt if I were just doing this for myself:
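Every prompt in this walkthrough ends with the same kind of parameter suffix. --v picks the model version. --ar, used from 1:30 PM onward, sets the aspect ratio. --stylize, used from 1:37 PM onward, controls how much creative latitude the model takes. Here is a minimal sketch that reproduces that suffix in the order this article uses. It assumes you assemble prompts as plain strings before pasting them after MidJourney’s /imagine command. The helpers format_parameters and with_parameters are hypothetical and are not part of any MidJourney API. The 0 to 1000 range for stylize is an assumption based on MidJourney’s documentation for its version 5 models.

```python
from typing import Optional


def format_parameters(version: str = "5.1",
                      aspect_ratio: Optional[str] = None,
                      stylize: Optional[int] = None) -> str:
    """Build the parameter suffix in the order this article uses: --ar, --stylize, --v."""
    params = []
    if aspect_ratio is not None:
        width, sep, height = aspect_ratio.partition(":")
        # Expect two positive whole numbers separated by a colon, e.g. "2:1".
        if not (sep and width.isdigit() and height.isdigit()
                and int(width) > 0 and int(height) > 0):
            raise ValueError(f"aspect ratio should look like '2:1', got {aspect_ratio!r}")
        params.append(f"--ar {width}:{height}")
    if stylize is not None:
        # Assumed valid range for version 5 models: 0 to 1000 (default 100 when omitted).
        if not 0 <= stylize <= 1000:
            raise ValueError(f"stylize should be between 0 and 1000, got {stylize}")
        params.append(f"--stylize {stylize}")
    params.append(f"--v {version}")
    return " ".join(params)


def with_parameters(text: str, **params) -> str:
    """Append the parameter suffix after the descriptive text, where MidJourney expects it."""
    return f"{text.strip()} {format_parameters(**params)}"
```

For example, format_parameters(aspect_ratio="2:1", stylize=1000) yields the "--ar 2:1 --stylize 1000 --v 5.1" suffix used in the Stylize experiments. Keeping the suffix in one place makes it easy to change a single setting between runs without retyping the whole prompt.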
“a photograph of a couple posing for the camera holding each other, they both wear thick white cable knit sweaters and are in their mid-forties. She is a beautiful, tall and willowy woman, he is clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are fieldstone walls in the background and a row of trees at the back of the field showing fall colors in red and yellow, and a small colonial house visible in the distance. There is depth in the photograph, with the house being positioned off in the distance, with rolling hills in between the couple and the house.”
I suspect this prompt alone would give me decent results, but I know well enough that I would likely need image prompts as well as text to reach my ideal end result: visual instructions that make the AI more likely to include all the elements I care about. I also know roughly how I want the characters to look. I’d love the man to look like James Purefoy in his role in Fisherman’s Friends, but more clean shaven, maybe just stubble. And the woman in my mind looks like Rachel McAdams dressed casually, or maybe Kate Mara with her aged makeup from her new show, “Class of ’09”.
But rather than applying all my accumulated knowledge of how to write prompts, I’m going to start simple, because, as you’ll see, the first images MidJourney produces are quite beautiful but don’t match my initial vision.
Starting with a simple prompt gets me this group of four images:
1:20 PM
a man and a woman stand in a field with long grass and wildflowers --v 5.1

Well, the field looks kind of like what I wanted, but none of the other background elements are there, and the two figures are nothing like what I want. I also know that tuning a whole image with two figures gets complicated, so I’m going to retrench and start with a single figure, the man, and tune my prompts until I get closer to what I want.
1:21 PM
a man stands in a field with long grass and wildflowers --v 5.1

This is a good starting point. Let me start tuning the man to get closer to my vision:
1:23 PM
a photograph of a tall man with broad shoulders wearing jeans and a sweater stands in a new england field with long grass and wildflowers in autumn, he looks toward the camera --v 5.1

Okay – not quite what I’m looking for, but we’re getting there. Let me tune the man and the setting he’s in a bit:
1:25 PM
a photograph of a handsome and rugged tall man with broad shoulders, wearing jeans and a thick cable knit white sweater stands in a new england field with long grass and wildflowers of many colors in autumn, he looks towards the camera, there are stone walls in the background and a row of trees and a small house visible in the distance. --v 5.1

That’s better but he’s not quite right, and I want more depth in the image.
1:28 PM
a photograph of a handsome and rugged tall man with broad shoulders who looks like a James Purefoy, wearing jeans and a thick cable knit white sweater stands in a new england field with long grass and wildflowers of many vibrant colors in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --v 5.1

Okay, not really better. Yes, one of those images looks like James Purefoy, but I lost all the other elements I love. Focusing too much of the AI’s attention on one element can cost you the rest. Also, the default square aspect ratio isn’t very cinematic, so going forward I’m going to set a 2:1 aspect ratio using the --ar command:
1:30 PM
a photograph of a handsome and rugged tall man with broad shoulders, wearing jeans and a thick cable knit white sweater stands in a new england field with long grass and wildflowers of many vibrant colors in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

Okay – this is getting much closer, but he’s too young. I want to tune his age:
1:31 PM
a photograph of a handsome and rugged tall man in his forties with broad shoulders, wearing jeans and a thick white cable knit sweater stands in a new england field with long grass and wildflowers of many vibrant colors in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

Okay, a bit more age specific, but I don’t have my white cable knit sweater, and I see that the AI is ignoring my request for jeans. Maybe if I remove jeans, the sweater will resolve?
1:34 PM
a photograph of a handsome and rugged tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

That didn’t really help. Let’s try specifying the color of the house and see if that helps. Also, these guys are all getting beards, so let’s clean that up:
1:35 PM
a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small white colonial house visible in the distance. --ar 2:1 --v 5.1

Alrighty, the clean shaven prompt helped, but the white house request isn’t helping, and I still don’t have a white cable knit sweater. Let’s see what happens when I include the “Stylize” command (--stylize), which gives the AI more creative latitude in how it executes the prompt. Stylize accepts values from 0 to 1000, with a default of 100. I’ll put it on the highest setting to see what difference it makes:
1:37 PM
a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

Well, you can see that the AI was a bit more creative in the layout, but it didn’t get us what I want. Now it’s time to get really aggressive with the background. I’m going to put in some images to illustrate aspects of what I like. I’ll use these three images for all the rest of the renderings, but here they are for your review:

This helps the AI see that I want depth in the image, and it shows the line of trees in the distance that I’m looking for. It doesn’t help much with the wildflowers or the stone walls, though, so I’ll add more images:


Let’s see how that helped, keeping Stylize turned on:
1:45 PM
https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

That was much better at getting the depth, the wildflowers, and the walls in. Now let’s see if it works better with Stylize removed, which puts it back at its default:
1:46 PM
https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

Okay, so we’re getting somewhere. I like the background quite a lot, although the figure is getting buried a bit. Let me try this same set of images with a woman instead.
1:48 PM
https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a beautiful and willowy woman in a thick white cable knit sweater in her forties, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

That’s good – I’m seeing a consistent treatment of the setting, but a lot of play with the figure. Let’s see what Stylize does.
1:50 PM
https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a beautiful and willowy woman in a thick white cable knit sweater in her forties, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

That didn’t seem to make a lot of difference. So now I’m going to start playing with the figure and see if I can tune it towards what I want. Let’s start with a general figure that’s the right age and wearing the sweater I want:
1:52 PM
photographic portrait of a Handsome Man, cleanshaven and rugged looking, in a thick white cable knit sweater --v 5.1

Well, that’s not James Purefoy, but I think the 3rd image looks pretty good. I’ll render that out and look at it larger:
1:55 PM
photographic portrait of a Handsome Man, cleanshaven and rugged looking, in a thick white cable knit sweater --v 5.1 (Image #3)

Now that I have an image that gets more of the visual information to the AI, I’ll drop that into the first slot of my prompt:
2:00 PM
https://s.mj.run/ePXrbTX226c https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

And this is much better. Really close to what I’m looking for. Now I just need to find a woman to help tune the image. Here’s my first try on that.
2:02 PM
beautiful and willowy woman in a thick white cable knit sweater in her forties --v 5.1

I find that MidJourney struggles a bit with age, particularly with women. I’d say these women look older than mid-forties, more like fifties or even early sixties. But for now, this should at least set the tone.
2:03 PM
beautiful and willowy woman in a thick white cable knit sweater in her forties --v 5.1 (Image #3)

2:05 PM
https://s.mj.run/DDmJPXMNGS0 https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a beautiful and willowy woman in a thick white cable knit sweater in her forties, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

Great – this gets us pretty close to what I’m looking for, but let me try both the male and female version of these with Stylize maximized and see if it helps…
2:09 PM
https://s.mj.run/ePXrbTX226c https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a handsome and rugged clean shaven tall man in a thick white cable knit sweater in his forties with broad shoulders, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

2:10 PM
https://s.mj.run/DDmJPXMNGS0 https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a beautiful and willowy woman in a thick white cable knit sweater in her forties, stands in a new england field with long grass and brightly colored wildflowers, in autumn, he looks towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

I think that actually helped a bit, but not a ton. Now we’re ready to tune our image and text prompts to get closer to our final version. I’m going to create a new prompt just to get the couple generated in a way that will help the AI. As you’ll see in a moment, even with an explicit prompt, MidJourney really wants to blend two input images into a single image, especially with people. So I know before I do this that I’m going to end up with a few different blends, but usually at least one image in the quad will follow the instructions. For this I’ve taken my two individual images of the man and woman, and dropped them in just to see what happens:
2:14 PM
https://s.mj.run/ePXrbTX226c https://s.mj.run/DDmJPXMNGS0 photographic portrait of a couple in their forties arm in arm, he is clean shaven rugged and handsome in a thick white cable knit sweater, she is tall and willowy wearing a fall colored sweater --v 5.1

Okay, that produced three blended single figures and one that more closely matched what I was looking for. So I’ll take that one and see what happens.
2:15 PM
https://s.mj.run/ePXrbTX226c https://s.mj.run/DDmJPXMNGS0 photographic portrait of a couple in their forties arm in arm, he is clean shaven rugged and handsome in a thick white cable knit sweater, she is tall and willowy in her late thirties wearing a fall colored sweater --v 5.1 (Image #2)

Just so that I can tune this, let’s try just a text prompt and see what we get with no reference images, because this isn’t 100% what I’m looking for:
2:16 PM
photographic portrait of a couple in their forties arm in arm, he is clean shaven rugged and handsome in a thick white cable knit sweater, she is tall and willowy wearing a fall colored sweater --v 5.1

Remarkably, dropping the reference images gave me a couple much closer to what I was looking for, in the third image.
2:17 PM
photographic portrait of a couple in their forties arm in arm, he is clean shaven rugged and handsome in a thick white cable knit sweater, she is tall and willowy in her late thirties wearing a fall colored sweater --v 5.1 (Image #3)

2:20 PM
https://s.mj.run/F9kR5x8XeRM https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a couple in their forties wearing thick white cable knit sweaters, posing for the camera holding each other, she is a beautiful, tall and willowy woman, he is tall clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --v 5.1

Now we’re cooking with fire. I really like the first image, but there’s no house in it. Let’s see if Stylize helps.
2:26 PM
https://s.mj.run/F9kR5x8XeRM https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a couple in their forties wearing thick white cable knit sweaters, posing for the camera holding each other, she is a beautiful, tall and willowy woman, he is tall clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1

Okay, that second image in the upper right is just about perfect. I’d like it if he were wearing a sweater with a bit more texture and thickness, and the AI ignored my original request to put her in a cable knit sweater too. But I actually like it better this way. So after all that, here is our final image:
2:27 PM
https://s.mj.run/F9kR5x8XeRM https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a couple in their forties wearing thick white cable knit sweaters, posing for the camera holding each other, she is a beautiful, tall and willowy woman, he is tall clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are stone walls in the background and a row of trees showing fall colors in red and yellow, and a small colonial house visible in the distance. --ar 2:1 --stylize 1000 --v 5.1 (Image #2)

As you can see, this whole process took just over an hour from start to finish, and it required me to think a lot about what I wanted and to be willing to iterate quickly. I could also have shortened it by starting with my very first, more detailed prompt and iterating from there. If I go back to that first prompt, you’ll see that it gives us something good, but it doesn’t really get me to what I had envisioned.
“a photograph of a couple posing for the camera holding each other, they both wear thick white cable knit sweaters and are in their mid-forties. She is a beautiful, tall and willowy woman, he is clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are fieldstone walls in the background and a row of trees at the back of the field showing fall colors in red and yellow, and a small colonial house visible in the distance. There is depth in the photograph, with the house being positioned off in the distance, with rolling hills in between the couple and the house. --ar 2:1 --v 5.1”

These are all very nice images, although we see MidJourney having its age issue again, and not listening well to the clean shaven input. We’ve also lost a lot of what we gained from using the reference images, so I’ll drop those back in:
https://s.mj.run/F9kR5x8XeRM https://s.mj.run/6oUsxj-acqs https://s.mj.run/M33VDlt9f2k https://s.mj.run/dKDRmKwbioI a photograph of a couple posing for the camera holding each other, they both wear thick white cable knit sweaters and are in their mid-forties. She is a beautiful, tall and willowy woman, he is clean shaven, rugged and handsome, they stand in a new england field with long grass and brightly colored wildflowers, in autumn, they look towards the camera, there are fieldstone walls in the background and a row of trees at the back of the field showing fall colors in red and yellow, and a small colonial house visible in the distance. There is depth in the photograph, with the house being positioned off in the distance, with rolling hills in between the couple and the house. --ar 2:1 --v 5.1

This is much better, and with less work I could iterate from here to an image as good as the one the longer process produced.
This is the state of play today in generative AI for images. As a product person, I have a lot of ideas about what needs to happen from a design tools perspective to really supercharge the process.
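The prompts that worked best in this walkthrough share one structure. The subject reference image comes first, followed by the same three setting images, then the text description and its parameters. Here is a minimal sketch of that assembly as plain string handling. build_prompt and SETTING_IMAGES are hypothetical names, and the URLs are the reference images used above. The sketch only mirrors the order used here and makes no claim about how MidJourney weights images by position.

```python
from typing import Sequence

# Setting references reused in every image prompt from 1:45 PM onward.
SETTING_IMAGES = (
    "https://s.mj.run/6oUsxj-acqs",
    "https://s.mj.run/M33VDlt9f2k",
    "https://s.mj.run/dKDRmKwbioI",
)


def build_prompt(subject_image: str, setting_images: Sequence[str], text: str) -> str:
    """Join image prompts and text into one prompt string.

    The subject reference goes first, then the setting references, then the
    text description (which already carries its --ar/--stylize/--v suffix).
    """
    images = [subject_image, *setting_images]
    for url in images:
        # Image prompts are links; catch a pasted filename or description early.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"image prompt should be a URL, got {url!r}")
    return " ".join([*images, text.strip()])


if __name__ == "__main__":
    # Abbreviated version of the 2:27 PM prompt; the full description is longer.
    print(build_prompt(
        "https://s.mj.run/F9kR5x8XeRM",
        SETTING_IMAGES,
        "a photograph of a couple in their forties wearing thick white cable knit sweaters "
        "--ar 2:1 --stylize 1000 --v 5.1",
    ))
```

With this structure, swapping the subject becomes a one-argument change while the setting images stay fixed. You would pass https://s.mj.run/ePXrbTX226c for the man, https://s.mj.run/DDmJPXMNGS0 for the woman, or https://s.mj.run/F9kR5x8XeRM for the couple.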
For instance, I should be able to spend a lot of time generating a single “entity” like “Man in white cable knit sweater” and use that single entity over and over – without having it change each time. I should be able to generate a landscape, get it perfect, and then drop other entities into it. Today that isn’t possible, but you can imagine that this would be a game changer.
I’ve spent much of my career building design tools and working in advertising, so I know intuitively what a designer needs in order to meet a client’s requirements. Clients want to approve the specific characters: the exact face, the exact sweater, the exact sweater color. Today, with generative AI, each time you render a prompt, even with really good input images, it’s very hard to get consistent results.
But you can probably get a sense now of what the future holds. AI-powered design tools are going to change everything, and a lot of careers are going to morph over the next few years.