AI image generation is a powerful addition to the digital artist’s toolkit, but it isn’t the be-all and end-all of image creation. While AI leverages knowledge from past artwork to speed up the visual creation process, your creation is only as good as the way you guide it.
The arrival of AI image generators has significantly lowered the barrier to creating stunning digital artwork. Now, anyone with an internet connection or a suitable GPU setup can produce beautiful visuals in seconds—a process that previously could have taken days, weeks, or even months. This article breaks down the essentials of AI image generation, offering a step-by-step guide to crafting effective prompts.
Choosing an AI Image Generator: MidJourney vs. Stable Diffusion vs. DALL·E 3
The three most popular AI text-to-image generation platforms are MidJourney, Stable Diffusion, and OpenAI’s DALL·E 3. While each one has its own strengths and weaknesses, all three are impressive tools worth trying. This is by no means a pitch to convince you to choose one over another, but rather an overview to help with selecting one for specific tasks.
MidJourney – Contest-winning AI image generator
MidJourney is hands-down the champion when it comes to image quality. Its photorealistic images are believable, and its artistic choices are generally exceptional. When an AI-generated image fools human judges and wins a photography or art contest, it has often come from MidJourney (not that I condone this practice).
While image results from MidJourney are great, the tool requires users to work through Discord, navigate its channels, and type in MidJourney-specific text prompts. For this reason, its learning curve can be slightly steeper than that of the other tools. However, resources on usage and prompting are readily available, so it’s worth exploring.
Pro: best-in-class images.
Cons: lacks a dedicated image-based user interface, proprietary.
Stable Diffusion – Open Source AI image generator
Stable Diffusion is an open-source generative image AI developed by Stability AI. Because it is open-source, many companies are building their products with Stable Diffusion as a foundation, including Adept Dept.’s very own AI Image Generator. This allows companies to develop products with user-friendly, dedicated image interfaces.
Stable Diffusion has great flexibility and can generate images to rival those of MidJourney. Its models can be trained further for more specific tasks, and they can run on local machines or hosted GPUs. If you are interested in the technical side of AI image generation, this might be the choice for you. The Stable Diffusion models, and the countless variations created by enthusiasts, are essentially free, although running them locally does require a decent GPU. Since it is open-source, there is little limit to what people can create with it. This also means that some of these models can produce not safe for work (NSFW) and potentially unethical results.
Pro: great overall performer, open-source, dedicated image interfaces from various companies.
Cons: sometimes unpredictable results, potentially NSFW content.
DALL·E 3 – AI image generator built into GPT-4
DALL·E 3 is developed by OpenAI, the company that brought AI into the spotlight with ChatGPT. While previous versions of DALL·E generated images that left much to be desired, DALL·E 3 is dramatically better than its predecessors. When it comes to photorealism and artistic qualities, DALL·E 3 has certainly caught up to Midjourney and Stable Diffusion.
With its much-improved artistic capabilities, DALL·E 3 can generate impressive images directly through GPT-4 (the paid version of ChatGPT). Whether or not you prefer a dedicated image interface, its default access point is ChatGPT’s conversation-style interface.
Pro: convenient, familiar ChatGPT interface.
Cons: image results are lacking, no image-dedicated user interface.
Getting inspiration
Creating visuals with AI is very much the same as starting an art project. You need a vision of what you want to create and visual references to help guide the process. Going on Pinterest and typing in keywords related to the subject, art style, or visual message is a great way to find inspiration. The next step is to translate your vision and references into words.
Just as artists in training would look to masters for inspiration, there are gallery websites where you can find images with their prompts. Below is a list of websites with image prompt galleries. However, please be warned that some of these websites contain not safe for work (NSFW) and adult content.
- Freeflo – Well-categorized AI image prompt gallery
- Lexica – A Stable Diffusion-based prompt gallery and generator
- Civitai (NSFW) – Community for AI models and image prompt gallery
- PromptHero (NSFW) – AI image prompt gallery for different platforms
Start Simple
A mistake I often see people make is relying solely on a large language model (LLM) AI, such as ChatGPT, to write their image generation prompts. While these snippets of text might read well to people, they are often not as useful for image AIs. Highly embellished words and poetic devices can become filler that confuses image AIs. Literal, concise, and simple prompts are interpreted more accurately.
There is another advantage to starting with a simple prompt: simple prompts allow more room for creativity. Think of prompts as instructions. The less you give, the more creative freedom you allow the recipient. At the beginning of every visual project, you want to allow the AI the creative liberty to inspire you.
Starting Example Prompt and Images
Prompt: An explorer
The point of using a short prompt isn’t to produce a high-quality image right off the bat. Rather, we’re looking for elements that resonate with us. The resulting images from the example prompt included a wide range of art mediums, from digital painting, color photos, and black-and-white photography to traditional oil painting. The subject matter also varied greatly: mountaineers, desert trekkers, army scouts, historic caravans, a canine companion, and space explorers.
Negative Prompts
Negative prompts are, in simple terms, a list of elements you don’t want in your AI-generated image. These could be objects, qualities, or other characteristics you wish to exclude from your image.
In our example, the prompt “An explorer” was producing many black-and-white results. This might be due to the AI referencing historic photos of explorers. To get more colorful results, I added a negative prompt: (black and white).
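As a rough illustration, the prompt and its negative prompt can be kept as two separate pieces of text. The sketch below is only that: `ImagePrompt`, `as_payload`, and the dictionary keys are hypothetical names chosen for this example, not the API of any particular generator. Many Stable Diffusion interfaces offer a separate negative prompt field, but the exact field name depends on the tool you use.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container: what to draw, plus what to leave out."""

    text: str
    negatives: list[str] = field(default_factory=list)

    def as_payload(self) -> dict[str, str]:
        # Assumed key names; check what your generator actually expects.
        return {
            "prompt": self.text,
            "negative_prompt": ", ".join(self.negatives),
        }


request = ImagePrompt("An explorer", negatives=["black and white"])
payload = request.as_payload()
print(payload)
```

Keeping the exclusions in their own list makes it easy to add or remove one between rounds without touching the main prompt.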
Iterative Prompting
The AI image creation process should be highly iterative, starting with open and vague directions and moving toward narrow and precise ones. Short and concise prompts produce image results that vary drastically from one another. As creatives, our role is to be the tastemakers, guiding the AI. From a large array of image results, we filter out elements and pick those that inspire us.
It’s a common misconception that master artists can create awe-inspiring artwork with just a few brushstrokes directly on a canvas, or that expert prompt engineers can generate stunningly beautiful images with a single prompt. This is far from the truth. A painting can take an artist months of planning, seeking inspiration, gathering references, drawing countless sketches, creating the underpainting, and making drastic changes midway. AI image generation follows a similar iterative process, only compressed into much shorter timespans.
Looking over the wide range of generated results, I really enjoy the extraterrestrial quality of some of the environments. To home in on this theme, I am adding the word “otherworldly” to describe the location. This gives the AI just enough guidance while still allowing for creative freedom.
Explorative Prompt and Image Examples
Prompt: An explorer, in otherworldly location
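One way to picture this vague-to-precise process is as a running history of prompts, where each round adds a single detail. This is a minimal sketch under that assumption; the helper `refine` and the `history` list are made up for illustration and are not part of any image tool.

```python
def refine(prompt: str, detail: str) -> str:
    """Return a narrower prompt by appending one comma-separated detail."""
    return f"{prompt}, {detail}"


# Start open and vague, then narrow down one detail per round.
history = ["An explorer"]
history.append(refine(history[-1], "in otherworldly location"))

for round_number, prompt in enumerate(history, start=1):
    print(f"Round {round_number}: {prompt}")
```

Keeping every earlier prompt around makes it easy to step back a round if a new detail takes the images in the wrong direction.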
Specify Art Style and Medium
The first words in the prompt field are best left for the art style and medium. Since image AIs tend to process prompts sequentially, the first words set the foundation and influence how subsequent words are interpreted. Specifying the art style and medium in your prompt is similar to choosing between oil painting on canvas, pencil sketch on paper, and photos on a film camera. Of course, with AI image generation, you can quickly switch your art styles and mediums after the fact.
Continuing with our example, this might be a good time to lock in the art style and medium. By adding the art style and medium at the beginning of the prompt, we are setting the canvas on which the image is being generated.
Art Medium Prompts and Image Examples
Prompt: {Insert art medium} of an explorer, in otherworldly location
The photorealistic look is closer to my vision. I will include “photo of…” at the beginning of all example prompts from now on.
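The medium placeholder in the prompt above works like a fill-in-the-blank template. As a small sketch, the same template can be filled with several mediums to compare results side by side. The list of mediums is an illustrative choice drawn from the ones mentioned in this article, and the variable names are hypothetical.

```python
# The medium comes first, so it sets the "canvas" for the rest of the prompt.
TEMPLATE = "{medium} of an explorer, in otherworldly location"

# Illustrative mediums only; swap in whichever ones you want to compare.
MEDIUMS = ["Photo", "Oil painting", "Pencil sketch", "Digital painting"]

prompts = [TEMPLATE.format(medium=medium) for medium in MEDIUMS]

for prompt in prompts:
    print(prompt)
```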
What’s your focus?
A strong visual composition should have a single main focus. Attention is a limited resource; emphasizing everything means emphasizing nothing. This principle is crucial for AI image prompting. The more words used to describe something, the greater emphasis the AI places on it.
Consider whether the subject or the environment is the primary focus. More descriptive words for the subject will result in a stronger emphasis on it, and vice versa. When both are described in detail, they might compete for the viewer’s attention.
Focusing on the Subject
Describe the quantitative and qualitative characteristics of the subject. While image AIs often struggle with precise numbers, using quantifying words like single, few, some, group, and many can be helpful. Qualities such as color, materials, textures, outfits, and other physical descriptions will further enrich the appearance of your subject.
In our example, “lone” provides quantitative information while conveying a sense of isolation. “Futuristic” offers a general appearance for the explorer. I intentionally omitted outfit details to encourage the AI’s creative interpretation.
Subject-focused Prompt and Image Examples
Prompt: Photo of a lone futuristic explorer, in otherworldly location
Based on the generated images, I’d like to narrow the outfit down to an orange spacesuit. Here are the adjusted prompt and results.
Prompt: Photo of a lone futuristic explorer wearing a sleek (deep orange) spacesuit, in otherworldly location
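The subject description can be thought of as a few slots: a quantity word, some qualities, the subject itself, and an optional outfit. The sketch below assembles the prompt above from those slots. The function `describe_subject` and its parameter names are hypothetical, introduced only for this example.

```python
def describe_subject(
    noun: str,
    quantity: str = "",
    qualities: tuple[str, ...] = (),
    outfit: str = "",
) -> str:
    """Join quantity, qualities, and noun, then add an optional outfit."""
    words = [quantity, *qualities, noun]
    phrase = " ".join(word for word in words if word)
    if outfit:
        phrase += f" wearing {outfit}"
    return phrase


subject = describe_subject(
    "explorer",
    quantity="lone",
    qualities=("futuristic",),
    outfit="a sleek (deep orange) spacesuit",
)
prompt = f"Photo of a {subject}, in otherworldly location"
print(prompt)
```

Leaving a slot empty, as with the outfit in the first subject-focused prompt, hands that decision back to the AI.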
Focusing on the Environment
The environment surrounding your subject could also be the image’s main focus. The setting can be divided into three sections: foreground, midground, and background. Use adjectives to describe the atmosphere, lighting, shapes, sizes, and distances.
Environment-focused Prompt and Image Examples
Prompt: Photo of a lone futuristic explorer, in an otherworldly location, barren landscape and mountain peaks in the distance, deep blue sky with multiple planets
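The foreground, midground, and background split can also be written down explicitly before it is turned into prompt text. In this sketch, the `Scene` type is hypothetical, and assigning each phrase from the example prompt to a particular layer is my own assumption. The output is a comma-separated variant of the environment description, not the exact prompt used above.

```python
from typing import NamedTuple


class Scene(NamedTuple):
    """Hypothetical three-layer description of an environment."""

    foreground: str = ""
    midground: str = ""
    background: str = ""

    def describe(self) -> str:
        # Skip empty layers and list the rest from near to far.
        return ", ".join(layer for layer in self if layer)


scene = Scene(
    foreground="barren landscape",
    midground="mountain peaks in the distance",
    background="deep blue sky with multiple planets",
)
environment = scene.describe()
print(f"Photo of a lone futuristic explorer, in otherworldly location, {environment}")
```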
Put it into Practice
This has been a guide to get you started with AI image generation and prompt engineering. If you’d like to delve deeper into the topic and equip yourself with the vocabulary of AI image prompting, check out AI Image Prompting 101: The Ultimate Guide to AI Image Generation.
You might get the sense that luck plays a role in the quality of the AI-generated images. This is certainly the case when you’re starting out and still in the inspiration phase. However, as you get better at AI image prompting, you will start to notice the quality of the image results becoming consistently good.
Want to dip your toe in AI image prompting? Get started with our AI Image Generator with 5 free credits. Sign up here.