
TEXT TO IMAGE

The tutorial below takes you through Fooocus, an open-source creative platform that is a highly developed AI image generator with a user-friendly interface. I have some thoughts on text to image, chiefly that it is a big mistake to assume these platforms will create an image for you based on a simple demand. The truth of the matter is that it is you, and you alone, who makes the picture, using a novel technology as a technical assistant. Nothing less than that, but nothing more either.

Fooocus, which was created by lllyasviel, can be downloaded from GitHub here: https://github.com/lllyasviel/Fooocus

At the top of the page you will see the code listing. Scroll down, past the code section and a comparison chart with Midjourney, and you will find the download link; the packaged download is for Windows only. Follow the instructions on the page: download the compressed archive and uncompress it. Inside you will see three .bat files. You only need the first of these, run.bat. Double-click it to open a command prompt (DOS) window and load the platform. The other two load other presets, which I personally haven't tried yet.

Note: The command window that opened when you ran run.bat should be minimized but kept open for the entire time that you are working with Fooocus.

Once run.bat has finished loading, your default browser will open a new tab showing the Fooocus interface, which the following series of tutorial screenshots walks through:

Now that you know how it works, pay attention to the following while you put together your prompt, since this is the most important part of the whole technique. Essentially, you are painting a picture with words rather than with a pencil, a brush or a computer mouse. But the process is exactly the same: you first have to imagine the picture. So close your eyes and visualize exactly what it should look like. This means bringing together many different visual components in your head and then putting them into a kind of natural writing (not keywords, in other words, but proper short phrases separated by commas) that is uncomplicated enough for the algorithm to understand but still holds all the information it needs to do what you want:

  • all of the objects that you want to see, even small things, written in order of importance in the big "positive" prompt box under the render area.

  • all of the things that you do not want to see, written in the small "negative" prompt box on the side.

  • the activity of the animate things (people, animals, vehicles, etc) in the picture: are they sitting down, flying, running, doing aerobics, sleeping, making love, eating, and so on? You have to specify all of that too.

  • the material of the background (such as "infinity background" or "thickly wooded forest")

  • the material of the ground or placement surface (such as "tablecloth" or "dark soil")

The mood of the picture:

  • happy

  • upbeat

  • party

  • sad

  • depressed

  • scary

  • gloomy

  • serious

  • business

  • moody

  • romantic

  • meeting

  • natural

  • appetizing

  • clean

  • hygienic

  • etc etc

The type of shot or camera angle that you want. This could be:

  • closeup

  • high angle

  • low angle

  • dutch angle

  • eye level

  • bird's eye

  • medium angle

  • long angle

  • 1/2 shot

  • 1/4 shot

  • 3/4 shot

  • top view

  • knolling view (there is actually a style preset for this one)

  • panoramic

  • landscape

  • etc etc

The style of the image:

  • portrait

  • landscape

  • chiaroscuro

  • still life

  • interior

  • illustration

  • vector art

  • minimalistic

  • modern

  • avant-garde

  • fashion

  • blurry background

  • symmetrical

  • asymmetrical

  • clustered

  • copy space (will be needed for images that may get text placed upon them while being used as part of a graphic design project)

  • etc etc

Color schemes:

  • contrast

  • pastel

  • rich

  • desaturated

  • grayscale

  • black and white

  • warm

  • cold

  • complementary between [color 1] [color 180 degrees opposite]

  • analogous

  • triadic between [color 1] [colors 2 and 3 at 120 degrees either side]
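
The bracketed color placeholders above can be worked out with simple arithmetic on the color wheel. The following is only a rough sketch: it assumes the RGB/HSV wheel with hues named in 30-degree steps (a painter's wheel pairs colors differently, for example red with green), and the function names are made up for illustration.

```python
# Assumption: hue names follow the RGB/HSV color wheel in 30-degree steps.
# A traditional painter's (RYB) wheel would give different pairings.
HUE_NAMES = {
    0: "red", 30: "orange", 60: "yellow", 90: "chartreuse",
    120: "green", 150: "spring green", 180: "cyan", 210: "azure",
    240: "blue", 270: "violet", 300: "magenta", 330: "rose",
}


def name_of(hue_deg):
    """Return the name of the nearest 30-degree step on the wheel."""
    nearest = round(hue_deg / 30) * 30 % 360
    return HUE_NAMES[nearest]


def complementary(hue_deg):
    """Hue directly opposite on the wheel, 180 degrees away."""
    return (hue_deg + 180) % 360


def triadic(hue_deg):
    """The two hues sitting 120 degrees either side of the starting hue."""
    return ((hue_deg + 120) % 360, (hue_deg - 120) % 360)


def complementary_phrase(hue_deg):
    """Fill in the 'complementary between ...' prompt fragment."""
    return (f"complementary between {name_of(hue_deg)} "
            f"and {name_of(complementary(hue_deg))}")


def triadic_phrase(hue_deg):
    """Fill in the 'triadic between ...' prompt fragment."""
    second, third = triadic(hue_deg)
    return (f"triadic between {name_of(hue_deg)}, "
            f"{name_of(second)} and {name_of(third)}")


if __name__ == "__main__":
    print(complementary_phrase(240))
    print(triadic_phrase(0))
```

The resulting phrases are meant to be pasted into the prompt as they are; whether the generator honors the exact hues is something you will have to judge from the renders.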

Lighting schemes, such as:

  • soft

  • diffused

  • dramatic

  • stark

  • studio lighting

  • spotlight

  • misty

  • foggy

  • dark day

  • sunlight

  • sunset

  • etc etc
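
One way to keep track of all of the components listed above is to fill them in as separate fields and join them into the comma-separated text for the two prompt boxes. This is a minimal sketch and not part of Fooocus: the class name, the field names and the order in which the fields are joined are my own assumptions (only "objects in order of importance" comes from the list above).

```python
from dataclasses import dataclass, field


@dataclass
class PromptPlan:
    """Hypothetical checklist for one picture; not a Fooocus API."""
    shot: str = ""            # e.g. "closeup", "3/4 shot"
    style: str = ""           # e.g. "still life", "vector art"
    subjects: list = field(default_factory=list)   # most important first
    activity: str = ""        # what the animate things are doing
    background: str = ""
    ground: str = ""
    mood: str = ""
    colors: str = ""
    lighting: str = ""
    unwanted: list = field(default_factory=list)   # for the negative box

    def positive(self):
        """Join the non-empty fields into one comma-separated prompt."""
        parts = [self.shot, self.style, *self.subjects, self.activity,
                 self.background, self.ground, self.mood,
                 self.colors, self.lighting]
        return ", ".join(p.strip() for p in parts if p.strip())

    def negative(self):
        """Join the unwanted items for the negative prompt box."""
        return ", ".join(u.strip() for u in self.unwanted if u.strip())


if __name__ == "__main__":
    plan = PromptPlan(
        shot="closeup",
        subjects=["huge serious dark tabby cat"],
        activity="sitting inside a mushroom forest",
        background="evening clouds dark sky",
        colors="desaturated color palette",
        unwanted=["sun moon", "realism"],
    )
    print(plan.positive())
    print(plan.negative())
```

Empty fields are simply skipped, so the same checklist works for a very short prompt as well as a long one. The two printed lines are what you would paste into the positive and negative boxes.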

Prompt Examples

You can go to my website to see abbreviated prompts for all of the series that I have there: https://www.trivialthingies.com/portfolio

The actual prompts, however, are much longer and more detailed than what I have placed there. Here are some examples:

Example 1:

Prompt: closeup of a huge serious dark tabby cat covering 3/4 of canvas, big head big eyes, curvy body contours, sitting inside a mushroom forest, leaves transparent translucent, mushroom stems curved, all soft curved, high ground filled with trees and mushrooms, evening clouds dark sky, desaturated color palette

Negative: sun moon, realism

Style presets: Googie Art Style, Color Field Painting, SAI Fantasy Art, Misc Dreamscape

Example 2:

Prompt: close up side view, gold black colors, still life, cloudy misty golden autumn ambience, black soil, bare branches, huge tall many layered sliced chocolate fudge cream cake with gold black extravagant ornaments cake covers 3/4 of picture, cake stand, gold black leaves, extravagant ornaments, twigs, pralines, gold black damask crumpled napkin, extra extra light on front of cake, background huge leaves

Negative: knife fork spoon, cutlery, candles, green moss, sun moon

Style presets: Fooocus v2, Fooocus Enhance, Fooocus Sharp, SAI Digital Art

Example 3: 

Prompt: closeup of huge cat in vector flat art style with big head long thin warped neck big eyes slender body, sitting inside a mushroom tree forest, tree trunks and stems warped and curved, transparent and translucent leaves, desaturated evening colors

Negative: sun moon

Style presets: Googie Art Style, Color Field Painting, Constructivism

Example 4: In this final one I use the name of the painter John Singer Sargent to describe the style of the picture that I want to generate. Had I not known about this painter, it would have taken an extremely long and complicated prompt to create the soft, blurry and yet defined style that Sargent was known for, and it might not have worked at all. It is therefore a very good idea, and a huge time and aggravation saver, to use the names of art styles, artists, designers and especially historic periods of art to define these things. This does not necessarily mean that you will create a picture that relates directly to that name or period; it only means that you are alerting the algorithm to certain stylistic attributes associated with them. Sargent died in 1925, just as the flappers were making their appearance, and they were not his subject. It is only his style of painting that I wanted to borrow and combine with a 1920s street atmosphere, not the actual themes of his paintings which, given that he is best known for his late 19th and early 20th century portraits, are very different from my end result.

Prompt: 1/2 shot, soft desaturated watercolor john singer sargent style, serious pale flappers, feathers long pearls, soft summer afternoon cream and dark gray outfit, art deco narrow street, street lights misty cold rainy water reflections dark big dark clouds desaturated old photo

Style presets: Fooocus v2, Fooocus Enhance, SAI Digital Art, Impressionism

Bear in mind that you will need to make many iterations of one image, and you will probably end up deleting most of them, keeping only the few perfect ones. Patience is the name of the game here. In my experience, when you first start to get familiar with the technique, 3 out of every 4 iterations will be junk and only one will be good enough to keep. As you become more experienced, this ratio slowly starts to reverse and 3 out of 4 become worth keeping. At the end of a session you will therefore have far more than you need, even if you were deleting unwanted ones while you were still rendering. So go through all of them with a fine-tooth comb at the very end and keep only the best of the best.
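
To get a feel for what these ratios mean for a session, here is a small back-of-the-envelope sketch. The function name is made up, and the keep rates are only the rough 1-in-4 and 3-in-4 figures from my own experience above; treat the result as an average, not a guarantee.

```python
import math


def renders_needed(keepers_wanted, keep_rate):
    """Roughly how many images to render to end up with the wanted keepers.

    keep_rate is the fraction worth keeping, e.g. 0.25 for 1 in 4.
    """
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be between 0 (exclusive) and 1")
    return math.ceil(keepers_wanted / keep_rate)


if __name__ == "__main__":
    # Beginner: about 1 in 4 is worth keeping.
    print(renders_needed(6, 0.25))
    # Experienced: about 3 in 4 are worth keeping.
    print(renders_needed(6, 0.75))
```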

One other thing: even if you are quite experienced at writing prompts, the first few rounds of iterations will often not give precise results, no matter how good your prompt is. Certain things will be left out, certain things that you did not write will appear, and so on. It can feel as though the algorithm needs time to figure out what it is supposed to do although, as far as I understand it, the model does not actually learn during a session: each image starts from a different random seed, so a bigger batch simply gives you more chances at a good composition. My advice is to set the iteration number to something high, like 8 or 10 or even 12, put the setting to speed rather than quality, start the render, and then go off and do something else for a while, giving the beastie some time to fumble around.

Upscaling

Fooocus has a built-in upscaler; however, it takes a long time, since the image gets re-rendered, and to get a truly high-res image you need to repeat the procedure. What I have discovered instead is Upscayl, a truly great piece of open-source software that you can download and install on your own machine, and which can also do batch upscales. Get it here: https://www.upscayl.org/#download

An important thing to remember is that upscaled images have very large file sizes and will therefore take up huge amounts of space on your computer. So, even though the software gives you the option, do not do batch upscales. Instead, upscale only what you will be using right there and then, and keep everything else in the smaller format, which you can always go back to and upscale later when the need arises.
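
The reason the files grow so quickly is that the pixel count goes up with the square of the upscale factor. The sketch below works this out for a hypothetical 1024 x 1024 render upscaled 4x; both numbers are assumptions for illustration, and the byte counts are for uncompressed 8-bit RGB data, so the actual PNG or JPG files on disk will be smaller but will grow in a similar proportion.

```python
def upscaled_size(width, height, factor):
    """Pixel dimensions after upscaling both sides by the same factor."""
    return width * factor, height * factor


def raw_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed size in bytes (assumes 8-bit RGB by default)."""
    return width * height * channels * bytes_per_channel


if __name__ == "__main__":
    # Hypothetical starting size and factor, chosen only for illustration.
    w, h = 1024, 1024
    big_w, big_h = upscaled_size(w, h, 4)
    before = raw_bytes(w, h)
    after = raw_bytes(big_w, big_h)
    print(f"{w} x {h}: {before / 1_000_000:.1f} MB uncompressed")
    print(f"{big_w} x {big_h}: {after / 1_000_000:.1f} MB uncompressed")
    print(f"growth: {after // before}x")
```

A 4x upscale therefore means roughly 16 times the data per image, which is why upscaling a whole folder at once fills a drive so fast.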

Also make sure that you have the correct upscale preset selected. Upscayl has several for photos and one for digital art, which you can find in the "select model" drop-down menu in the middle of the left-hand side. The digital art preset will not work well for photos, and vice versa. Since there are several photo presets, you should try out all of them: depending on the nature of the image, its textures, lighting and so on, some work better than others.

Language

Although there are rumors that text-based generative software works in all languages, I would not pay them much attention myself. More than 50% of the internet's content is in English, and these platforms are all trained on what is out there. If more than half of what is out there is in one language, that is the language that will get the results, since the remainder is divided into very small percentages across many languages, with a language such as Turkish holding a share of only a few percent. So, bottom line: do not expect great results in any language other than English. I am not saying this out of any kind of Anglophilia; I am simply stating a fact.

So, what to do if you do not possess a very wide English vocabulary? Use one of the online translators. Write your prompts in your own language and then put them through the translator, which should work to some extent. You can also put them in as a list of keywords, which will give you a more accurate translation; however, it will probably not give you the sorts of results that "natural writing" does.

And start polishing up your English! Again, I am no Anglophile; quite the opposite, if anything. However, the fact of the matter is that English is the "lingua franca" of today's world and, whether we like it or not, it is the language we need to learn if we want to get ahead, even with something like a text to image platform.
