big-dark-grey-cat-in-round-glass-bowl-filled-with-water-surrounded-by-small-dark-fish-swim

TEXT TO IMAGE

The tutorial below takes you through Fooocus, an open source creative platform that combines a highly developed AI image generator with a user friendly interface. I have some thoughts on Text to Image, chiefly that it is a big mistake to assume that these platforms will create an image for you based on a simple demand. The truth of the matter is that it is you, and you alone, who makes the picture, using a novel technology as a technical assistant. Nothing less than that, but nothing more either. Read more here >>>

Fooocus, which was created by lllyasviel, can be downloaded from GitHub here: https://github.com/lllyasviel/Fooocus

At the top of the page you will see the code. Scroll down past the code section and a comparison chart to Midjourney, and you will see the download link, which is Windows only, btw. Do what it says on the page: download and uncompress the zip file. In it you will see 3 .bat files. You only need to run the first of these, which is called run.bat. Double click on this file, which will open a DOS (command prompt) window and load the platform. The other two load other presets, which I personally haven't tried yet.

Note: The DOS window that opened when you ran run.bat should be minimized but kept open the whole time that you are working with Fooocus.

Once the run file has loaded, your default browser will open a new tab, and on it you will see what follows from here on as a series of tutorial screenshots:

Inpainting and Outpainting

Now that we have gone through the main window of Fooocus, we come to the little "Input Images" check box on the bottom left. Once you check this, the page gets longer and you get a menu with 4 tabs.

The first of these is called "Upscale or Variation". With this you can upscale an image by dragging or uploading it into the little window. I would not advise this, since it takes a long time and there is software out there, which we will get to later, that does a very good job of this task. What you can also do here is create variations of the uploaded or dragged image. In my experience "Vary Subtle" hardly makes a change to the original image, so I wouldn't bother with that one. "Vary Strong" does make changes; the issue is that the changes are oftentimes too drastic and move too far from the original. To keep a variation closer to the original, it is a good idea to use the same prompts as well as the same seed instead of a random one.

As I have already shown above, you have access to your history, from where you can obtain all of this info. And not only do you have access to the session log, you also have access to your entire Fooocus history, which you can find inside a folder called "Output". This is in the Fooocus folder, nested inside the folder that you downloaded, which also contains the run.bat file. It is a good idea to clean up this output archive from time to time and move whatever you want to hold onto to an external drive, since it will otherwise end up taking a huge amount of space on your C drive.
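
If you would rather not do this clean-up by hand, the idea can be sketched in a few lines of Python. This is only a rough sketch and not part of Fooocus: the function name, the 30 day cut-off and the folder paths in the comment are all assumptions, so point it at wherever your own output and archive folders actually live.

```python
import shutil
import time
from pathlib import Path


def archive_old_outputs(output_dir, archive_dir, older_than_days=30):
    """Move files older than `older_than_days` out of output_dir.

    Sub-folders are recreated inside archive_dir so nothing gets mixed up.
    Returns the list of new file locations.
    """
    output_dir = Path(output_dir)
    archive_dir = Path(archive_dir)
    cutoff = time.time() - older_than_days * 24 * 60 * 60

    moved = []
    # sorted() takes a snapshot of the listing before any file is moved.
    for path in sorted(output_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime >= cutoff:
            continue  # recent enough, leave it where it is
        target = archive_dir / path.relative_to(output_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


# Example call (both paths are placeholders, not real Fooocus defaults):
# archive_old_outputs(r"C:\path\to\Fooocus\Output", r"E:\fooocus_archive")
```

Moving rather than deleting means that nothing is lost if the cut-off turns out to be too aggressive; you can always delete from the external drive later.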

variation-all.jpg

The second tab is called "Image Prompt". Here you can place 4 different images inside 4 windows and create an iteration without having to write anything, although I find that writing an additional prompt helps. What you need to be super careful of, if you decide to do this, is to use only your own images or Creative Commons images that you have permission to use. I am not showing a completed iteration here since the result is somewhat similar to the "Vary Strong" feature above. Oftentimes the iterations are too far removed from the prompt images, even if the prompt images are actually very similar and you are using the same styles and the same seed.

image-prompt-1.jpg

The third tab is where the action really starts. It is called "Inpaint or Outpaint", and here you can do some really wonderful stuff, especially when it comes to achieving high end, detailed results. The first time that you use it, Fooocus will download some additional files to your computer, which takes a little while.

Outpainting makes the image that you drag into the window wider or higher or both. This is the default mode; it is what you see first in the little drop down menu.

outpaint.jpg

You can use the default "Inpaint or Outpaint" mode to also change content as shown in the gallery below:

One of the other two choices in the pulldown is "Improve Detail (face, hand, eyes, etc.)", which is extremely useful for face details or any other detailed stuff that you wish to improve upon. However, in order to get this to work really well, you should first upscale whatever it is that you want to improve to at least twice the size (3 or 4 times is even better), so that the algorithm has enough pixels to work with. Trying to improve details on a small resolution image will give you only mediocre results. Scroll down to the "Upscaling" section on this page to find out more on how to upscale by using a free piece of software called Upscayl.

detail-5.png

So, here is the result in which the face details have been fixed on the left and in their original raw state on the right. As you can see from the 100% zoom on the bottom the fix was performed on a very large image which was upscaled to 3 times the original size. With the original sizes that Fooocus gives you, which are quite small, there is no way that you would get a good result. So, always remember to upscale before you start making detail fixes.

Under the "Inpaint or Outpaint" tab there is a third option that you can access from the drop down menu, called "Modify Content (add objects, change background, etc.)". I am not going to add screenshots to show you the procedure for this since it is exactly like the one above for improving details: You drag or upload the picture into the window, paint over the area where you want to add, remove or replace things, and then press generate. Again, before you generate the change, make sure that the same styles that you find in your history log are enabled.

I will, however, show you 4 images, one "before" and then 3 variations, to show you the sorts of results that you can achieve. I deleted the main prompt at the top so that the algo did not get confused and add cats, which was a word in that main prompt. Then I wrote into the small inpaint prompt box "add gold leaves and gold art nouveau stuff scrolls remove table remove platform". I painted the entire area around the cat, and this is what I got:

The Prompt

Now that you know how it works, pay attention to the following while you are putting together your prompt, since this is the most important part of this whole technique. Essentially what you are doing is painting a picture with words rather than a pencil or a brush or a computer mouse. But the process is exactly the same: you first have to imagine the picture. So, close your eyes and visualize exactly what the picture should look like. This means that you need to bring together many different visual components in your head and then put them into a type of natural writing (in other words, not keywords but proper small sentences, separated by commas) that is uncomplicated enough for the algorithm to understand but also holds all the information that it needs to do what you want it to do:

  • all of the objects that you want to see, even small things, written in order of importance in the big "positive" prompt box under the render area.

  • all of the things that you do not want to see written in the small "negative" prompt box on the side.

  • the activity of the animate things (people, animals, vehicles, etc) in the picture: Are they sitting down, flying, running, doing aerobics, sleeping, making love, eating, etc etc? You have to specify all of that also.

  • material of background (such as "infinity background" or "thickly wooded forest")

  • material of ground or placement surface (such as "tablecloth" or "dark soil")

The mood of the picture:

  • happy

  • upbeat

  • party

  • sad

  • depressed

  • scary

  • gloomy

  • serious

  • business

  • moody

  • romantic

  • meeting

  • natural

  • appetizing

  • clean

  • hygienic

  • etc etc

The type of shot angle that you want. (Read more about camera angles here >>>) This could be:

  • closeup

  • high angle

  • low angle

  • dutch angle

  • eye level

  • bird's eye

  • medium angle

  • long angle

  • 1/2 shot

  • 1/4 shot

  • 3/4 shot

  • top view

  • Knoll view (there is actually a style preset for this one)

  • panoramic

  • landscape

  • etc etc

The style of the image:

  • portrait

  • landscape

  • chiaroscuro

  • still life

  • interior

  • illustration

  • vector art

  • minimalistic

  • modern

  • avantgarde

  • fashion

  • blurry background

  • symmetrical

  • asymmetrical

  • clustered

  • copy space (will be needed for images that may get text placed upon them while being used as part of a graphic design project)

  • etc etc

Color schemes:

  • contrast

  • pastel

  • rich

  • desaturated

  • grayscale

  • black and white

  • warm

  • cold

  • complementary between [color 1] [color 180 degrees opposite]

  • analogous

  • triadic between [color 1] [colors 2 and 3 at 120 degrees either side]
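
To make the 180 and 120 degree rules concrete, here is a small sketch in Python. It assumes the 12 step RGB (screen) color wheel with the names listed in HUES; a painter's red-yellow-blue wheel pairs colors differently, and all of the function names are made up for this illustration.

```python
# 12 hues at 30 degree steps around the RGB (screen) color wheel.
# Assumption: a painter's red-yellow-blue wheel would give other pairs.
HUES = [
    "red", "orange", "yellow", "chartreuse", "green", "spring green",
    "cyan", "azure", "blue", "violet", "magenta", "rose",
]
STEP = 360 // len(HUES)  # 30 degrees between neighbouring hues


def rotate(color, degrees):
    """Return the hue that sits `degrees` further around the wheel."""
    index = HUES.index(color)
    return HUES[(index + degrees // STEP) % len(HUES)]


def complementary(color):
    """The hue 180 degrees opposite."""
    return rotate(color, 180)


def triadic(color):
    """The two hues 120 degrees either side."""
    return rotate(color, 120), rotate(color, -120)


def complementary_phrase(color):
    """A prompt fragment in the shape of the list entry above."""
    return f"complementary between {color} {complementary(color)}"


def triadic_phrase(color):
    second, third = triadic(color)
    return f"triadic between {color} {second} {third}"
```

For instance, complementary_phrase("blue") gives "complementary between blue yellow", which can be dropped straight into the prompt box.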

Lighting schemes, such as:

  • soft

  • diffused

  • dramatic

  • stark

  • studio lighting

  • spotlight

  • misty

  • foggy

  • dark day

  • sunlight

  • sunset

  • etc etc
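
The checklist above can also be treated as a form to fill in before you start typing. The following is only a sketch of that idea in Python: the PromptPlan class, its field names and the order in which the parts are joined are all assumptions made for illustration and have nothing to do with how Fooocus itself works. It simply glues the pieces together with commas so that nothing on the list gets forgotten.

```python
from dataclasses import dataclass, field


@dataclass
class PromptPlan:
    """One slot per item on the checklist (hypothetical helper)."""
    subjects: list          # objects to show, most important first
    activity: str = ""      # what the animate things are doing
    background: str = ""
    ground: str = ""        # ground or placement surface
    mood: str = ""
    shot: str = ""          # camera angle
    style: str = ""
    colors: str = ""
    lighting: str = ""
    unwanted: list = field(default_factory=list)

    def positive(self):
        """Text for the big positive prompt box."""
        parts = [self.shot, *self.subjects, self.activity, self.background,
                 self.ground, self.mood, self.style, self.colors,
                 self.lighting]
        # Empty slots are skipped so they leave no stray commas behind.
        return ", ".join(p.strip() for p in parts if p and p.strip())

    def negative(self):
        """Text for the small negative prompt box."""
        return ", ".join(u.strip() for u in self.unwanted if u and u.strip())


plan = PromptPlan(
    subjects=["huge serious dark tabby cat", "mushroom forest"],
    activity="cat sitting among the mushrooms",
    shot="closeup",
    colors="desaturated color palette",
    unwanted=["sun moon", "realism"],
)
print(plan.positive())
print(plan.negative())
```

The real writing still happens inside each slot; the helper only makes sure that every component has been thought about.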

Prompt Examples

I have made an extended tutorial that shows how a prompt slowly progresses as you make iterations. I think it will be very useful to you if you are serious about generating specific content that really does show what you want to show, or at least something very close to your initial intention. Go here >>> to see it.

You can go to my website to see abbreviated prompts for all of the series that I have there: https://www.trivialthingies.com/portfolio

The actual prompts, however, are much longer and more detailed than what I have placed there. Here are some examples:

Example 1:

07.png

Prompt: closeup of a huge serious dark tabby cat covering 3/4 of canvas, big head big eyes, curvy body contours, sitting inside a mushroom forest, leaves transparent translucent, mushroom stems curved, all soft curved, high ground filled with trees and mushrooms, evening clouds dark sky, desaturated color palette

Negative: sun moon, realism

Style presets: Googie Art Style, Color Field Painting, SAI Fantasy Art, Misc Dreamscape

Example 2:

08-U.jpg

Prompt: close up side view, gold black colors, still life, cloudy misty golden autumn ambience, black soil, bare branches, huge tall many layered sliced chocolate fudge cream cake with gold black extravagant ornaments cake covers 3/4 of picture, cake stand, gold black leaves, extravagant ornaments, twigs, pralines, gold black damask crumpled napkin, extra extra light on front of cake, background huge leaves

Negative: knife fork spoon, cutlery, candles, green moss, sun moon

Style presets: Fooocus v2, Fooocus Enhance, Fooocus Sharp, SAI Digital Art

Example 3: 

image (12).png

Prompt: closeup of huge cat in vector flat art style with big head long thin warped neck big eyes slender body, sitting inside a mushroom tree forest, tree trunks and stems warped and curved, transparent and translucent leaves, desaturated evening colors

Negative: sun moon

Style presets: Googie Art Style, Color Field Painting, Constructivism

Example 4: In this final one I am using the name of a painter called John Singer Sargent in order to describe the style of the picture that I want to generate. Had I not known about this painter, it would have taken an extremely long and complicated prompt to create the fuzzy, blurry and yet defined style that Sargent was known for; and it might not have worked at all. Thus, it is a very good idea, and a huge time and aggravation saver, to use the names of art styles, artists, designers and especially historic periods of art to define these things. This does not necessarily mean that you will be creating a picture that relates directly to that name or that period; it only means that you are alerting the algorithm to certain stylistic attributes that are associated with these. Sargent died in 1925 and Flappers were never his subject; it is only his style of painting that I wanted to borrow and combine with a 1920s street atmosphere - not the actual themes of his paintings which, given that he mostly painted in the late 19th and early 20th centuries, are very different from my end result.

53.jpg

Prompt: 1/2 shot, soft desaturated watercolor john singer sargent style, serious pale flappers, feathers long pearls, soft summer afternoon cream and dark gray outfit, art deco narrow street, street lights misty cold rainy water reflections dark big dark clouds desaturated old photo

Style presets: Fooocus v2, Fooocus Enhance, SAI Digital Art, Impressionism

Bear in mind that you will need to make many iterations of one image, and you will probably end up deleting most of them, only keeping the few perfect ones. Patience is the name of the game here. In my experience, as you first start to get familiar with the technique, 3 out of every 4 iterations will be junk, and only one will be good enough to keep. As you become more experienced, this ratio slowly starts to reverse itself and 3 out of 4 become worth keeping. At the end of the session you will therefore have way more than you need, even if you were deleting unwanted ones while you were still rendering. So, go through all of them with a fine-tooth comb at the very end and only keep the best of the best.

One other thing is that, even if you are quite experienced at writing prompts, for some weird reason it often seems to take the algorithm some time to figure out exactly what it needs to do. So, chances are that the first few rounds of iterations will not give precise results, no matter how good your prompt is: certain things will be left out, certain things that you did not write will appear, and so on. In my experience the compositions also improve over time. My advice is to set the iteration number to something high, like 8 or 10 or even 12, put the setting to speed rather than quality, start the render, and then go off and do something else for a while, giving the beastie some time to fumble around and work out what it needs to do.

Upscaling

Fooocus has a built in upscaler; however, this takes a long time since the image gets re-rendered, and in order to get a truly high res image you need to repeat the procedure. What I have discovered instead is Upscayl, which is a truly great piece of open source software that you can download and install on your own machine, and with which you can also do batch upscales. Get it here: https://www.upscayl.org/#download

An important thing to remember is that upscaled images have very large file sizes and will therefore take up huge amounts of space on your computer. So, even though the software gives you the option, do not do batch upscales. Instead, only upscale what you will be using right there and then, and keep everything else in the smaller format, which you can always go back to and upscale later when the need arises.
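
To get a feel for why the files balloon, here is a bit of back-of-the-envelope arithmetic in Python. It assumes plain uncompressed 8-bit RGB (3 bytes per pixel), so the numbers are a rough upper bound rather than what a saved PNG or JPG will actually weigh, and the 1152 x 896 starting size is just an example.

```python
def upscaled_dimensions(width, height, factor):
    """Each side is multiplied by the upscale factor."""
    return width * factor, height * factor


def raw_rgb_bytes(width, height):
    """Uncompressed size, assuming 3 bytes (R, G, B) per pixel."""
    return width * height * 3


def pixel_growth(factor):
    """The pixel count grows with the square of the factor."""
    return factor ** 2


# Example numbers only: a 1152 x 896 render upscaled 3 times.
width, height = upscaled_dimensions(1152, 896, 3)
print(width, "x", height)
print(raw_rgb_bytes(width, height) / 1_000_000, "million bytes uncompressed")
print(pixel_growth(3), "times the pixels of the original")
```

A 3 times upscale therefore means 9 times the pixels, and a 4 times upscale means 16 times, which is why a whole folder of batch upscales fills a drive so quickly.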

Also make sure that you have the correct upscale preset selected: Upscayl has several for photos and one for digital art, which you can find under the drop down menu in the middle of the left hand side that says "select model". The digital art preset will not work well for photos and vice versa. Since there are several photo presets, you should try out all of them; the nature of the image, its textures, lighting and so on, works better with some than it does with others.

Language

Although there are rumors that text based generative software works in all languages, I would not pay that rumor a lot of attention myself. More than 50% of the internet's content is in English, and these platforms are all trained on what is out there. If more than 50% of what is out there is in one language, that is the language that will get the results, since the rest is divided into very small percentages across many languages, with Turkish only having a share of around 5%. So, bottom line, do not expect great results in any language other than English. And I am not saying this out of any kind of Anglophilia; I am simply stating a fact.

So, what to do if you do not possess a very wide English vocabulary? You use one of the online translators. Write your prompts in your own language and then put them through an online translator, which should work to some extent. You can put them in as a list of keywords, which will give you a more accurate translation; however, it will probably not give you the sorts of results that "natural writing" does.

And start polishing up your English! Again, I am no anglophile, quite the opposite, if anything. However, the fact of the matter is that English is the "lingua franca" of today's world, and whether we like it or not, that is the language that we do need to learn if we want to get ahead, even with something like a text to image platform.
