AI painting guide: how to set up and use Stable Diffusion WebUI (SD WebUI)

This is based on my recent study and practice, and is meant only as a quick-start reference for newcomers to AI painting.
It mainly covers the SD WebUI interface, the meaning and tuning of its parameters, how to set feature points in txt2img, how to improve an existing image in img2img, and so on.

Stable diffusion webui (SD webui) interface introduction

  • The default address of SD webui is 127.0.0.1:7860
  • A Chinese interface is now available. The following is written mainly against the Chinese version, with the original English terms alongside.
  • The project is updated frequently (often daily), so run git pull regularly to stay up to date. A simple method: in File Explorer, open the stable-diffusion-webui folder, type cmd in the address bar and press Enter to open a Command Prompt in that directory, then type git pull and press Enter.

A one-sentence introduction to each tab

  1. txt2img (text-to-image): as the name suggests, generates images from text.
  2. img2img (image-to-image): as the name suggests, generates images from images.
  3. Extras: mainly used to upscale images “losslessly”.
  4. PNG Info: reads the generation parameters from an image’s metadata. If a PNG was originally generated by SD, its generation parameters are written into the image’s metadata, so when you see a beautiful image someone posted online you can use this tab to inspect how it was made (a small script example follows this list).
  5. Checkpoint Merger: merges different models (ckpt) into a new model.
  6. Train: train your own embedding or hypernetwork.
  7. Settings: as the name suggests, the settings page.
  8. Extensions: as the name suggests, the management page for extensions.
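
As a side note on PNG Info: the webui stores the generation settings as a text chunk inside the PNG itself, so you can also read them from a script. Below is a minimal sketch using Pillow; the chunk name "parameters" and the file path are assumptions based on how recent webui versions save images.

```python
from PIL import Image  # pip install Pillow

# Hypothetical path to a PNG generated by the webui.
img = Image.open("some_sd_output.png")
# Recent webui versions write the generation settings into a text chunk named "parameters".
print(img.info.get("parameters", "no generation parameters found"))
```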

Let’s start with a more detailed introduction to each interface.

【Text-to-image interface】txt2img

This is probably the most commonly used interface for newcomers. As the name suggests, it is a place where text is used to generate images.

Prompt

Use words to describe what you want to generate

Supported languages

The supported input language is English (don’t worry if your English is poor; there are plenty of tag generators online). SD understands natural-language descriptions, but it is still recommended to write keywords separated by commas. Emoticons, emoji, and even some Japanese also work.

tag syntax
  1. Separation: different keyword tags must be separated by English commas ( , ). Spaces or line breaks before and after the commas do not matter.
    ex: 1girl, loli, long hair, low twintails (1 girl, loli, long hair, low twintails)
  2. Mixing: WebUI uses | between keywords to blend multiple elements. Note that the blend is in equal proportions and applied simultaneously.
    ex: 1girl, red|blue hair, long hair (1 girl, hair a mix of red and blue, long hair)

  3. Strengthen/weaken: There are two ways to write it

  • The first way, (prompt word:weight): the weight ranges from 0.1 to 100 and defaults to 1. Values below 1 weaken the keyword; values above 1 strengthen it.
    ex: (loli:1.21), (one girl:1.21), (cat ears:1.1), (flower hairpin:0.9)

  • The second way, ((prompt word)): each pair of () multiplies the weight by 1.1, and each pair of [] divides it by 1.1. So two nested pairs give 1.1*1.1 = 1.21x, three give 1.331x, and four give 1.4641x.

ex: ((loli)), ((one girl)), (cat ears), [flower hairpin] is equivalent to the first example above

  • Therefore, it is recommended to use the first method because it is clear and accurate.

  4. Gradient (prompt editing): simply put, the image is first generated toward one keyword and then shifts toward another partway through (see the small sketch after this list).

  • [keyword 1:keyword 2:number]: if the number is greater than 1, it is a step count X; keyword 1 is used before step X and keyword 2 after it. If the number is less than 1, it is a fraction of the total steps: keyword 1 is used for that fraction of the steps, then keyword 2 takes over.
  • ex: a girl with very long [white:yellow:16] hair is equivalent to

At the start: a girl with very long white hair

After 16 steps: a girl with very long yellow hair

  • ex: a girl with very long [white:yellow:0.5] hair is equivalent to

At the start: a girl with very long white hair

After 50% of the steps: a girl with very long yellow hair

  5. Alternate: cycle through keywords in turn.

ex: [cow|horse] in a field alternates between cow and horse, giving a blend of the two. A longer version, such as [cow|horse|cat|dog] in a field, means the sampler first pushes the image toward a cow, then a horse, then a cat, then a dog, and then cycles back to a cow, and so on.
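
The gradient and alternate syntaxes can be pictured with a small, purely illustrative Python sketch (this is not the webui's actual prompt parser; the function names are made up for the example):

```python
def edited_keyword(step, total_steps, kw1, kw2, n):
    """[kw1:kw2:n] -- use kw1 until step n (or until n * total_steps if n < 1), then kw2."""
    switch_at = n if n >= 1 else n * total_steps
    return kw1 if step < switch_at else kw2

def alternating_keyword(step, keywords):
    """[a|b|c] -- cycle through the keywords, one per sampling step."""
    return keywords[step % len(keywords)]

# [white:yellow:0.5] over 20 steps: white for the first half, yellow for the second.
print([edited_keyword(s, 20, "white", "yellow", 0.5) for s in range(20)])
# [cow|horse] alternates every step.
print([alternating_keyword(s, ["cow", "horse"]) for s in range(6)])
```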

tag writing example

It is recommended to write prompt words in a format similar to this

Image quality words >>
These are generally fairly fixed: masterpiece, best quality, ultra-high resolution, and so on.

Art style words >>
For example: is it a photo, an illustration, or anime?

Subject of the picture >>
For example: is the subject a girl, a cat, a child, a loli, a catgirl, a doggirl, a furry, an office worker, or a student?

Their appearance >>
Describe from the whole down to the details, top to bottom, for example:
Hair style (ahoge, hair tucked behind the ears, bangs covering the eyes, low ponytail, big wavy hair),
Hair color (blonde at the top, colored highlights at the ends),
Clothing (long skirt, lace trim, low-cut, translucent, blue bra, blue panties, half-length sleeves, knee-high socks, indoor shoes),
Head (cat ears, red eyes),
Neck (necklace),
Arms (bare shoulders),
Breasts (small breasts),
Abdomen (navel visible),
Butt (cameltoe),
Legs (long legs),
Feet (bare feet)

Their expression >>
Expression words (e.g. expressionless)

Their posture >>
Basic movements (standing, sitting, running, walking, squatting, lying down, kneeling),
Head movements (tilt, raise, lower head),
Hand movements (hands combing hair, placing hands on chest, raising hands),
Waist movements (bending, sitting astride, duck sitting, bowing),
Leg movements (crossed standing, crossed legs, M-shaped legs, cross-legged, kneeling),
Compound movements (fighting stance, JOJO stand, back-to-back standing, taking off clothes)

Picture background >>
Indoors, outdoors, in the woods, on the beach, under the stars, under the sun, whatever the weather is like

Miscellaneous >>
For example: NSFW, finely detailed eyes


Separate words from different categories with line breaks so they are easier to adjust later.

(masterpiece:1.331), best quality,
illustration,
(1girl),
(deep pink hair:1.331), (wavy hair:1.21), (disheveled hair:1.331), messy hair, long bangs, hair between eyes, (white hair:1.331), multicolored hair, (white bloomers:1.46), (open clothes),
beautiful detailed eyes, (purple|red eyes),
expressionless,
sitting,
dark background, moonlight, flower_petals, city, full_moon,


So we get a picture like this

Key points for tag writing
  1. Although everyone likes to call this “casting magic”, a longer spell (prompt) does not make a better picture. Try to keep the number of keywords within 75 (at most around 100).
  2. The more important a keyword is, the earlier it should be placed.
  3. Group similar keywords together.
  4. Only write the keywords that are actually needed.

Negative prompt

Use words to describe what you don’t want in the image
Roughly, at each step the AI will:
1. Predict how to denoise the image so that it looks more like your prompt.
2. Predict how to denoise the image so that it looks more like your negative prompt (which takes the place of the “unconditional” prediction).
3. Take the difference between the two predictions as a set of changes to apply to the noisy image.
4. Push the end result toward the former and away from the latter.
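
This is the idea behind classifier-free guidance. Below is a minimal sketch of how the two predictions combine with the CFG Scale described later; it is illustrative only (tensor shapes and names are made up), not the webui's actual code.

```python
import torch

def guided_noise(noise_prompt, noise_negative, cfg_scale):
    """Rough sketch of classifier-free guidance: start from the negative-prompt
    prediction and push toward the prompt prediction, scaled by CFG Scale."""
    return noise_negative + cfg_scale * (noise_prompt - noise_negative)

# Toy tensors standing in for the two U-Net noise predictions:
cond, uncond = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
print(guided_noise(cond, uncond, cfg_scale=7.0).shape)  # torch.Size([4, 64, 64])
```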

A relatively common negative prompt setting:

lowres,bad anatomy,bad hands,text,error,missing fingers,
extra digit,fewer digits,cropped,worst quality,
low quality,normal quality,jpeg artifacts,signature,
watermark,username,blurry,missing arms,long neck,
Humpbacked,missing limb,too many fingers,
mutated,poorly drawn,out of frame,bad hands,
unclear eyes,poorly drawn,cloned face,bad face


Sampling Steps

In plain terms, AI painting starts by generating a random noise image
and then adjusts it step by step to bring it closer to your prompt.
Sampling Steps tells the AI how many of these adjustment steps to perform.
More steps means each adjustment is smaller and more precise, but the time needed to generate the image also grows proportionally.
For most samplers there is little gain beyond about 50 steps.
The picture below shows how the same image changes from 1 step to 20 steps.

Sampling method

The sampling method determines which denoising algorithm the AI uses.
Here we only introduce the commonly used ones, and you can figure out the less commonly used ones by yourself.

  1. Euler a: creative; different step counts can produce quite different images. There is basically no gain beyond 30 to 40 steps.
  2. Euler: The most common and basic algorithm, the simplest, and the fastest.

  3. DDIM: Convergence is fast, usually 20 steps is enough.

  4. LMS: an extension of Euler that is relatively more stable; around 30 steps it is fairly stable.

  5. PLMS: a further improvement on LMS.

  6. DPM2: An improved version of DDIM that is approximately twice as fast as DDIM

Batch count (n_iter)

Runs the same configuration several times in a loop.

Batch size

How many images are generated simultaneously. Increasing this value generates images in parallel but requires more video memory (VRAM); adjust it while watching the GPU memory usage in Task Manager.
For basic 512x512 images with the SD 1.4 model and Euler a, 4 GB of VRAM can run 2 images in parallel and 8 GB can run 8.

Each time you click Generate, the total number of images generated = batch count x batch size.
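
If you would rather drive these parameters from a script than from the web page, the webui also exposes an HTTP API when it is launched with the --api flag. The sketch below is only an example under that assumption (the /sdapi/v1/txt2img endpoint and field names follow recent webui versions and may differ in yours); it requests a batch count of 2 with a batch size of 2, i.e. 4 images in total.

```python
import base64
import json
import urllib.request

payload = {
    "prompt": "masterpiece, best quality, 1girl, long hair",
    "negative_prompt": "lowres, bad anatomy, bad hands",
    "steps": 28,
    "sampler_name": "Euler a",
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
    "seed": -1,        # -1 means a random seed
    "n_iter": 2,       # batch count: run the same job twice
    "batch_size": 2,   # images generated in parallel per run
}

req = urllib.request.Request(
    "http://127.0.0.1:7860/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    images = json.loads(resp.read())["images"]  # list of base64-encoded PNGs

for i, img_b64 in enumerate(images):
    with open(f"txt2img_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```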

CFG Scale (prompt relevance)

How closely the image should match your prompt (this is the cfg_scale in the guidance sketch shown earlier).
Increasing this value makes the image follow your prompt more closely, but too high a value makes the image oversaturated (try it yourself).
Very high values also degrade image quality to some extent; increasing the sampling steps can partially offset this.
A value between 5 and 15 generally works well; 7, 9, and 12 are three common settings.

Width X Height

The unit is pixels. Increase the size moderately and the AI will try to fill in more detail.
Very small sizes (below 256x256) give the AI no room to work and reduce image quality.
Very large sizes (above 1024x1024) let the AI run wild and also reduce image quality.
Larger sizes require more VRAM; with 4 GB the practical upper limit is around 1280x1280.

Common models are basically trained on 512x512 or 768x768 images,
so if the resolution is set too high, image quality deteriorates as the resolution increases.
At 1024x1024 and above, the AI typically produces all kinds of ghosting and duplicated subjects.
If a model states that certain resolutions are optimal, follow the model’s recommendations.
For example, the 3DKX series models explicitly recommend a resolution of 1152 x 768.

If you really want to generate high-resolution images, use the “Hires.fix” function.

Seed (random seed)

As mentioned earlier, AI painting starts from a randomly generated noise image,
and since true randomness does not exist in the computer world, that noise is determined by the seed.
With the seed unchanged, the same model and backend, and all parameters kept consistent,
the same seed can generate (almost) the same image again and again.
If a certain seed produces a great picture for a certain set of tags,
you can keep the seed fixed, tweak the tags slightly, and add or remove details, and the result will usually still be good.

  • Different graphics card models may generate completely different images even with identical parameters and the same model.
    Roughly speaking, each card in the 10XX and 16XX series gives different results, while the 20XX and 30XX series can basically reproduce each other’s images.
  • This does not mean 10XX-series cards are unsuitable for AI painting, only that you may find a netizen’s parameters produce a great image for them but something completely different for you.
  • Some models, such as Anything 3.0, reproduce images poorly because the model itself is too chaotic.
  • There is a setting called ENSD (eta noise seed delta) that shifts the seed, and some extensions can also randomly fine-tune the seed, either of which can make it impossible to reproduce other people’s images.
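
If you are scripting through the HTTP API sketched in the Batch size section, reproducibility just means fixing the seed instead of leaving it at -1 (again, field names are assumptions based on recent webui versions):

```python
# Re-using the 'payload' dict from the earlier txt2img API sketch:
payload["seed"] = 1234567          # fix the seed instead of -1
payload["prompt"] += ", cat ears"  # small tag tweaks usually keep the overall composition
# Sending the same payload twice (same model, backend, and parameters) should
# return (almost) the same image both times.
```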

Face restoration (Restore faces)

Uses a model to restore the faces of characters (mainly realistic, 3D-style people) in the generated image so that they look more like real faces. The detailed settings are under [Settings] – [Face restoration].
1. There are basically two models, CodeFormer and GFPGAN. Which is better is hard to say; it depends on the model, so it is recommended to try both.
2. CodeFormer weight parameter: at 0 the effect is strongest, at 1 it is weakest. Start from 0.5 and experiment to find the setting you like.
3. This does not mean face restoration cannot be used on 2D images; to some extent it can also improve 2D faces.

Tiling

In a word: generates images that can be tiled seamlessly left/right and up/down (think of repeating wallpaper or mosaic tiles).

Hires. fix (high-resolution fix)

txt2img produces very strange images at high resolutions (e.g. 1024x1024). This option has the AI first render the image at a lower resolution, then upscale it algorithmically to the target resolution, and then add detail at the higher resolution.

  • Upscaler: if you don’t know what to choose, “ESRGAN_4x” is usually a safe default.
  • Denoising strength: how much the details may be repainted after upscaling, from 0 to 1. The larger the value, the more creative the AI becomes and the further it drifts from the original image.
  • Upscale by: the magnification factor applied to the original width and height. Note that higher factors require more VRAM.
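
These options also map onto the HTTP API payload from the earlier sketch; the field names below are assumptions based on recent webui versions and may differ in yours.

```python
# Extending the earlier txt2img 'payload' with Hires. fix options (assumed field names):
payload.update({
    "enable_hr": True,            # turn on Hires. fix
    "hr_upscaler": "ESRGAN_4x",   # upscaling algorithm
    "hr_scale": 2,                # upscale by 2x
    "denoising_strength": 0.5,    # how much detail the AI may repaint after upscaling
})
```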

【Settings】 Settings

The settings page is too complicated to cover in full; here I will only talk about setting the interface translation and choosing model-related parameters.

User interface

Scroll to the bottom to select the interface translation (Localization). After selecting, remember to go back to the top of the page, click Apply settings first, and then Reload UI.

【Extension interface】Extensions

Installed

Shows the installed extensions (scripts, translations, extra tabs). Checked means the extension is enabled; unchecked means it is disabled.
– Apply and restart UI: apply the changes and reload the UI.
– Check for updates: check the installed extensions for updates.

Available

Shows the currently available extensions. Click the Load from button to pull the latest extension list from the official index (you need to be able to reach GitHub).

  • Hide extensions with tags: extensions carrying the checked tags are hidden. It is generally recommended to check ads (contains advertising) and installed (already installed).
  • Order: sets how the extension list is sorted, e.g. newest first or alphabetically.
  • Below that is the list of extensions, showing each extension’s name, description, and an Install button. Clicking an extension’s name takes you to its repository page, where you can read the author’s detailed introduction.

Install from URL

To install an extension manually, enter the installation URL provided by the extension’s author and click the Install button.

How to switch the interface to Chinese

  1. First switch to the [Extensions] page, click [Available], then click [Load from:].
  2. Under [Hide extensions with tags], uncheck “localization”,
    then find zh_CN Localization or zh_TW Localization and click its Install button.
  3. Go to the [Installed] page, make sure “stable-diffusion-webui-localization-**_**” is checked at the bottom of the page, and click [Apply and restart UI] to restart the interface.
  4. Switch to the [Settings] page, find [User interface] on the left, and scroll down to the bottom.
  5. Select the language you need in the drop-down box.
  6. Return to the top of the page, click [Apply settings] first, then click [Reload UI].
  7. If everything went well, your interface is now in Chinese.