Quickly repair images with Stable Diffusion

What is inpainting?

Image inpainting (also called image restoration) is an active area of artificial intelligence research, and AI models can already produce restorations that are better than what most artists could achieve by hand.

Inpainting generates an image in which the missing parts have been filled in with visually and semantically plausible content. It is useful for many applications, such as advertising, improving your future Instagram posts, editing and repairing AI-generated images, and even restoring old photos. There are many ways to perform inpainting, but the most common approach is to use a convolutional neural network (CNN).

A CNN is well suited to inpainting because it can learn the features of an image and use them to predict plausible content for the missing regions. Many different CNN architectures are available for this purpose.
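
To make this concrete, here is a minimal, hypothetical sketch of such a network (not the architecture Stable Diffusion uses): a small CNN that takes the masked image together with its binary mask and predicts the filled-in result.

import torch
import torch.nn as nn

# Toy inpainting CNN: the input is the masked RGB image concatenated with
# the binary mask (4 channels); the output is the reconstructed RGB image.
class TinyInpaintCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, mask):
        masked = image * (1 - mask)                         # zero out the hole
        return self.net(torch.cat([masked, mask], dim=1))   # fill it back in

toy_model = TinyInpaintCNN()
image = torch.rand(1, 3, 64, 64)   # dummy image
mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 20:40, 20:40] = 1.0     # region to fill in
out = toy_model(image, mask)       # shape (1, 3, 64, 64)

A real inpainting model would be trained on pairs of masked and original images; the point of this sketch is only to show the shape of the problem.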

Introduction to Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model capable of generating stylized and realistic images. It is pre-trained on a subset of the LAION-5B dataset, and the model can be run on consumer-grade graphics cards at home, so everyone can create stunning works of art in seconds.

How to inpaint with Stable Diffusion

This tutorial shows how to do prompt-based inpainting with Stable Diffusion and CLIPSeg, without having to paint the mask by hand. Here, the mask is a binary image that tells the model which part of the image to regenerate and which part to keep. You will need a reasonably good GPU, but the notebook also runs fine on a Google Colab Tesla T4.
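
To make the mask idea concrete, here is a tiny sketch (with a made-up rectangle, purely for illustration) of a binary mask built with PIL: white pixels mark the area the model should repaint, black pixels are kept untouched.

from PIL import Image, ImageDraw

# Black (0) = keep, white (255) = regenerate
manual_mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(manual_mask)
draw.rectangle([150, 200, 360, 480], fill=255)  # arbitrary example region
manual_mask.save("manual_mask.png")

In this tutorial we will not draw the mask by hand; CLIPSeg will produce it for us from a text prompt.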

There are three mandatory inputs required to perform inpainting:

  1. The URL of the input image
  2. A prompt describing the part of the image you want to replace
  3. A prompt describing what should be generated in its place

You can also adjust two parameters:

  1. Mask precision
  2. Stable Diffusion generation strength

If this is your first time using Stable Diffusion via Hugging Face, you will need to accept the terms of service on the model page and get an access token from your user profile.

So let’s get started!

Install the open source Git extension for version control of large files

! git lfs install

Clone the clipseg repository

! git clone https://github.com/timojl/clipseg

Install the diffusers package from PyPI

! pip install diffusers -q

Install more helpers

! pip install transformers -q -UU ftfy gradient

Install CLIP using pip

! pip install git+https://github.com/openai/CLIP.git -q

Now let’s move on to logging in using Hugging Face. To do this, just run the following command:

from huggingface_hub import notebook_login

notebook_login()
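
If you are working in a plain shell rather than a notebook, the huggingface_hub package also provides a command-line login that does the same thing:

huggingface-cli login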

Once the login process is complete, you will see the following output:

Login successful
Your token has been saved to /root/.huggingface/token

Change into the cloned clipseg repository and list its contents:

%cd clipseg
! ls
datasets metrics.py supplementary.pdf
environment.yml models Tables.ipynb
evaluation_utils.py overview.png training.py
example_image.jpg Quickstart.ipynb Visual_Feature_Engineering.ipynb
experiments Readme.md weights
general_utils.py score.py
LICENSE setup.py

Import the packages we will need:

import torch
import requests
import cv2
import PIL
from io import BytesIO
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
from torch import autocast
from models.clipseg import CLIPDensePredT
from diffusers import StableDiffusionInpaintPipeline

Load the CLIPSeg model

model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64)
model.eval();
model.load_state_dict(torch.load('/content/clipseg/weights/rd64-uni.pth', map_location=torch.device('cuda')), strict=False);

We load the weights non-strictly because the checkpoint only stores the decoder weights, not the CLIP backbone weights.
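
As an optional sanity check, load_state_dict returns the keys it skipped, so you can confirm that only the CLIP backbone weights are missing:

missing, unexpected = model.load_state_dict(
    torch.load('/content/clipseg/weights/rd64-uni.pth', map_location=torch.device('cuda')),
    strict=False)
print(len(missing), "missing keys,", len(unexpected), "unexpected keys")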

device = "cuda"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)
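
If the half-precision pipeline still runs out of GPU memory, newer diffusers versions also offer attention slicing, which trades a little speed for a smaller memory footprint (optional, and depends on your installed version):

pipe.enable_attention_slicing()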

Load the input image; here we pull it from an external URL:

image_url = 'https://okmagazine.ge/wp-content/uploads/2021/04/00-promo-rob-pattison-1024x1024.jpg'
input_image = Image.open(requests.get(image_url, stream=True).raw)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.Resize((512, 512)),
])
img = transform(input_image).unsqueeze(0)
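
If you would rather use a local file (for example, one uploaded to your Colab session), only the loading line changes; the path below is just a placeholder:

input_image = Image.open('/content/my_photo.jpg').convert("RGB")  # hypothetical path
img = transform(input_image).unsqueeze(0)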

Change back to the parent directory:

%cd ..

Convert the input image to RGB, resize it to 512×512, and save it as init_image.png:

input_image.convert("RGB").resize((512, 512)).save("init_image.png", "PNG")

Display the image with matplotlib:

from matplotlib import pyplot as plt
plt.imshow(input_image, interpolation='nearest')
plt.show()

This will display the following image:

Now we’ll define a prompt for the mask, run the prediction, and visualize the result:

prompts = ['shirt']
with torch.no_grad():
    preds = model(img.repeat(len(prompts),1,1,1), prompts)[0]
_, ax = plt.subplots(1, len(prompts) + 1, figsize=(15, 4))
[a.axis('off') for a in ax.flatten()]
ax[0].imshow(input_image)
[ax[i + 1].imshow(torch.sigmoid(preds[i][0])) for i in range(len(prompts))];
[ax[i + 1].text(0, -15, prompts[i]) for i in range(len(prompts))];
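
The same cell works with several prompts at once, since the image tensor is repeated once per prompt; for example (the second prompt is just an illustration):

prompts = ['shirt', 'hair']
with torch.no_grad():
    preds = model(img.repeat(len(prompts), 1, 1, 1), prompts)[0]
# preds now holds one mask prediction per prompt, in the same order

For the rest of the tutorial we stick with the single 'shirt' prompt.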

Now we have to convert this mask into a binary image and save it as a PNG file:

filename = f"mask.png"
plt.imsave(filename,torch.sigmoid(preds[0][0]))

img2 = cv2.imread(filename)

gray_image = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

(thresh, bw_image) = cv2.threshold(gray_image, 100, 255, cv2.THRESH_BINARY)

# Save the thresholded (binary) mask, overwriting mask.png
cv2.imwrite(filename, bw_image)

# Fix the color format: convert the single-channel mask to RGB for display
bw_image = cv2.cvtColor(bw_image, cv2.COLOR_GRAY2RGB)

Image.fromarray(bw_image)
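
As an optional tweak that is not part of the original recipe, you can grow the mask by a few pixels so the inpainted region blends more smoothly into its surroundings:

import numpy as np

# Re-read the binary mask, dilate it slightly, and save it again
mask_img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
mask_img = cv2.dilate(mask_img, np.ones((9, 9), np.uint8), iterations=1)
cv2.imwrite(filename, mask_img)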

Now we have a mask that looks like this:

Now load the input image and the mask we just created:

init_image = Image.open('init_image.png')
mask = Image.open('mask.png')

And finally, the last step: inpaint the image according to a prompt of your choice. Depending on your hardware, this will take a few seconds.

with autocast("cuda"):
    images = pipe(prompt="a yellow flowered holiday shirt", init_image=init_image, mask_image=mask, strength=0.8)["sample"]

On Google Colab, you can display an image simply by entering its name:

images[0]

Now you’ll see that the shirt we created the mask for has been replaced according to our new prompt!
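
To keep the result outside the notebook, save the generated image to disk (the filename is arbitrary):

images[0].save("inpainted.png")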

Reprint: Quickly repair images using stable diffusion (mvrlink.com)