Calculation process of multi-scale structural similarity L1 loss

Multi-scale structural similarity L1 loss

SSIM

The Structural Similarity Index (SSIM) is an image quality metric that evaluates how similar two images are. It is widely used for image quality assessment, evaluation of compression algorithms, and image enhancement and restoration tasks.

The human eye judges the similarity of two images mainly along three aspects:

  1. Luminance: evaluates whether the light/dark features and overall brightness of the two images are consistent
  2. Contrast: evaluates how strongly brightness varies across different regions of the two images
  3. Structure: evaluates texture and detail in the two images, as well as the relative positions of pixels

SSIM is computed from these same three aspects and yields a value between -1 and 1: the closer the value is to 1, the more similar the two images are.
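As a quick sanity check (not part of the implementation discussed later), scikit-image ships an SSIM implementation that illustrates this range; the image data here is random and purely illustrative:

import numpy as np
from skimage.metrics import structural_similarity

img = np.random.rand(128, 128)                                # a synthetic grayscale image
noisy = np.clip(img + 0.2 * np.random.randn(128, 128), 0, 1)  # a degraded copy

print(structural_similarity(img, img, data_range=1.0))    # 1.0 for identical images
print(structural_similarity(img, noisy, data_range=1.0))  # < 1 for the degraded copy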

SSIM calculation

  1. Luminance: the luminance of an image is quantified by the mean of all its pixel values. The formula is as follows:

\mu_x = \frac{1}{N}\sum_{i=1}^{N}x_i

l(x,y)=\frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}

Here, l(x,y) is the luminance comparison value of the two images x and y, and C_1 is a constant added for numerical stability.

  2. Contrast: the contrast of an image is quantified by the standard deviation of all its pixel values. The formula is as follows:

    \sigma_x=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2}

    c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}

    Here, c(x,y) is the contrast comparison value of the two images x and y, and C_2 is a constant.

  3. Structure: the structural term is quantified from the covariance of the two images together with their standard deviations. The formula is as follows:

    \sigma_{xy}=\frac{1}{N-1}\sum_{i=1}^{N}{(x_i-\mu_x)(y_i-\mu_y)}

s(x,y)=\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}

Here, s(x,y) is the structure comparison value of the two images, and C_3 is a constant.

  4. SSIM value

    Finally, the three comparison terms are multiplied together to obtain the final SSIM value:

SSIM(x,y)=l(x,y) \cdot c(x,y) \cdot s(x,y)

If we choose the constant C_3 = C_2/2, the formula simplifies to:

SSIM(x,y)=\frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
In this way, we only need to compute five quantities to obtain the SSIM value:

\mu_x,\mu_y,\sigma_x,\sigma_y,\sigma_{xy}
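As a rough illustration (a minimal sketch, not the implementation discussed later), the global SSIM of two equally sized images can be computed directly from these five quantities; the constants C_1 = (0.01 L)^2 and C_2 = (0.03 L)^2 for a dynamic range L follow the usual convention and are assumptions here:

import torch

def global_ssim(x, y, data_range=1.0):
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x2, sigma_y2 = x.var(), y.var()                         # unbiased, matching 1/(N-1) above
    sigma_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.numel() - 1)  # sample covariance
    return ((2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x2 + sigma_y2 + C2))

x = torch.rand(64, 64)
print(global_ssim(x, x))                   # exactly 1 for identical images
print(global_ssim(x, torch.rand(64, 64)))  # lower for unrelated images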

Local SSIM

Because the human eye only attends to local luminance, contrast, and structure at any one time, practical implementations do not compute the global mean and variance. Instead, the statistics are computed inside a local window, the SSIM value is evaluated at every window position, and the values from all windows are averaged to obtain the final SSIM score.

The formula becomes:

\mu_x = \sum_{i=1}^{N}{w_ix_i}

\sigma_x=\sqrt{\sum_{i=1}^{N}w_i(x_i-\mu_x)^2}

\sigma_{xy}=\sum_{i=1}^{N}w_i{(x_i-\mu_x)(y_i-\mu_y)}

Here the weights w_i come from an isotropic Gaussian kernel: the values follow a Gaussian profile and sum to 1.
The kernel can be defined as follows:

import torch

coords = torch.arange(size).float() - size // 2   # pixel offsets centered on the kernel
g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))  # 1-D Gaussian profile with standard deviation sigma
gauss_1d = g / g.sum()                            # normalize so the weights sum to 1
gauss_2d = torch.outer(gauss_1d, gauss_1d)        # size x size 2-D Gaussian kernel

After visualization, the kernel is a symmetric bell surface that peaks at the center and falls off towards the edges.
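A minimal sketch to reproduce this visualization of gauss_2d from the snippet above (assuming matplotlib is available):

import matplotlib.pyplot as plt

plt.imshow(gauss_2d.numpy(), cmap="viridis")   # brightest at the center, decaying outwards
plt.colorbar()
plt.show()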

As noted in step 4 above, in the actual code we only need to compute five quantities. The local means of the two images are computed with a Gaussian convolution:

# self.g_masks stores the Gaussian kernels as depthwise conv weights; groups=3 applies them per channel
mux = F.conv2d(x, self.g_masks, groups=3, padding=self.pad)
muy = F.conv2d(y, self.g_masks, groups=3, padding=self.pad)

By expanding the definitions, the variance and covariance can be rewritten so that they also reduce to Gaussian convolutions, that is:

\sigma_x^2=\sum_{i=1}^{N}w_ix_i^2-\mu_x^2

\sigma_{xy}=\sum_{i=1}^{N}w_ix_iy_i-\mu_x\mu_y

In this form, the variance and covariance can be expressed in code as follows:

mux2, muy2, muxy = mux * mux, muy * muy, mux * muy

sigmax2 = F.conv2d(x * x, self.g_masks, groups=3, padding=self.pad) - mux2
sigmay2 = F.conv2d(y * y, self.g_masks, groups=3, padding=self.pad) - muy2
sigmaxy = F.conv2d(x * y, self.g_masks, groups=3, padding=self.pad) - muxy
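With these local statistics in hand, the simplified SSIM formula from step 4 can be applied per window position; the following is a sketch (not necessarily the author's exact code), assuming pixel values scaled to [0, 1]:

C1, C2 = 0.01 ** 2, 0.03 ** 2                            # assumes pixel values in [0, 1]

l_map = (2 * muxy + C1) / (mux2 + muy2 + C1)             # luminance term per window
cs_map = (2 * sigmaxy + C2) / (sigmax2 + sigmay2 + C2)   # contrast-structure term per window
ssim_map = l_map * cs_map                                # SSIM value per window
ssim_score = ssim_map.mean()                             # average over all windows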

Multi-scale SSIM

Multi-scale SSIM extends the original SSIM by computing the comparison terms at several scales and multiplying the results to obtain the final multi-scale SSIM value; typically five scales are used.
Note that the luminance term is only evaluated at the last (coarsest) scale. Here, the different scales are realized by Gaussian kernels with different sigma values. Visualizing the multi-scale Gaussian kernels gives the following:

The five rows correspond to Gaussian kernels at different scales, and the three columns correspond to the three channels of the image. Kernels at different scales have different effective fields of view, which mimics human visual discrimination under different viewing conditions.
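The following is a self-contained sketch of how such a kernel stack and the multi-scale combination could look. The sigma values, kernel size, constants, and function names are illustrative assumptions, not the exact configuration used here:

import torch
import torch.nn.functional as F

def gaussian_kernel(size, sigma):
    coords = torch.arange(size).float() - size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

sigmas = [0.5, 1.0, 2.0, 4.0, 8.0]   # assumed standard deviation per scale
size, channels = 33, 3
# one kernel per (channel, scale) pair, shaped for a grouped (depthwise) convolution
g_masks = torch.stack(
    [gaussian_kernel(size, s) for _ in range(channels) for s in sigmas]
).unsqueeze(1)                       # (channels * scales, 1, size, size)

def ms_ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    pad = size // 2
    mux = F.conv2d(x, g_masks, groups=channels, padding=pad)
    muy = F.conv2d(y, g_masks, groups=channels, padding=pad)
    mux2, muy2, muxy = mux * mux, muy * muy, mux * muy
    sigmax2 = F.conv2d(x * x, g_masks, groups=channels, padding=pad) - mux2
    sigmay2 = F.conv2d(y * y, g_masks, groups=channels, padding=pad) - muy2
    sigmaxy = F.conv2d(x * y, g_masks, groups=channels, padding=pad) - muxy

    l = (2 * muxy + C1) / (mux2 + muy2 + C1)               # luminance per scale/channel
    cs = (2 * sigmaxy + C2) / (sigmax2 + sigmay2 + C2)     # contrast-structure per scale/channel

    # output channels are ordered per input channel: [R: s0..s4, G: s0..s4, B: s0..s4]
    lM = l[:, len(sigmas) - 1::len(sigmas)].prod(dim=1)    # luminance only at the coarsest scale
    PIcs = cs.prod(dim=1)                                  # product of cs over all scales and channels
    return (lM * PIcs).mean()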

Loss function

SSIM can be used as a loss function to optimize image restoration, image compression, and similar tasks. The L1 loss minimizes the absolute per-pixel difference between images, and combining multi-scale SSIM with L1 often gives better results for such objectives. The code is as follows:

loss_l1 = F.l1_loss(x, y)
loss_ms_ssim = 1 - ms_ssim_score
total_loss = alpha * loss_ms_ssim + (1 - alpha) * loss_l1

Here, alpha controls the relative weight of the two loss terms.
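As a rough usage sketch, reusing the ms_ssim sketch from the previous section (alpha = 0.84 is a value often quoted in the literature, not a requirement):

def ms_ssim_l1_loss(x, y, alpha=0.84):
    loss_l1 = F.l1_loss(x, y)
    loss_ms_ssim = 1 - ms_ssim(x, y)
    return alpha * loss_ms_ssim + (1 - alpha) * loss_l1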

Effect

As an example, we can compute the MS-SSIM L1 loss between different pairs of images:
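Using the ms_ssim_l1_loss sketch above on synthetic data (purely illustrative, not the original comparison figures), the loss is zero for identical images and grows as the images diverge:

x = torch.rand(1, 3, 256, 256)                          # a "clean" image
y_noisy = (x + 0.1 * torch.randn_like(x)).clamp(0, 1)   # a mildly corrupted copy
y_other = torch.rand(1, 3, 256, 256)                    # an unrelated image

print(ms_ssim_l1_loss(x, x).item())        # 0: identical images
print(ms_ssim_l1_loss(x, y_noisy).item())  # lower: similar images
print(ms_ssim_l1_loss(x, y_other).item())  # higher: dissimilar images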