DehazeNet: An End-to-End System for Single Image Haze Removal (end-to-end dehazing model)

1. Overall idea of the paper

DehazeNet is an end-to-end deep learning model proposed by researchers at South China University of Technology in 2016. From the input hazy image, the model regresses the corresponding medium transmission map (the t-value map), refines t with guided filtering, estimates the global atmospheric light as the mean of the RGB values of the original image over the 0.1% of pixels with the lowest values in the transmission map, and then applies the standard atmospheric scattering model to complete the dehazing operation.

The main contributions of the article:
1. A specially designed end-to-end network directly regresses the transmission map t from the raw hazy image, which is the key step toward dehazing.
2. Proposed a nonlinear Bilateral Rectified Linear Unit (BReLU) activation module and verified its effectiveness (it reduces the search space and accelerates model convergence).
3. Established the connection between DehazeNet and existing haze-relevant priors, and explained how automatic feature learning can optimize and improve on those hand-crafted prior assumptions.

2. Introduction to existing models

2.1 Atmospheric scattering model


As the scattering diagram in the paper illustrates, sunlight reflects off an object's surface to form the scene radiance J(x). This reflected light is scattered as it travels through the haze, so only a fraction of its energy, J(x)t(x), reaches the camera. Meanwhile, sunlight scattered by suspended particles forms the atmospheric light α, which the camera also receives. The image I(x) captured by the camera therefore consists of two parts: the attenuated object radiance J(x)t(x) and the scattered atmospheric light α(1 − t(x)).

The atmospheric scattering model is the core formula in image dehazing. From the description above, the image that finally reaches the lens is the sum of these two parts, the attenuated object radiance J(x)t(x) and the scattered atmospheric light α(1 − t(x)):

I(x) = J(x)t(x) + α(1 − t(x))

where the transmission t(x) = e^(−βd(x)) decays exponentially with scene depth d(x) and scattering coefficient β.
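As a quick illustrative sketch of how this model is inverted at dehazing time (names are placeholders, pixel values assumed normalized to [0, 1]; this mirrors the formula, not the paper's exact code):

import numpy as np

def recover_scene(I, t, A, t0=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t).

    I: hazy image, H x W x 3, values in [0, 1]
    t: transmission map, H x W, values in (0, 1]
    A: global atmospheric light, scalar or length-3 vector
    t0: lower bound on t, to avoid amplifying noise in dense haze
    """
    t = np.clip(t, t0, 1.0)[..., None]   # broadcast over the RGB channels
    J = (I - A) / t + A                  # equivalent to (I - A*(1 - t)) / t
    return np.clip(J, 0.0, 1.0)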

2.2 Existing prior models

(1) Dark channel prior

Refer to the previous article: Single Image Haze Removal Using Dark Channel Prior (dark channel prior)
(2) Maximum contrast method
Haze reduces the contrast of the imaged scene: Σx‖ΔI(x)‖ = t Σx‖ΔJ(x)‖ ≤ Σx‖ΔJ(x)‖. Based on this observation, local contrast can be used to approximately estimate haze density, and the color and visibility of the image can be restored by maximizing local contrast.
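As a rough illustrative sketch of this idea (not from the paper; the function name and patch radius are assumptions), local contrast of a grayscale image can be approximated by summing absolute differences within a patch, with lower values suggesting denser haze:

import numpy as np

def local_contrast(gray, radius=2):
    """Sum of absolute differences within a (2*radius+1)^2 patch, a crude
    stand-in for the local contrast term above. gray: H x W in [0, 1]."""
    out = np.zeros_like(gray)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(gray, dy, axis=0), dx, axis=1)
            out += np.abs(shifted - gray)
    return out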

(3) Color attenuation prior
Color attenuation prior (CAP) is a prior feature similar to the dark channel prior (DCP). It has been observed that haze simultaneously decreases image saturation and increases brightness, which overall manifests as color attenuation. Based on this prior, the difference between brightness and saturation is used to estimate haze density.
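A minimal sketch of this prior, assuming an RGB image normalized to [0, 1] (the function name is illustrative): the per-pixel difference between HSV brightness and saturation serves as a rough haze indicator:

import numpy as np

def cap_haze_proxy(img):
    """Color attenuation prior: brightness minus saturation rises with haze.
    img: H x W x 3 RGB in [0, 1]; returns a rough per-pixel haze indicator."""
    v = img.max(axis=2)                                   # HSV value (brightness)
    s = (v - img.min(axis=2)) / np.maximum(v, 1e-6)       # HSV saturation
    return v - s                                          # larger ~ denser haze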

(4) Hue disparity

The hue disparity between the original image and its semi-inverse image (in which each channel is replaced by max(Ic, 1 − Ic)) has also been used as a haze-relevant feature: in hazy regions pixel values are high, so the semi-inverse changes little and the hue disparity is small.

3. Overall overview of DehazeNet


The overall architecture of the model contains four major blocks: (1) feature extraction, implemented with a convolution followed by a Maxout activation; (2) multi-scale feature mapping, which draws on the Inception module and uses three convolution kernel sizes (3×3, 5×5, and 7×7) to capture different receptive fields, concatenating the results along the channel dimension; (3) local extremum, which uses max pooling to take the extreme value over each neighborhood; and (4) nonlinear regression, a 6×6 convolution followed by BReLU to complete the nonlinear mapping.

3.1 Feature extraction

Traditional image feature extraction relies on hand-designed assumptions and priors, which do not hold in every situation. After examining how the dark channel prior, maximum contrast prior, and color attenuation prior are computed, the author proposed that convolving the original image with appropriate kernels and applying a nonlinear activation yields features equivalent to those priors. The Maxout activation used here splits the channels into groups and extracts the per-pixel maximum of each group; combined with suitable convolution kernels, it can reproduce effects similar to the dark channel, maximum contrast, and color attenuation priors, while bringing stronger fitting capacity.

The convolution here uses a 5*5 kernel with 3 input channels and 16 output feature maps (the training patches are 16*16). The activation function is Maxout, which differs markedly from ordinary activation functions; a comparison follows.

For an ordinary activation function, z = w*x + b and out = f(z), where f is a commonly used activation such as ReLU or Sigmoid.

Maxout differs: if the number of maxout units K is set to 5, each output neuron is preceded by an extra layer of 5 linear neurons, as the following formulas show.

Then the maxout computation becomes:

z1 = w1*x + b1
z2 = w2*x + b2
z3 = w3*x + b3
z4 = w4*x + b4
z5 = w5*x + b5

and the final output is out = max(z1, z2, z3, z4, z5).

The difference, in short: a conventional activation function such as ReLU acts on a single neuron, outputting max(0, x), whereas Maxout groups several neurons together and outputs the group's maximum. Maxout generally has stronger fitting capacity than other activations.
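A tiny numeric demo of this grouping in PyTorch (values are arbitrary; this mirrors the K = 5 formulas above, not the paper's exact layer):

import torch

x = torch.tensor([1.0, -2.0])   # one 2-d input
W = torch.randn(5, 2)           # 5 linear units (K = 5)
b = torch.randn(5)

z = W @ x + b                   # z_k = w_k . x + b_k, k = 1..5
out = z.max()                   # maxout keeps the largest z_k
print(z, out)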

3.2 Multi-scale mapping

Multi-scale features are widely used in image dehazing because they provide scale invariance, making the extracted features more robust. Inspired by the Inception module, the author uses 3*3, 5*5, and 7*7 convolution kernels to extract different features and aggregates them, strengthening generalization to objects at multiple scales.
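A brief shape-check sketch of this block (channel counts follow the reference code below; padding of k // 2 keeps the spatial size, so the three outputs can be concatenated along the channel dimension):

import torch
import torch.nn as nn

x = torch.randn(1, 4, 12, 12)                     # output of the maxout stage
convs = [nn.Conv2d(4, 16, k, padding=k // 2) for k in (3, 5, 7)]
feats = torch.cat([c(x) for c in convs], dim=1)   # same H x W, channels stack
print(feats.shape)                                # torch.Size([1, 48, 12, 12])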

3.3 Local extreme value

The reasoning behind the local extremum parallels the locally-constant-transmission assumption made by prior-based algorithms. In this article, max pooling over a 7*7 local region replaces that hand-crafted locally constant transmission operation.

3.4 Nonlinear regression

In image restoration, the output of the last layer generally lies within a small bounded range. Neither Sigmoid nor ReLU is suitable as the final activation: the former suffers from the vanishing gradient problem, and the latter has no upper bound. This article proposes the BReLU activation function, which clamps the activated value between tmin and tmax. It is quite similar to the Hardtanh function, and the author's implementation in fact inherits from it.
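A minimal demo of this clamping behavior, using PyTorch's nn.Hardtanh with tmin = 0 and tmax = 1 (the same trick the reference code below uses to build BReLU):

import torch
import torch.nn as nn

brelu = nn.Hardtanh(min_val=0.0, max_val=1.0)   # BReLU with tmin=0, tmax=1
x = torch.tensor([-0.5, 0.3, 0.9, 1.7])
print(brelu(x))                                 # tensor([0.0000, 0.3000, 0.9000, 1.0000])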

3.5 Network design structure

3.6 Existing problems and subsequent improvements

1. The global atmospheric light A should not be treated as a constant; it could be learned jointly in a unified network, in the same way as the transmission t.
2. The atmospheric scattering model itself could also be learned directly by the model, abandoning the hand-designed formulation entirely.

4. Reference source code

https://github.com/zlinker/DehazeNet/blob/master/DehazeNet.py
https://github.com/thuBingo/DehazeNet_Pytorch
Deep Learning (23) Maxout Network Learning

The following code implements the network structure of DehazeNet. It is fairly simple overall; the parts that deserve attention are the implementations of Maxout and BReLU.

import torch
import torch.nn as nn

class BRelu(nn.Hardtanh):
    """BReLU: clamp activations to [0, 1], implemented by inheriting Hardtanh."""
    def __init__(self, inplace=False):
        super(BRelu, self).__init__(0., 1., inplace)

    def extra_repr(self):
        inplace_str = 'inplace=True' if self.inplace else ''
        return inplace_str

class DehazeNet(nn.Module):
    def __init__(self, input=16, groups=4):
        super(DehazeNet, self).__init__()
        self.input = input
        self.groups = groups
        # Feature extraction: 5x5 conv, 3 -> 16 channels (Maxout applied in forward)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.input, kernel_size=5)
        # Multi-scale mapping: parallel 3x3 / 5x5 / 7x7 convs, padded to keep H x W
        self.conv2 = nn.Conv2d(in_channels=4, out_channels=16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=4, out_channels=16, kernel_size=5, padding=2)
        self.conv4 = nn.Conv2d(in_channels=4, out_channels=16, kernel_size=7, padding=3)
        # Local extremum over a 7x7 neighborhood
        self.maxpool = nn.MaxPool2d(kernel_size=7, stride=1)
        # Nonlinear regression: 6x6 conv down to a single value, then BReLU
        self.conv5 = nn.Conv2d(in_channels=48, out_channels=1, kernel_size=6)
        self.brelu = BRelu()
        for name, m in self.named_modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.normal is deprecated; use the in-place normal_
                nn.init.normal_(m.weight, mean=0, std=0.001)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

    def Maxout(self, x, groups):
        # Split the channels into `groups` groups and keep the per-pixel max of
        # each group: (N, 16, H, W) -> (N, 4, H, W) for groups=4.
        x = x.reshape(x.shape[0], groups, x.shape[1] // groups, x.shape[2], x.shape[3])
        x, _ = torch.max(x, dim=2, keepdim=True)
        out = x.reshape(x.shape[0], -1, x.shape[3], x.shape[4])
        return out

    def forward(self, x):
        out = self.conv1(x)                       # feature extraction
        out = self.Maxout(out, self.groups)
        out1 = self.conv2(out)                    # multi-scale mapping
        out2 = self.conv3(out)
        out3 = self.conv4(out)
        y = torch.cat((out1, out2, out3), dim=1)  # channel concat -> 48 maps
        y = self.maxpool(y)                       # local extremum
        y = self.conv5(y)                         # nonlinear regression
        y = self.brelu(y)                         # clamp t to [0, 1]
        y = y.reshape(y.shape[0], -1)
        return y

Tips for generating the training dataset in this article:


As the referenced literature shows, given a haze-free image J(x), randomly sampling the transmission t and setting the global atmospheric light A to 1 directly yields a hazy image via the scattering model; the sampled t then serves as the ground-truth value during training.
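A minimal sketch of that synthesis recipe (function and variable names are illustrative, and the clean patch here is random stand-in data):

import numpy as np

def make_hazy_patch(J, rng):
    """Synthesize a training pair from a haze-free patch J (H x W x 3 in [0, 1]).
    Per the scheme above: sample t uniformly, fix A = 1, apply the scattering
    model; the sampled t is the ground-truth regression target."""
    t = rng.uniform(0.0, 1.0)        # random transmission for this patch
    A = 1.0                          # atmospheric light fixed to 1
    I = J * t + A * (1.0 - t)        # hazy patch via I = J*t + A*(1 - t)
    return I.astype(np.float32), np.float32(t)

rng = np.random.default_rng(0)
J = rng.random((16, 16, 3))          # stand-in for a real clean 16x16 patch
I, t = make_hazy_patch(J, rng)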