Industrial anomaly detection: from cutting-edge to implementation

Contents

  • Preface
  • 1. Cutting-edge technology: SimpleNet
    • 1. Task classification: image segmentation
    • 2. Performance comparison: Faster & Higher
    • 3. Research background
      • Difficulty:
      • Solution: Unsupervised
      • Dataset: MVTec AD
        • Performance comparison:
        • Speed comparison:
  • 2. Code practice
    • 1. Operation record:
  • Summary

Preface

Industrial anomaly detection is a hot topic. In little more than a decade it has moved from traditional image processing to visual recognition built on deep models, and the improvement shows in several ways: 1. detection capability has grown from single-anomaly detection to detection of multiple anomaly types; 2. the data scale has grown from a handful of images to the tens of thousands of images that current deep models require, even though anomalous images remain hard to obtain. With the support of edge GPUs, deep models now strike a balance between detection speed and detection accuracy. This article is organized in two parts: 1. interpretation of the principles; 2. hands-on practice. The aim is to unite knowledge and action, do some practical work, and bring a little comfort to perfectionists like me.

1. Cutting-edge technology: SimpleNet

Figure 1 Detection results on the industrial anomaly dataset MVTec AD

1. Task classification: image segmentation

From Figure 1 we can see that this technology provides the following: 1. the anomalous region is overlaid on the original image; 2. a heat map of anomaly scores is displayed, where orange indicates regions with high scores; 3. the boundary of the anomalous region is outlined in red. We can therefore classify industrial anomaly detection as an image segmentation task. If the method could further identify which type of anomaly is present, rather than giving only a 0-1 normal/abnormal decision, it would become an instance segmentation task.
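
To make the visualization concrete, here is a minimal OpenCV sketch of how such an overlay is typically produced, assuming an anomaly score map is already available from the detector; the function name and threshold are illustrative and not part of SimpleNet.

```python
# Minimal sketch: overlaying a per-pixel anomaly score map on the original image.
# Assumes `image` (H x W x 3, uint8) and `score_map` (H x W, float) already exist;
# the threshold value is illustrative, not taken from SimpleNet.
import cv2
import numpy as np

def overlay_anomaly(image: np.ndarray, score_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # Normalize scores to [0, 255] and colorize them as a heat map (warm colors = high score).
    norm = cv2.normalize(score_map, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    heat = cv2.applyColorMap(norm, cv2.COLORMAP_JET)
    blended = cv2.addWeighted(image, 0.6, heat, 0.4, 0)

    # Draw the boundary of the thresholded anomalous region in red (BGR).
    mask = (score_map >= threshold).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(blended, contours, -1, (0, 0, 255), 2)
    return blended
```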

2. Performance comparison: Faster & Higher

Figure 2 Performance comparison in the detection speed vs. detection accuracy plane

From Figure 2 we can easily see that this method achieves a detection accuracy above 99.5 I-AUROC (image-level AUROC) while running at close to 80 FPS (frames per second). Is such a fast and accurate detector the tool that solves real industrial needs? Can it hold up in practice?
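
As a side note, I-AUROC here is an image-level AUROC. Below is a minimal sketch of how such a metric is typically computed from per-image anomaly scores, using scikit-learn; the labels and scores are made-up placeholders, not results from the paper.

```python
# Sketch of how an image-level AUROC (I-AUROC) is typically computed:
# each test image gets a scalar anomaly score, and AUROC measures how well
# those scores separate normal (label 0) from anomalous (label 1) images.
# The values below are toy placeholders.
from sklearn.metrics import roc_auc_score

labels = [0, 0, 0, 1, 1]             # ground truth: 0 = normal, 1 = anomalous
scores = [0.1, 0.2, 0.15, 0.9, 0.7]  # per-image anomaly scores from the detector

print("I-AUROC:", roc_auc_score(labels, scores))  # 1.0 for this toy example
```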

However, a problem remains: the method above only gives the location of a detected anomaly, not what kind of anomaly it is. Is it a defect on a steel part or on fabric? The method does not say. And if the anomalies of a given material are uncertain, can they still be detected? At least two issues therefore deserve further research: 1. instance segmentation; 2. transferable segmentation, or robust detection.

3. Research background

Difficulty:

In industrial scenarios, anomaly detection and localization are particularly difficult because anomalous samples are scarce and the anomaly types are diverse, ranging from small scratches to large structural defects; see Figure 1 for examples. This makes supervised training difficult to carry out.

Solution: Unsupervised

To solve this problem, current methods mainly adopt an unsupervised approach: only normal samples are used for training, and anomalous samples appear only at test time. The common unsupervised methods follow three trends:

  1. Reconstruction methods. These assume that a deep network trained only on normal data cannot accurately reconstruct anomalies, so pixels with large reconstruction errors are regarded as anomalous. However, this assumption does not always hold: the network sometimes generalizes well enough to reconstruct anomalous pixels too, which lets anomalies slip through.
  2. Synthesis methods. These train the network by generating anomalies on anomaly-free images and then estimate the decision boundary between normal and synthesized anomalous samples. However, if the synthesized images are not realistic enough, the generated features can deviate far from real anomalous features, and training on such negative samples can yield a loosely bounded normal feature space.
  3. Embedding methods. These currently achieve SOTA performance. They typically use a model pre-trained on ImageNet to extract normal features and then embed the normal feature distribution with statistical tools such as a multivariate Gaussian distribution, normalizing flows, or a memory bank. Anomalies are detected by comparing input features to the learned distribution or memorized features (see the sketch after this list). However, industrial anomaly images often have a distribution different from ImageNet, so using these biased features directly may lead to mismatch problems; moreover, these statistical algorithms bring high computational complexity or high memory consumption.
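
To make the embedding idea concrete, here is a minimal PyTorch sketch of memory-bank-style scoring: store features from normal images at training time and score a test feature by its distance to the nearest stored feature. This is a generic illustration of the trend (in the spirit of methods such as PatchCore), not SimpleNet's code; shapes and data are illustrative.

```python
# Minimal sketch of memory-bank style anomaly scoring (the "embedding" trend):
# normal patch features fill a memory bank at training time; at test time a
# feature's anomaly score is its distance to the nearest memorized feature.
import torch

def build_memory_bank(normal_features: torch.Tensor) -> torch.Tensor:
    # normal_features: (N, D) patch features extracted from normal training images.
    return normal_features

def anomaly_scores(test_features: torch.Tensor, memory_bank: torch.Tensor) -> torch.Tensor:
    # test_features: (M, D). Score = Euclidean distance to the closest memorized feature.
    dists = torch.cdist(test_features, memory_bank)  # (M, N) pairwise distances
    return dists.min(dim=1).values                   # (M,) nearest-neighbor distance

# Toy usage: 1000 normal features and 5 test features of dimension 64.
bank = build_memory_bank(torch.randn(1000, 64))
print(anomaly_scores(torch.randn(5, 64), bank))
```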

To solve the above problems, the authors propose SimpleNet, which combines the advantages of the synthesis and embedding methods through the following improvements:

  1. Instead of using pre-trained features directly, the authors propose a feature adapter that produces target-oriented features, reducing the domain bias.
  2. Instead of directly synthesizing anomalous images, the authors propose to generate anomalous features by adding noise to normal features in the feature space. They argue that with a properly calibrated noise scale, a tightly bounded normal feature space can be obtained.
  3. The anomaly detection process is simplified by training a simple discriminator, which is much more computationally efficient than the complex statistical algorithms used in embedding methods.
    Specifically, SimpleNet uses a pre-trained backbone to extract normal features, adapts them, and then performs discrimination. The discriminator is very simple, consisting only of an MLP. The authors' framework is shown below, followed by a condensed code sketch:
Figure 3 SimpleNet framework
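
Below is a condensed PyTorch sketch of the three ideas described above: a feature adapter, Gaussian noise added to normal features as pseudo-anomalies, and a small MLP discriminator trained with a margin-style loss. The dimensions, optimizer, and loss are assumptions made for illustration (the noise scale and margin echo the run.sh settings shown later); refer to the official repository for the authors' actual implementation.

```python
# Condensed, illustrative sketch of the SimpleNet ideas: feature adapter,
# Gaussian-noise pseudo-anomalies in feature space, and a simple MLP discriminator.
# Dimensions, optimizer, and loss details are assumptions, not the official code.
import torch
import torch.nn as nn

feat_dim = 1536                                             # e.g. concatenated backbone features
adapter = nn.Linear(feat_dim, feat_dim)                     # feature adapter
discriminator = nn.Sequential(                              # simple MLP discriminator
    nn.Linear(feat_dim, 1024), nn.ReLU(), nn.Linear(1024, 1)
)
opt = torch.optim.Adam(list(adapter.parameters()) + list(discriminator.parameters()), lr=1e-4)

def train_step(normal_feats: torch.Tensor, noise_std: float = 0.015) -> float:
    # normal_feats: (B, feat_dim) features from a frozen pre-trained backbone.
    adapted = adapter(normal_feats)                          # target-oriented features
    fake = adapted + torch.randn_like(adapted) * noise_std   # pseudo-anomalous features
    # Margin-style loss: push normal scores above the margin, fake scores below it.
    loss = torch.relu(0.5 - discriminator(adapted)).mean() \
         + torch.relu(0.5 + discriminator(fake)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random "normal" features:
print(train_step(torch.randn(8, feat_dim)))
```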

Dataset: MVTec AD

MVTec AD is a very popular dataset for anomaly detection and localization. It contains 5 texture and 10 object categories, with 5354 images in total. The dataset consists of a training set of normal samples and a test set containing both anomalous and normal samples, and it provides pixel-level annotations for the anomalous test images. The authors follow the one-class (so-called cold-start) setting: a separate classifier is trained for each category using only that category's normal training samples. No data augmentation is used; each image is resized to 256×256 and center-cropped to 224×224.
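
The resize-and-center-crop preprocessing described above corresponds to a standard torchvision pipeline, sketched below; the ImageNet normalization statistics are an assumption, reasonable because the backbone is pre-trained on ImageNet.

```python
# Standard preprocessing matching the description above: resize to 256x256,
# center-crop to 224x224, and normalize with ImageNet statistics (an assumption,
# since the backbone is ImageNet-pre-trained).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```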

Performance comparison:

The figure below compares the performance of the leading methods on this dataset:

Figure 4 Performance comparison between SimpleNet and current leading methods

Speed comparison:

The authors report roughly 8x faster throughput than PatchCore, measured on an NVIDIA GeForce RTX 3080 Ti GPU and an Intel Xeon E5-2680 CPU.
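
For reference, here is a hedged sketch of how an end-to-end FPS number like this is typically measured on a GPU; `model` is a placeholder for any detector, and the input size is illustrative.

```python
# Sketch of measuring frames-per-second on a GPU. `model` is a placeholder for
# any detector; explicit synchronization is needed because CUDA kernels run
# asynchronously with respect to the Python code.
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 224, 224), warmup=10, iters=100):
    x = torch.randn(*input_size).cuda()
    model = model.cuda().eval()
    for _ in range(warmup):                 # warm-up runs, not timed
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.time() - start)    # images processed per second
```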

Figure 5 More segmentation effects

2. Code practice

Following the instructions on GitHub, it takes almost no effort to get it running.

1. Operation record:

(pytorch-cifar) wqt@ser2024:SimpleNet$ bash run.sh
INFO:__main__:Command line arguments: main.py --gpu 0 --seed 0 --log_group simplenet_mvtec --log_project MVTecAD_Results --results_path results --run_name run net -b wideresnet50 -le layer2 -le layer3 --pretrain_embed_dimension 1536 --target_embed_dimension 1536 --patchsize 3 --meta_epochs 40 --embedding_size 256 --gan_epochs 4 --noise_std 0.015 --dsc_hidden 1024 --dsc_layers 2 --dsc_margin .5 --pre_proj 1 dataset --batch_size 4 --resize 329 --imagesize 288 -d screw -d pill -d capsule -d carpet -d grid -d tile -d wood -d zipper -d cable -d toothbrush -d transistor -d metal_nut -d bottle -d hazelnut -d leather mvtec /home/wqt/Datasets/MVTecAD/
INFO:__main__:Dataset: train=320 test=160
INFO:__main__:Dataset: train=267 test=167
INFO:__main__:Dataset: train=219 test=132
INFO:__main__:Dataset: train=280 test=117
INFO:__main__:Dataset: train=264 test=78
INFO:__main__:Dataset: train=230 test=117
INFO:__main__:Dataset: train=247 test=79
INFO:__main__:Dataset: train=240 test=151
INFO:__main__:Dataset: train=224 test=150
INFO:__main__:Dataset: train=60 test=42
INFO:__main__:Dataset: train=213 test=100
INFO:__main__:Dataset: train=220 test=115
INFO:__main__:Dataset: train=209 test=83
INFO:__main__:Dataset: train=391 test=110
INFO:__main__:Dataset: train=245 test=124
INFO:__main__:Evaluating dataset [mvtec_screw] (1/15)...
INFO:__main__:Training models (1/1)
INFO:simplenet:Training discriminator...
epoch: 3 loss: 0.25116 lr: 0.0002 p_true: 0.645 p_fake: 0.668: 100%| 4/4 [00:49<00:00, 12.31s/it]
----- 0 I-AUROC:0.7534(MAX:0.7534) P-AUROC0.9586(MAX:0.9586) ----- PRO-AUROC0.845(MAX:0.845) -----
INFO:simplenet:Training discriminator...

It would be better if prediction code or inference-speed measurement were provided. Judging from the project, some of these functions do seem to exist, but due to time constraints I did not dig deeper. How practical and robust this project is in real deployments is worth exploring further.

Summary

This article mainly explains the principles and results of SimpleNet. Through this interpretation we learn about the current mainstream datasets and tasks, which gives us a direction for subsequent detection work.