Imbalanced dataset? Not enough defect data? Try Anomalib, the open source library for unsupervised defect detection

The importance of quality control and quality assurance is self-evident. Defect detection is therefore in high demand across many industries and plays an extremely important role. In manufacturing, for example, detecting anomalies on the production line lets companies ensure that only the highest-quality products leave the factory. In medicine, early detection of abnormalities in medical images helps doctors diagnose patients accurately.

Any mistake in these scenarios can have serious consequences. For this reason, many industries have begun to move away from manual inspection, which is prone to errors caused by subjective factors, and instead adopt rapidly advancing computer vision and deep learning technologies to automate anomaly detection.

To truly enhance quality control and quality assurance, AI needs rich, balanced datasets. While plenty of good samples are usually available, the scarcity of defective samples often makes it difficult for models in industrial and medical settings to produce accurate, reliable predictions.

Overcoming dataset challenges

Supervised methods can usually achieve satisfactory anomaly detection results when enough annotated anomaly samples are available. But what happens when the dataset is imbalanced and lacks representative samples of the outlier class? And how do you define the boundary of an anomaly when defects can take any shape?

One approach to these issues is unsupervised anomaly detection, which requires little to no annotation. It relies entirely on normal samples during training and identifies anomalous samples by comparing them against the learned distribution of normal data.
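The core idea can be sketched in a few lines: fit simple statistics on normal samples only, then score new samples by their distance from that learned distribution. This toy sketch uses a one-dimensional z-score and an illustrative threshold of 3.0; real models like those in Anomalib learn far richer representations, but the principle is the same:

```python
from statistics import mean, stdev

def fit_normal(samples):
    """Learn a trivial 'normal' model: the mean and standard deviation."""
    return mean(samples), stdev(samples)

def anomaly_score(x, mu, sigma):
    """Distance of a sample from the learned normal distribution (z-score)."""
    return abs(x - mu) / sigma

# Train on normal measurements only -- no anomaly labels are needed.
normal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mu, sigma = fit_normal(normal)

# Samples close to the normal distribution get a low score;
# a defective sample far from it gets a high score.
print(anomaly_score(10.05, mu, sigma) < 3.0)  # True: typical sample
print(anomaly_score(14.0, mu, sigma) > 3.0)   # True: flagged as anomalous
```

Everything beyond this (backbones, feature embeddings, anomaly maps) is what the library provides on top of this basic comparison.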

Anomalib is an open source, end-to-end anomaly detection library built on unsupervised algorithms. It provides state-of-the-art anomaly detection models that can be customized to specific use cases and requirements.

Application of Anomalib in Manufacturing – Training and Deploying Defect Detection Model with Custom Data

Let’s look at an example of a production line with colored cubes. Some of these cubes have holes or defects and need to be removed from the conveyor belt.

For anomaly detection in this scenario, no hardware accelerators are available for training models at the edge. Nor can we assume that thousands of images, especially defective ones, have been collected for edge training. Furthermore, as in a real manufacturing scenario, we cannot expect a large number of known defects up front.

Given these initial conditions, one of our goals is fast training at the edge with highly accurate and efficient anomaly detection. Keep in mind that if any external condition changes, such as the lighting, the camera, or the types of anomalies, the model has to be retrained, so retraining must not be laborious. Finally, to ensure the models are useful in real-world manufacturing, we must guarantee accurate inference results from the anomaly detection models.

With the help of the extensive Anomalib library, we can design, implement and deploy unsupervised anomaly detection models, covering the process from data collection to edge application, thus meeting all our requirements.

The source code for all of the following steps is in this getting started notebook. Let’s break it down step by step so you can train and deploy an unsupervised defect detection model on your own custom dataset.

Install:

Follow the steps below to install Anomalib from source:

1. Use Python 3.8 to create an environment for running Anomalib + Dobot DLL

  • For Windows, use the following code:
python -m venv anomalib_env
 
anomalib_env\Scripts\activate
  • For Ubuntu:
python3 -m venv anomalib_env
 
source anomalib_env/bin/activate

2. Install Anomalib from the GitHub repository with the OpenVINO requirements (in this blog post, we will not use the pip install command):

python -m pip install --upgrade pip wheel setuptools
 
git clone https://github.com/openvinotoolkit/anomalib.git
 
cd anomalib
 
pip install -e .[openvino]

3. Install Jupyter Lab or Jupyter Notebook:

pip install notebook
 
pip install ipywidgets

4. Then connect your USB camera and verify that it works using a simple camera app. Then close the app.

Optional: If you have access to Dobot, follow these steps:

  • Install Dobot requirements (refer to Dobot documentation for more information).
  • Check the Dobot’s connection status and use Dobot Studio to verify it is working properly.
  • Install the suction cup accessory on the Dobot and use Dobot Studio to verify that it is working properly.
  • In Dobot Studio, click the “Home” button and record the following coordinates:
    • Calibration coordinates: the initial position of the upper-left corner of the cube array.
    • Place coordinates: the position above the conveyor belt where the robot arm should place each cube.
    • Anomaly coordinates: the location where anomalous cubes are released.
    • Substitute these coordinates into the notebook. For more instructions on this step, refer to the readme file.

5. To run the notebook with the robot, download the Dobot API and driver files from here and add them to notebooks/500_use_cases/dobot in the cloned Anomalib repository.

Note: If you don’t have a robot, you can skip to another notebook, such as the 501b notebook, download the dataset through this link, and try training and inference there.

Data Collection and Inference for Notebook:

Next, we need to create folders for the normal dataset. In this example, we create a dataset of colored cubes and add black circular stickers to simulate holes or defects. For data acquisition and inference, we will use the 501a notebook.

When acquiring data, be sure to run the notebook with the acquisition variable set to True, and define a “normal” folder for images without anomalies and an “abnormal” folder for anomalous images. The dataset will be created directly in the cloned Anomalib folder, so we will see it under anomalib/datasets/cubes.

If you don’t have a robot, you can modify the code to save images or use a downloaded dataset for training.

Inference:

For inference, the acquisition variable should be False; no images will be saved. We will read the captured video frames, run inference with OpenVINO, and decide where to place each cube: on the conveyor belt for normal cubes, off the conveyor belt for abnormal ones.

Keep the acquisition flag in mind: True for collection mode, False for inference mode. In collection mode, specify whether to save to the normal or the abnormal folder; the notebook saves each image under anomalib/datasets/cubes/{FOLDER} for later training. In inference mode, the notebook does not save images; it runs inference and displays the results.
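The control flow around that flag can be sketched as follows. The folder names match the dataset layout described above, but `handle_frame` and its return values are illustrative placeholders, not the notebook’s actual function names:

```python
from pathlib import Path

# Dataset root as created by the acquisition notebook.
DATASET_ROOT = Path("anomalib/datasets/cubes")

def handle_frame(frame_id, acquisition, abnormal=False):
    """Dispatch one captured frame: save it in acquisition mode,
    otherwise hand it to inference."""
    if acquisition:
        folder = "abnormal" if abnormal else "normal"
        target = DATASET_ROOT / folder / f"{frame_id:04d}.png"
        return ("saved", str(target))   # placeholder for an image write
    return ("inference", frame_id)      # placeholder for a model prediction

print(handle_frame(7, acquisition=True))                 # save to normal/
print(handle_frame(7, acquisition=True, abnormal=True))  # save to abnormal/
print(handle_frame(7, acquisition=False))                # run inference
```

The real notebook additionally grabs frames from the USB camera and, in inference mode, drives the robot arm based on the prediction.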

Training:

For training, we will use the 501b notebook. In this notebook, we train a “Padim” model with PyTorch Lightning. This model has several advantages: no GPU is required, training runs entirely on the CPU, and it is also very fast.

Now, let’s take a deeper look at the training notebook!

  • Import

In this section, we explain the packages used in this example and import what we need from the Anomalib library.

  • Configuration:

There are two ways to configure Anomalib modules: a configuration file or the API. The API is the easiest way to explore what the library does. If you want to implement Anomalib in your production system, use the configuration file (a YAML file), which defines the core training and testing process, including dataset, model, experiment, and callback management.
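As a rough illustration, a config file of this kind groups dataset, model, and project settings in one place. The keys below are a sketch based on Anomalib’s folder-dataset configuration and vary between library versions, so treat this as an example rather than a drop-in file:

```yaml
dataset:
  format: folder
  path: ./datasets/cubes
  normal_dir: normal        # training uses only normal images
  abnormal_dir: abnormal
  image_size: 256
  train_batch_size: 32
  task: classification

model:
  name: padim
  backbone: resnet18
```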

In the next sections, we describe how to configure your training using the API.

  • Dataset Manager:

Through the API, we can modify the dataset module. We will set the dataset path, format, image size, batch size, and task type, then load the data into the pipeline with the following code.

i, data = next(enumerate(datamodule.val_dataloader()))
  • Model Manager:

For the anomaly detection model we use Padim; you can also use other Anomalib models such as CFlow, CS-Flow, DFKDE, DFM, DRAEM, FastFlow, GANomaly, PatchCore, Reverse Distillation, and STFPM. We set up the model manager through the API by importing Padim from anomalib.models.

  • Callbacks Manager:

To properly train the model, we need some additional “non-core” logic, such as saving weights, early stopping, normalizing anomaly scores, and visualizing input/output images. To achieve this, we use callbacks: Anomalib provides its own callbacks and also supports PyTorch Lightning’s native callbacks. With this code, we create a list of callbacks to be executed during training.

  • Training:

After setting up the data module, model and callbacks, we can train the model. The final component required to train a model is the pytorch_lightning Trainer object, which handles the training, testing, and prediction pipelines. Click here to see an example of the Trainer object in the notebook.

  • Validation:

We use OpenVINO inference for validation. In the import section above, we imported OpenVINOInferencer from the anomalib.deploy module. Now we use it to run inference and check the results. First, we need to verify that the OpenVINO model is in the results folder.

  • Prediction results:

To perform inference, we call the predict method of the OpenVINOInferencer (where we set the OpenVINO model and its metadata) and specify the device to use:

predictions = inferencer.predict(image=image)

The prediction contains a variety of information about the result: the original image, prediction score, anomaly map, heatmap image, prediction mask, and segmentation result. Depending on the task type you choose, you may need some or all of this information.
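The downstream decision logic can be as simple as comparing the prediction score against a threshold. The `Prediction` dataclass below is only a stand-in for the real result object, and the 0.5 threshold is an assumption for illustration, not a value Anomalib prescribes:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """Stand-in for the result returned by the inferencer."""
    pred_score: float  # image-level anomaly score in [0, 1]
    pred_label: bool   # True if the model flags the image as anomalous

def place_cube(prediction, threshold=0.5):
    """Decide where the robot arm should drop the cube."""
    if prediction.pred_score > threshold:
        return "anomaly bin"    # defective cube leaves the line
    return "conveyor belt"      # normal cube continues

print(place_cube(Prediction(pred_score=0.12, pred_label=False)))  # conveyor belt
print(place_cube(Prediction(pred_score=0.91, pred_label=True)))   # anomaly bin
```

In the robot notebook, the two outcomes map to the place coordinates and the anomaly coordinates recorded during calibration.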

That, in outline, is our defect detection use case with the Dobot robot.

Tips and advice for using your own dataset

Data transformations:

If you want to improve your model’s accuracy, you can apply data transformations in the training pipeline. Provide the path to the augmentation configuration file in the dataset.transform_config section of config.yaml. This means you need a config.yaml file for your Anomalib settings, plus a separate albumentations_config.yaml file that the Anomalib config file references.

In this discussion thread, you can learn how to add data transformations to your actual training pipeline.
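For reference, such a file follows the serialized format written by albumentations’ A.save. The transforms below are only an example of what the file can contain, not a recommended pipeline; the safest way to produce one is to build a Compose in Python and call A.save rather than writing the YAML by hand:

```yaml
__version__: 1.3.0
transform:
  __class_fullname__: Compose
  transforms:
    - __class_fullname__: HorizontalFlip
      p: 0.5
    - __class_fullname__: RandomBrightnessContrast
      p: 0.2
      brightness_limit: 0.1
      contrast_limit: 0.1
```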

Robust models:

No anomaly detection model is a silverver bullet; all of them can fail on difficult datasets. The good news: you can try 13 different models and benchmark the results of each experiment using the benchmarking entry-point script and a benchmark configuration file. This will help you choose the best model for your actual use case.

For more guidance, check out the How To Guide.

Next

If you are using Dobot and would like to see more articles exploring its use cases through this notebook, please leave your comments or questions below. If you run into any problems or bugs while installing Anomalib, please file an issue in our GitHub repository.

We look forward to seeing how Anomalib is used in more scenarios, and welcome everyone to discuss and share.