How strong is GPT-4V in anomaly detection? The latest evaluation of Hua University of Science and Technology is here!

Click the Card below and follow the “CVer” public account

AI/CV heavy-duty information, delivered as soon as possible

Click to enter->[Anomaly Detection and Defect Detection] Communication Group

Reply in the background of the CVer WeChat public account: GPT anomaly detection, you can download the evaluation report pdf, start learning quickly!

Scan the QR code to join CVer Knowledge Planet, you can quickly learn the paper ideas from the latest top conferences and journalsand CV from entry to Proficient in information, as well as cutting-edge projects and applications! Publish a paper, highly recommended!

1e1273fde34ff79a7a5ee41823f46751.jpeg

Reprinted from: Heart of the Machine

The anomaly detection task aims to identify outliers that deviate significantly from the normal data distribution and plays an important role in many fields such as industrial inspection, medical diagnosis, video surveillance, and fraud detection. Traditional anomaly detection methods mainly rely on describing normal data distribution to distinguish positive anomaly samples. However, for practical applications, anomaly detection also requires understanding the high-level semantics of the data to gain a deep understanding of “what an anomaly is.”

To achieve more accurate and intelligent anomaly detection, we need to focus on the following key steps:

1. Understand various data types and categories

Datasets in different fields contain various data types and categories, such as images, videos, point clouds, time series, etc. Each data type may require a different anomaly detection method, and each object category may correspond to different normal standards, so a deep understanding of the diversity of the data is crucial.

2. Determine normal status standards

Once we understand the types and categories of data, we need to infer criteria for normality. This requires the understanding of high-level data semantic information to ensure that we can correctly identify the characteristics and patterns of normal data.

3. Evaluate the consistency of the data

Finally, we need to evaluate whether the data provided conforms to the established normal data distribution. Any deviation from these data distributions can be classified as an anomaly.

Recently, large-scale multi-modal models (LMM) have developed rapidly. Among them, GPT-4V (ision) recently launched by OpenAI has the best performance. It has powerful multi-modal perception capabilities and has achieved results in multiple tasks such as scene understanding and image generation. performed well. We believe that the emergence of LMM provides a new paradigm and new opportunities for the research of general anomaly detection.

In order to evaluate the performance of GPT-4V in general anomaly detection, researchers from Huazhong University of Science and Technology, University of Michigan and University of Toronto jointly conducted a study on 15 anomalies involving 4 data modalities and 9 anomaly detection tasks. GPT-4V was comprehensively tested on the detection data set. Specifically, the tested data sets include images, point clouds, videos, time series and other modalities, and cover industrial image anomaly detection/positioning, medical image anomaly detection/positioning, point cloud anomaly detection, logical anomaly detection, and pedestrian anomaly detection. , traffic anomaly detection, timing anomaly detection and other 9 anomaly detection tasks.

920dafc34883b32457c6dd448dc528d7.png

  • Paper address: https://arxiv.org/pdf/2311.02782.pdf

  • Project address: https://github.com/caoyunkang/GPT4V-for-Generic-Anomaly-Detection

b7a39194bc41194176ac44887e695768.png

Observation and Analysis

This paper tests the performance of GPT4V on anomaly detection data sets in multiple modalities and domains. We believe that GPT4V has initially possessed multi-modal universal anomaly detection capabilities. Specifically, GPT-4V can not only effectively understand diverse data types and categories, but also model the spatial distribution of normal data and evaluate the distribution of test data.

In addition, GPT-4V also has the following characteristics in anomaly detection tasks:

GPT-4V can handle multi-modal and multi-domain anomaly detection tasks with zero/single samples

Multi-modal anomaly detection: GPT-4V can effectively handle anomaly detection tasks in multiple modal data. For example, it has demonstrated excellent anomaly detection capabilities in identifying data modalities such as images, point clouds, MRI, and X-ray. Multi-modal anomaly detection capabilities enable GPT-4V to break through the limitations of traditional single-modal anomaly detectors and complete complex anomaly detection tasks in the real world.

Multi-domain anomaly detection: GPT-4V has excellent performance in multiple fields such as industrial, medical, pedestrian, traffic and time series anomaly detection.

Anomaly detection under zero/single sample: GPT-4V performs well in both zero-sample and single-sample (that is, a normal reference image is provided) tasks. In the absence of reference images, GPT-4V can effectively use language hints to detect anomalies. When a normal reference image is provided, GPT-4V is able to better align the normal standard of text format with normal image content, and its anomaly detection accuracy is further improved.

GPT-4V can understand the global and fine-grained semantics required for anomaly detection tasks

Global semantic understanding ability: GPT-4V’s ability to understand global semantics is reflected in its ability to identify overall abnormal patterns or behaviors. For example, in traffic anomaly detection, it can distinguish the difference between normal traffic flow and irregular events, and provide detailed explanations about anomaly detection. This global understanding makes it ideal for identifying outliers that deviate from normal distributions in the open world.

Fine-grained semantic understanding ability: GPT-4V’s ability to understand fine-grained semantics performs well in some cases, allowing it to not only detect anomalies, but also accurately locate anomalies in complex data. For example, in industrial image anomaly detection, it can accurately locate details such as tilted candle wicks and slight scratches around the mouth of a bottle. This fine-grained understanding enhances its ability to detect small anomalies in complex data, thereby improving its overall detection.

GPT-4V has the ability to automatically infer anomaly detection

GPT-4V can automatically reason and split subtasks based on complex normal criteria. For example, in logical anomaly detection, GPT-4V can understand the given normal image standard and split it into subtasks to sequentially check whether the image content meets the specified content. This inherent reasoning ability enhances the interpretability of its anomaly detection results, making it an effective tool for understanding and solving general anomaly detection.

GPT-4V can further enhance anomaly detection capabilities by adding hints

The evaluation results show that providing more text and image information has a positive impact on the anomaly detection performance of GPT-4V. By adding category information, human expertise, and reference images, the model obtains more contextual information, and the anomaly detection performance is significantly improved. This feature allows users to fine-tune and enhance model performance by providing relevant supplementary information.

GPT-4V may be limited in practical applications, but still has potential

This report finds that GPT-4V still faces some challenges in practical applications. For example, GPT-4V may face difficulties in handling complex scenarios in industrial applications, causing it to experience false detections. Ethical constraints in the medical field also make it conservative when judging abnormal conditions such as tumors. But we believe it still has potential in various anomaly detection tasks. To effectively address these challenges, further enhancements, specialized fine-tuning, or complementary technologies may be required. In summary, GPT-4V has obvious potential in general anomaly detection and is expected to usher in an era of high-level perception for anomaly detection tasks.

Application scenario display

Industrial image anomaly detection

Industrial image anomaly detection aims to maintain product quality and is an important part of the manufacturing process. In recent years, many approaches have flourished in this area,some of which focus on developing unified models applicable to any product category. This study explores the use of GPT-4V for anomaly detection in industrial images, including testing different types of information and demonstrating its performance and limitations.

We selected several examples from industrial images, such as images of bottles and candles. GPT-4V was able to effectively identify anomalies in these images, even when provided with only simple verbal cues, demonstrating its power and versatility. Furthermore, GPT-4V is capable of not only detecting desired anomalies but also identifying microstructural abnormalities. In complex situations, such as anomaly detection in circuit boards, GPT-4V is able to identify details in images, but it also has certain limitations. Overall, GPT-4V performs well in image context understanding and category-specific anomaly understanding.

be8a10750fd6d2ea0e11b6a5fbc73041.png

1db73d51c9fe8d4729943151e94d75b9.png

Industrial image anomaly location

Unlike industrial image anomaly detection, industrial image anomaly localization aims to accurately identify the location of anomalies. To achieve this goal, we adopt a similar approach to SoM (Set-of-mark), using image-mask pairs to cue GPT-4V. We study the performance of GPT-4V in different scenarios, demonstrating its capabilities and limitations in fine-grained anomaly localization.

We demonstrate the performance of GPT-4V in localizing anomalies in industrial images, including locating bent wires, holes in nuts, and identifying circuit board anomalies. GPT-4V can accurately identify abnormal locations in some cases, such as effectively locating holes in nuts, and due to the combination of visual prompt technology, GPT-4V transforms the abnormal location problem into a mask classification problem, effectively reducing the problem complexity and improved positioning accuracy. Therefore, combining visual prompt technology and GPT-4V can effectively solve the problem of industrial image anomaly localization.

35795d138203d6a858068739844757ed.png

Point cloud anomaly detection

Point cloud anomaly detection plays an important role in the industrial field. CPMF proposes a new method to convert point clouds into depth images to leverage image base models to improve the performance of point cloud anomaly detection. We use CPMF to convert point clouds into depth images, allowing GPT-4V to handle point cloud anomaly detection tasks.

We demonstrate the performance of GPT-4V in point cloud anomaly detection, including identifying small protrusions in pocket circles, detecting anomalies on ropes, and finding anomalies in artifacts. GPT-4V is effective at identifying these anomalies, but has limitations in some cases, especially when rendering is of lower quality. Overall, GPT-4V shows potential in point cloud anomaly detection.

f5dede4c2eb158a9ff849121a5fa9869.png

73d4bbe0b313f312da13c24d3218f258.png

Logical anomaly detection

The logical anomaly detection task is proposed on the MVTec LOCO dataset. This task usually arises during assembly and requires identifying whether the individual components fit together correctly. Existing logical anomaly detection methods usually rely on visual global-local correspondence, but do not inherently understand the image content. We study the application of GPT-4V to logical anomaly detection and explore its ability to understand image content.

We demonstrate the performance of GPT-4V in logical anomaly detection, including identifying complex logical rules, detecting logical anomalies, and providing detailed explanations. Although GPT-4V can accurately identify logical anomalies in most cases, it has certain limitations in some complex cases, especially for detailed problems. However, combining multiple rounds of dialogue with specific language prompts is expected to significantly improve the performance of GPT-4V in these situations.

c42fcd4b2a4a7e4a0ca0680c70675e35.png

Medical image anomaly detection

Medical image anomaly detection is a key task in the field of medical imaging and aims to identify outliers that do not conform to the expected data distribution. We study the application of GPT-4V to anomaly detection in medical images, including medical images of different diseases and imaging modalities. We tested the generalization ability of GPT-4V and revealed its performance and limitations in anomaly detection in medical images.

We demonstrate the performance of GPT-4V in medical image anomaly detection, including identifying abnormal images across different diseases and imaging modalities. Even with only simple language prompts, GPT-4V is able to effectively identify anomalies and provide detailed explanations. Furthermore, introducing more information, such as disease information and expertise, can further improve the performance of GPT-4V. However, GPT-4V may produce false abnormality detection in some cases, so the final judgment of the physician is still required.

71b8101b5aaf62e1e7678e2c478cba13.png

43ce3056fcb923f0d014f7e1b422d92c.png

Medical image abnormality localization

After detecting medical abnormalities, it is necessary to further accurately locate the abnormalities present in the medical images, such as lesions. Accurate localization of abnormalities in medical images can effectively help clinicians understand the extent and nature of pathology. However, it is difficult to directly predict anomaly masks using GPT-4V in real-world medical image anomaly localization tasks. Inspired by the SoM, we wanted to test the GPT-4V model’s ability to localize anomalies under visual cues.

Combined with SoM, we demarcate possible abnormal locations in medical images. Guided by visual cues in images, GPT-4V tends to learn and describe the region around a marker. For cases that are easy to identify and localize, GPT-4V can clearly distinguish abnormal areas from the background. However, in one case of artificially synthesized anomalies, GPT4V’s judgment was biased because the area of interest had similar texture and shape to the background. This shows that the model still needs to enhance its detection and localization capabilities under adversarial attacks and complex backgrounds.

6d743d0cf76304b7f1bc715335903b28.png

Traffic detection

Traffic detection is a key task in the field of urban traffic management and autonomous driving, which aims to monitor traffic conditions and detect traffic violations and dangerous situations. We study the application of GPT-4V in traffic detection, including vehicle recognition, traffic sign recognition, and traffic violation detection. We tested the performance of GPT-4V in different scenarios, demonstrating its potential and limitations.

We demonstrate the performance of GPT-4V in traffic detection, including identifying different types of vehicles, detecting various traffic signs, and identifying traffic violations. GPT-4V is able to handle these tasks efficiently, especially in canonical scenarios. However, in complex traffic environments, performance may degrade as it requires understanding and interpreting complex situations.

3fc4b6d5983f393d4f494a7b4375a48b.png

Pedestrian Detection

Pedestrian detection is a key task in areas such as autonomous driving, security surveillance, and smart cities, and it aims to identify pedestrians in images or videos. We studied the application of GPT-4V in pedestrian detection and tested its pedestrian recognition capabilities and performance.

We demonstrate the performance of GPT-4V in pedestrian detection, including the ability to detect pedestrians in different backgrounds. GPT-4V is generally able to recognize pedestrians, but can make errors in complex backgrounds. The performance may be relatively poor compared to specialized pedestrian detection models, but its advantage lies in its ability to provide more linguistic interpretations.

9c9cdd51466b58a350ff6ebb1c53f0b4.png

Timing detection

Time series detection is an anomaly detection task involving time series data, such as sensor data, financial time series, etc. We studied the application of GPT-4V to time series detection, testing its ability to analyze and detect anomalies in time series.

We demonstrate the performance of GPT-4V in timing detection, including detecting anomalies in sensor data, anomalies in financial transaction data, etc. GPT-4V excels at analyzing time series data and is able to identify different types of anomalies. However, it is important to note that timing detection often requires more domain expertise, and GPT-4V may need to incorporate expert advice in these cases.

dab09374a6ba7c1c7a35db38067a0283.png

Conclusion

GPT-4V has demonstrated outstanding potential in the fields of industrial image anomaly detection, industrial image anomaly localization, point cloud anomaly detection, logical anomaly detection, medical image anomaly detection, traffic detection, pedestrian detection and timing detection. It understands multi-modal data, provides efficient understanding of image content, and accurately detects and explains anomalies in many situations. However, in complex scenes, GPT-4V still has certain limitations in its anomaly detection capabilities. Taken together, GPT-4V provides a new research paradigm for general anomaly detection, but its practical application still requires further research and improvement.

CVPR/ICCV 2023 paper and code download


Backend reply: CVPR2023, you can download the CVPR 2023 papers and code open source paper collection

Backend reply: ICCV2023, you can download the collection of ICCV 2023 papers and code open source papers
Anomaly detection and defect detection exchange group established
Scan the QR code below, or add WeChat: CVer444, to add CVer Assistant WeChat, and then apply to join the CVer-Anomaly Detection or Defect Detection WeChat communication group. In addition, other vertical directions have been covered: target detection, image segmentation, target tracking, face detection & recognition, OCR, pose estimation, super-resolution, SLAM, medical imaging, Re-ID, GAN, NAS, depth estimation, automatic Driving, reinforcement learning, lane detection, model pruning & compression, denoising, fog removal, rain removal, style transfer, remote sensing images, behavior recognition, video understanding, image fusion, image retrieval, paper submission & communication , PyTorch, TensorFlow and Transformer, NeRF, etc.
Be sure to note: Research direction + location + school/company + nickname (such as anomaly detection or defect detection + Shanghai + hand-in + Kaka). Note according to the format to be passed faster and invited to the group


▲Scan the QR code or add WeChat ID: CVer444 to join the communication group
CVer Computer Vision (Knowledge Planet) is here! If you want to know about the latest, fastest and best CV/DL/AI paper express delivery, high-quality practical projects, AI industry cutting-edge, and learning tutorials from entry to mastery, please scan the QR code below and join CVer Computer Vision (Knowledge Planet). Nearly ten thousand people have been gathered!

▲Scan the QR code to join Planet Learning

▲Click on the card above to follow the CVer official account

It’s not easy to organize, please like and watchca7b09649e1247a2d6432849d258945b.gif