Developer Practice | Deploying the RT-DETR Model with the OpenVINO™ Python API


Author: Yan Guojin, Intel Edge Computing Innovation Ambassador

RT-DETR is a real-time end-to-end detector that improves on the DETR architecture and achieves more efficient training and inference through a series of new techniques and algorithms. In this series we deploy the RT-DETR model with OpenVINO™ in Python, C++, and C# to accelerate deep learning inference. This first article covers deployment with the OpenVINO™ Python API.

All the code used in this project is open source on GitHub and collected in the OpenVINO-CSharp-API project. The project directory is:

https://github.com/guojin-yan/OpenVINO-CSharp-API/tree/csharp3.0/tutorial_examples


You can also go directly to the standalone project repository:

https://github.com/guojin-yan/RT-DETR-OpenVINO.git


1. RT-DETR

PaddlePaddle released the high-accuracy general-purpose object detection model PP-YOLOE in March last year, and followed it with PP-YOLOE+ later the same year. After PP-YOLOE, models such as MT-YOLOv6, YOLOv7, DAMO-YOLO, and RTMDet were proposed one after another, and the YOLO family had iterated to YOLOv8 by the beginning of this year.

A major weakness of YOLO detectors is that they require NMS post-processing, which is usually hard to optimize and not robust enough, so it adds latency to the detector. DETR is a Transformer-based end-to-end object detector that needs no NMS post-processing. Baidu PaddlePaddle officially released RT-DETR (Real-Time DEtection TRansformer), a real-time end-to-end detector based on the DETR architecture that achieves SOTA performance in both speed and accuracy.
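To make the NMS step concrete, here is a minimal, dependency-free sketch of greedy non-maximum suppression, the post-processing that YOLO-style detectors need and that DETR-style detectors avoid. The function names and the 0.5 IoU threshold are illustrative choices, not taken from any particular detector.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Only boxes that overlap the kept box weakly survive to the next round.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed. The repeated sorting and pairwise comparison is the part that is hard to optimize and sensitive to the threshold.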

RT-DETR improves on the DETR model and achieves more efficient training and inference by using a series of new techniques and algorithms. Specifically, RT-DETR has the following advantages:

1. Better real-time performance: RT-DETR adopts a new attention mechanism that can better capture the relationship between objects and reduce the amount of calculation. In addition, RT-DETR also introduces a time-based attention mechanism to better process video data.

2. Higher accuracy: RT-DETR can maintain high detection accuracy while ensuring real-time performance. This is mainly due to a new multi-task learning mechanism introduced by RT-DETR, which can better utilize training data.

3. Easier to train and tune: RT-DETR adopts a new loss function to enable better training and parameter adjustment. In addition, RT-DETR introduces a new data augmentation technique that enables better utilization of training data.

2. OpenVINO™

The Intel Distribution of OpenVINO™ toolkit is built on oneAPI and accelerates the development of high-performance computer vision and deep learning applications. It runs on Intel platforms from edge to cloud and helps users deploy more accurate real-world results into production systems faster. Through a streamlined development workflow, OpenVINO™ enables developers to deploy high-performance applications and algorithms in the real world.

OpenVINO™ 2023.1 was released on September 18, 2023, and brings new capabilities to unlock the full potential of generative AI. Generative AI coverage has been expanded, with an enhanced experience for frameworks such as PyTorch*, whose models can be imported and converted automatically. Large language models (LLMs) have gained runtime performance and memory optimizations, and models for chatbots, code generation, and more are enabled. OpenVINO™ is more portable and higher-performing, and runs wherever you need it: at the edge, in the cloud, or on-premises.

3. Environment configuration

This project uses two environments: one for model download, and one for model conversion and deployment. To make the project easier to reproduce, the main package versions are listed below:

3.1 Model download environment

paddlepaddle: 2.5.1
imageio: 2.31.5
imgaug: 0.4.0
onnx: 1.13.0
opencv-python: 4.5.5.64
paddle2onnx: 0.5
paddledet


3.2 Model deployment environment

numpy: 1.26.0
opencv-python: 4.8.1.78
openvino: 2023.1.0
openvino-telemetry: 2023.2.0
pillow: 10.0.1
python: 3.10.13
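To compare a local environment with the versions listed above, a small sketch based on the standard library module importlib.metadata may help. The distribution names are assumed to match the PyPI names in the list; openvino-telemetry and the Python interpreter itself are left out for brevity, and the helper names are illustrative.

```python
from importlib import metadata

# Versions from the deployment environment list; names assumed to match PyPI.
EXPECTED = {
    "numpy": "1.26.0",
    "opencv-python": "4.8.1.78",
    "openvino": "2023.1.0",
    "pillow": "10.0.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package to (expected, installed) so mismatches are easy to spot."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


for name, (want, have) in check_environment().items():
    status = "ok" if have == want else "differs"
    print(f"{name}: expected {want}, installed {have} ({status})")
```

A differing version does not necessarily break the example; the list only records the setup the article was written with.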


4. Model download and conversion

PaddleDetection provides pre-trained models and training tutorials, so you can train your own models by following them. In this project we test deployment with a pre-trained model, which we export as an inference model according to the official tutorial.


https://github.com/PaddlePaddle/PaddleDetection

4.1 PaddlePaddle model download

First, refer to the PaddleDetection installation documentation:

https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md


Install PaddlePaddle and PaddleDetection as described there. The latest version of PaddlePaddle is required to export the RT-DETR model. After the installation is complete, download and export the model from the command line:

cd PaddleDetection
python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True --output_dir=output_inference


The following table shows the input and output node information of the exported model:

From this table, we can see that the model has three inputs and two outputs. The “im_shape” and “scale_factor” inputs exist mainly because the model integrates part of the post-processing. If you are not used to working with this kind of multi-input model, the next article will explain how to export and deploy a model without post-processing.

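Therefore, the key nodes in this model are the “image” input, which receives the picture data, and the “reshape2_95.tmp_0” output node. Each row of the model output has the format [clsid, score, x1, y1, x2, y2], where (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the box, consistent with the results printed in Section 6.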

4.2 IR model conversion

Next, we convert the model to IR format. First, convert the model to ONNX format:

paddle2onnx --model_dir=./output_inference/rtdetr_r50vd_6x_coco/ --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 16 --save_file rtdetr_r50vd_6x_coco.onnx


Since the exported model has dynamic shapes and the batch_size is not fixed, we can set the input shapes of the model with the OpenVINO™ model conversion tool (ovc). The command is as follows:

ovc rtdetr_r50vd_6x_coco.onnx --input "image[1,3,640,640],im_shape[1,2],scale_factor[1,2]"
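The --input value is easy to mistype, so it can be convenient to build it from a dictionary of shapes. The following sketch only assembles the argument list; build_ovc_command is a hypothetical helper, and it assumes that ovc accepts a comma-separated list of name[shape] entries, as in the command above.

```python
import shlex


def build_ovc_command(onnx_path, input_shapes):
    """Build the ovc argument list; input_shapes maps input name -> static shape."""
    specs = [
        f"{name}[{','.join(str(dim) for dim in shape)}]"
        for name, shape in input_shapes.items()
    ]
    return ["ovc", onnx_path, "--input", ",".join(specs)]


command = build_ovc_command(
    "rtdetr_r50vd_6x_coco.onnx",
    {"image": [1, 3, 640, 640], "im_shape": [1, 2], "scale_factor": [1, 2]},
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
# To actually run the conversion: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the square brackets.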


Finally, we obtain the IR model file “rtdetr_r50vd_6x_coco.xml” and the weights file “rtdetr_r50vd_6x_coco.bin”.

5. Python code implementation

5.1 Implementation of model inference process

In the Python code we define an RT-DETR model inference method:

def rtdert_infer(model_path, image_path, device_name, lable_path, postprocess=True):


This method mainly implements the entire process of RT-DETR model inference, including model reading and loading, file reading and preprocessing, model inference, result processing, and result display. The method input is:

model_path: path of the inference model

image_path: path of the image to predict

device_name: name of the device used for inference, for example CPU

lable_path: path of the label file that lists the class names

postprocess: whether the model includes post-processing. In this article we only cover models that include post-processing, so the default is True.

1) Load inference model

This step initializes the Core, reads the local model, and compiles it for the target device. The implementation code is as follows:

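from openvino.runtime import Core

# Initialize the runtime, read the model from disk, and compile it for the device
ie_core = Core()
model = ie_core.read_model(model=model_path)
compiled_model = ie_core.compile_model(model=model, device_name=device_name)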


2) Preprocess image data

This step processes the local image after it is read. We define an RtdetrProcess class specifically to handle the input and output data of the RT-DETR model. The code is as follows:

import cv2 as cv

image = cv.imread(image_path)
rtdetr_process = RtdetrProcess([640, 640], lable_path)
im, im_info = rtdetr_process.preprocess(image)


3) Load inference data and model inference

This step loads the inference data and runs inference. Since the model includes its own post-processing, it has three inputs: “im_shape”, “scale_factor”, and “image”.

im_shape: the input shape of the model, here [640, 640];

scale_factor: the scaling ratio of the image, that is, the model input size divided by the original image size;

image: the normalized image data, with a shape of [1, 3, 640, 640].

Finally, the input dictionary is passed to the compiled model to run inference and obtain the results.

import numpy as np

# Keys must match the input node names of the exported model
inputs = dict()
inputs["image"] = np.array(im).astype('float32')
inputs["scale_factor"] = np.array(im_info['scale_factor']).reshape(1,2).astype('float32')
inputs["im_shape"] = np.array([640.0,640.0]).reshape(1,2).astype('float32')
results = compiled_model(inputs=inputs)
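As a worked example of the two auxiliary inputs, the sketch below computes im_shape and scale_factor for a 480x640 image (height x width) in plain Python. make_shape_inputs is an illustrative name; the [scale_y, scale_x] order follows the preprocess method shown in section 5.2.

```python
def make_shape_inputs(image_hw, target_hw=(640, 640)):
    """Return (im_shape, scale_factor) as nested lists shaped [1, 2]."""
    img_h, img_w = image_hw
    tgt_h, tgt_w = target_hw
    im_shape = [[float(tgt_h), float(tgt_w)]]
    # Order is [scale_y, scale_x]: model input size divided by image size.
    scale_factor = [[tgt_h / float(img_h), tgt_w / float(img_w)]]
    return im_shape, scale_factor


im_shape, scale_factor = make_shape_inputs((480, 640))
print(im_shape)      # [[640.0, 640.0]]
print(scale_factor)  # [[1.3333333333333333, 1.0]]
```

The image is stretched by 4/3 vertically and left unchanged horizontally, and the fused post-processing uses these factors to map boxes back to the original image.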


4) Processing inference results

The inference results obtained in the previous step are passed to the post-processing method we defined to obtain the model predictions, which are then drawn on the image and displayed.

re = rtdetr_process.postprocess(results[compiled_model.output(0)])
new_image=rtdetr_process.draw_box(image,re)
cv.imshow("result",new_image)
cv.waitKey(0)


5.2 Implementation of model data processing method

1) Define RtdetrProcess

class RtdetrProcess(object):
    def __init__(self, target_size, label_path=None, threshold=0.5, interp=cv.INTER_LINEAR):
        self.im_info = dict()
        self.target_size = target_size  # model input size as [height, width]
        self.interp = interp
        self.threshold = threshold      # minimum score for a detection to be kept
        if label_path is None:
            self.labels = []
            self.flabel = False
        else:
            self.labels = self.read_lable(label_path=label_path)
            self.flabel = True
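The constructor calls self.read_lable, which is not listed in this article. A minimal sketch of such a reader follows, assuming a plain-text label file with one class name per line. Both the file format and the standalone function name read_labels are assumptions, not taken from the project.

```python
import os
import tempfile


def read_labels(label_path):
    """Read one class name per line, skipping blank lines."""
    with open(label_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Small demonstration with a temporary three-class file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "labels.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("person\nbicycle\ncar\n")
    labels = read_labels(demo_path)

print(labels)  # ['person', 'bicycle', 'car']
```

With a list like this, the class id returned by the model can be used directly as an index, which is how postprocess looks up self.labels.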


2) Input data processing method

def preprocess(self,im):
    assert len(self.target_size) == 2
    assert self.target_size[0] > 0 and self.target_size[1] > 0
    origin_shape = im.shape[:2]
    resize_h, resize_w = self.target_size
    # Per-axis scale from the original image size to the model input size
    im_scale_y = resize_h / float(origin_shape[0])
    im_scale_x = resize_w / float(origin_shape[1])
    # Convert OpenCV's BGR channel order to RGB
    out_im = cv.cvtColor(im,cv.COLOR_BGR2RGB)
    # Resize without keeping the aspect ratio
    out_im = cv.resize(
        out_im.astype('float32'),
        None,
        None,
        fx=im_scale_x,
        fy=im_scale_y,
        interpolation=self.interp)
    self.im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
    self.im_info['scale_factor'] = np.array([im_scale_y, im_scale_x]).astype('float32')
    # Normalize pixel values to [0, 1]
    scale = 1.0 / 255.0
    out_im *= scale
    # HWC -> CHW, then add the batch dimension: [1, 3, H, W]
    out_im = out_im.transpose((2, 0, 1)).copy()
    return np.expand_dims(out_im.astype('float32'),0), self.im_info


3) Prediction result data processing method

def postprocess(self,scores,bboxs=None):
    results = []
    if bboxs is None:
        scores = np.array(scores).astype('float32')
        for l in scores:
            if(l[1]>=self.threshold):
                re = dict()
                re["clsid"]=int(l[0])
                if(self.flabel):
                    re["label"]=self.labels[int(l[0])]
                else:
                    re["label"]=int(l[0])
                re["score"]=l[1]
                bbox=[l[2],l[3],l[4],l[5]]
                re["bbox"]=bbox
                results.append(re)
    else:
        scores = np.array(scores).astype('float32')
        bboxs = np.array(bboxs).astype('float32')
        for s,b in zip(scores,bboxs):
            s = self.sigmoid(s)
            if(np.max(np.array(s))>=self.threshold):
                ids = np.argmax(np.array(s))
                re = dict()
                re["clsid"]=int(ids)
                if(self.flabel):
                    re["label"]=self.labels[int(ids)]
                else:
                    re["label"]=int(ids)
                re["score"]=s[ids]
                cx=(b[0]*640.0)/self.im_info["scale_factor"][1]
                cy=(b[1]*640.0)/self.im_info["scale_factor"][0]
                w=(b[2]*640.0)/self.im_info["scale_factor"][1]
                h=(b[3]*640.0)/self.im_info["scale_factor"][0]
 
                bbox=[cx-w/2.0,
                        cy-h/2.0,
                        cx + w/2.0,
                        cy + h/2.0]
                re["bbox"]=bbox
                results.append(re)
    return results
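The second branch of postprocess handles a model exported without post-processing: it applies a sigmoid to the class logits, picks the best class, and converts a normalized [cx, cy, w, h] box to corner coordinates in the original image. To make the arithmetic concrete, here is a dependency-free sketch for a single query. decode_query and its arguments are illustrative names, and sigmoid stands in for the self.sigmoid helper, which the article does not list.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_query(logits, box, scale_factor, input_size=640.0, threshold=0.5):
    """Decode one query: per-class logits and a normalized [cx, cy, w, h] box.

    scale_factor is [scale_y, scale_x]. Returns None below the threshold.
    """
    scores = [sigmoid(v) for v in logits]
    clsid = max(range(len(scores)), key=scores.__getitem__)
    if scores[clsid] < threshold:
        return None
    scale_y, scale_x = scale_factor
    # Scale to input pixels, then undo the resize to reach original-image pixels.
    cx = box[0] * input_size / scale_x
    cy = box[1] * input_size / scale_y
    w = box[2] * input_size / scale_x
    h = box[3] * input_size / scale_y
    bbox = [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]
    return {"clsid": clsid, "score": scores[clsid], "bbox": bbox}


result = decode_query([-2.0, 3.0], [0.5, 0.5, 0.25, 0.5], scale_factor=[2.0, 1.0])
print(result["clsid"], result["bbox"])  # 1 [240.0, 80.0, 400.0, 240.0]
```

Here the original image is 320x640 (height x width), so a box centered in the 640x640 input maps back to a center of (320, 160) in the original image.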


6. Display of prediction results

With the code above, we can directly run inference deployment of the RT-DETR model. The pre-trained RT-DETR model was trained on the COCO dataset. The predicted image results are shown in the figure:

The above figure shows the RT-DETR model prediction results. We also print the key model information and the inference results:

[INFO] This is an RT-DETR model deployment case using Python!
[INFO] Model path: E:\Model\rtdetr_r50vd_6x_coco.onnx
[INFO] Device name: CPU
[INFO] The input path: E:\GitSpace\RT-DETR-OpenVINO\image\000000570688.jpg
[INFO] class_id:0, label:person, confidence:0.9284, left_top:[215.03,327.88],right_bottom:[259.24,469.64]
[INFO] class_id:0, label:person, confidence:0.9232, left_top:[260.34,343.99],right_bottom:[309.42,461.80]
[INFO] class_id:0, label:person, confidence:0.8929, left_top:[402.26,346.80],right_bottom:[451.54,479.55]
[INFO] class_id:33, label:kite, confidence:0.8382, left_top:[323.52,159.82],right_bottom:[465.93,214.78]
[INFO] class_id:0, label:person, confidence:0.8342, left_top:[294.05,384.59],right_bottom:[354.15,443.96]
[INFO] class_id:0, label:person, confidence:0.8284, left_top:[518.88,360.37],right_bottom:[583.88,480.00]
[INFO] class_id:33, label:kite, confidence:0.8281, left_top:[282.11,217.29],right_bottom:[419.96,267.66]
[INFO] class_id:33, label:kite, confidence:0.8043, left_top:[330.01,64.70],right_bottom:[389.58,86.40]
[INFO] class_id:33, label:kite, confidence:0.8016, left_top:[242.46,124.74],right_bottom:[263.87,135.74]
[INFO] class_id:0, label:person, confidence:0.7972, left_top:[456.74,369.06],right_bottom:[508.27,479.42]
[INFO] class_id:33, label:kite, confidence:0.7970, left_top:[504.63,195.20],right_bottom:[523.44,214.82]
[INFO] class_id:33, label:kite, confidence:0.7681, left_top:[460.08,251.92],right_bottom:[479.02,269.19]
[INFO] class_id:33, label:kite, confidence:0.7601, left_top:[116.23,178.53],right_bottom:[137.02,190.61]
[INFO] class_id:0, label:person, confidence:0.7330, left_top:[154.12,380.38],right_bottom:[210.76,421.32]
[INFO] class_id:0, label:person, confidence:0.6998, left_top:[26.77,340.99],right_bottom:[58.48,425.10]
[INFO] class_id:33, label:kite, confidence:0.6895, left_top:[430.29,29.91],right_bottom:[450.06,44.32]
[INFO] class_id:33, label:kite, confidence:0.6739, left_top:[363.20,120.95],right_bottom:[375.84,130.11]
[INFO] class_id:33, label:kite, confidence:0.6130, left_top:[176.50,236.77],right_bottom:[256.62,258.32]
[INFO] class_id:0, label:person, confidence:0.6001, left_top:[497.35,380.34],right_bottom:[529.73,479.49]
[INFO] class_id:33, label:kite, confidence:0.5956, left_top:[97.84,316.90],right_bottom:[156.75,360.25]
[INFO] class_id:33, label:kite, confidence:0.5730, left_top:[221.56,264.66],right_bottom:[342.60,312.92]
[INFO] class_id:33, label:kite, confidence:0.5555, left_top:[161.12,193.06],right_bottom:[171.45,199.78]
[INFO] class_id:33, label:kite, confidence:0.5332, left_top:[171.17,317.08],right_bottom:[228.08,357.65]
[INFO] class_id:33, label:kite, confidence:0.5322, left_top:[218.97,178.13],right_bottom:[451.95,241.61]
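If the printed results need to be processed further, the log lines can be parsed back into dictionaries. The sketch below assumes the exact line format shown above; the regular expression and the parse_detection helper are illustrative and not part of the project.

```python
import re
from collections import Counter

# Pattern for one detection line in the log format printed above.
LINE_RE = re.compile(
    r"class_id:(?P<clsid>\d+), label:(?P<label>[^,]+), confidence:(?P<score>[\d.]+), "
    r"left_top:\[(?P<x1>[\d.]+),(?P<y1>[\d.]+)\],"
    r"right_bottom:\[(?P<x2>[\d.]+),(?P<y2>[\d.]+)\]"
)


def parse_detection(line):
    """Return a detection dict for a matching log line, otherwise None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "clsid": int(m["clsid"]),
        "label": m["label"],
        "score": float(m["score"]),
        "bbox": [float(m[k]) for k in ("x1", "y1", "x2", "y2")],
    }


log = """\
[INFO] Device name: CPU
[INFO] class_id:0, label:person, confidence:0.9284, left_top:[215.03,327.88],right_bottom:[259.24,469.64]
[INFO] class_id:33, label:kite, confidence:0.8382, left_top:[323.52,159.82],right_bottom:[465.93,214.78]
"""
detections = [d for d in map(parse_detection, log.splitlines()) if d]
print(Counter(d["label"] for d in detections))
```

Lines that are not detections, such as the device name, simply do not match and are skipped.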


7. Summary

In this project, we presented a case of deploying the RT-DETR model with built-in post-processing using the OpenVINO™ Python API, and wrapped the processing methods of the model into a complete code example that uses OpenVINO™ to accelerate deep learning inference on Intel platforms, which should help with future industrial applications of the RT-DETR model. To make the RT-DETR model easier to use, we not only developed example code in three languages (Python, C++, and C#), but also trimmed the model to suit common deployment habits, producing deployment cases for a single-input model with the post-processing removed.

Due to limited space, the implementations in the other languages and the deployment cases for models without post-processing will be covered in subsequent articles: “Deploying RT-DETR Model Based on OpenVINO™ C++ API” and “Deploying RT-DETR Model Based on OpenVINO™ C#”. If you are interested, you can follow the code repository of this project to obtain the source code.
