Developer Practice | Deploying the RT-DETR Model with the OpenVINO™ C++ API


The following article is from Intel Internet of Things, written by Yan Guojin, Intel Edge Computing Innovation Master

RT-DETR is an improvement on the DETR model: a real-time end-to-end detector based on the DETR architecture that achieves more efficient training and inference through a series of new techniques and algorithms. In the previous article, "Deploying RT-DETR Model Based on OpenVINO™ Python API | Developer Practice", we showed the deployment process of the RT-DETR model with post-processing included, based on the OpenVINO™ Python API. In actual industrial applications, however, the C++ platform is often required for better integration with existing software. In this article, we therefore show how to export an RT-DETR model without post-processing and how to deploy it based on the OpenVINO™ C++ API.

All the code used in this project has been open sourced on GitHub and is collected in the OpenVINO™-CSharp-API project. The link to the project directory is:

https://github.com/guojin-yan/OpenVINO-CSharp-API/tree/csharp3.0/tutorial_examples


You can also access the project directly, the project link is:

https://github.com/guojin-yan/RT-DETR-OpenVINO.git



1. RT-DETR

Baidu PaddlePaddle launched the high-precision general object detection model PP-YOLOE in March last year, and proposed PP-YOLOE+ later that same year. After PP-YOLOE was released, models such as MT-YOLOv6, YOLOv7, DAMO-YOLO, and RTMDet followed in succession, and the YOLO series continued to iterate up to YOLOv8 at the beginning of this year.


A major drawback of YOLO detectors is that they require NMS post-processing, which is usually difficult to optimize, not robust enough, and therefore introduces latency. DETR is a Transformer-based end-to-end object detector that does not require NMS post-processing. Building on this, Baidu PaddlePaddle officially launched RT-DETR (Real-Time DEtection TRansformer), a real-time end-to-end detector based on the DETR architecture that achieves SOTA performance in both speed and accuracy.


RT-DETR improves on the DETR model and achieves more efficient training and inference by using a series of new technologies and algorithms. Specifically, RT-DETR has the following advantages:

1. Better real-time performance

RT-DETR adopts a new attention mechanism that can better capture the relationship between objects and reduce the amount of computation. In addition, RT-DETR also introduces a time-based attention mechanism to better process video data.

2. Higher accuracy

RT-DETR can maintain high detection accuracy while ensuring real-time performance. This is mainly due to a new multi-task learning mechanism introduced by RT-DETR, which can better utilize training data.

3. Easier to train and adjust parameters

RT-DETR uses a new loss function to better train and adjust parameters. In addition, RT-DETR introduces a new data augmentation technique that enables better utilization of training data.


2. OpenVINO™

The Intel Distribution of OpenVINO™ Toolkit is developed based on oneAPI and accelerates the development of high-performance computer vision and deep learning applications. The toolkit is applicable to various Intel platforms from edge to cloud, helping users deploy more accurate real-world results into production systems faster. Through a streamlined development workflow, OpenVINO™ empowers developers to deploy high-performance applications and algorithms in the real world.


OpenVINO™ 2023.1, released on September 18, 2023, brings new capabilities that unlock the full potential of generative artificial intelligence. Generative AI coverage has been expanded, with an enhanced experience through frameworks like PyTorch*, where you can automatically import and convert models. Large Language Models (LLMs) have received runtime performance and memory optimizations, enabling models for chatbots, code generation, and more. OpenVINO™ is more portable and higher-performing, and can run wherever you need it: at the edge, in the cloud, or on-premises.


3. Environment configuration

In the previous article, we already introduced the environment needed to export the RT-DETR model, so we will not repeat it here. To make the project code easier to reproduce, the C++ environment used for this development is:

openvino: 2023.1.0
opencv: 4.5.5

When reproducing the code, use the same environment, or one as close as possible to the author's, to prevent unnecessary errors. In addition, the project provides two build methods: you can compile with Visual Studio or with CMake, as sketched below.
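For the CMake route, a minimal CMakeLists.txt sketch under the stated environment might look like the following (the project name and source file name are assumptions; adjust them to your local layout):

cmake_minimum_required(VERSION 3.12)
project(rtdetr_openvino_cpp)
set(CMAKE_CXX_STANDARD 14)

# Locate the OpenVINO and OpenCV development packages listed above.
find_package(OpenVINO REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(rtdetr_openvino_cpp main.cpp)
target_link_libraries(rtdetr_openvino_cpp PRIVATE openvino::runtime ${OpenCV_LIBS})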


4. Model download and conversion

In the previous article, we showed how to export the RT-DETR pre-trained model, which includes post-processing by default. In this article, we show how to export the RT-DETR model without post-processing and explain the differences between the two models.

4.1 Model export

The official PaddleDetection library provides a very friendly API, so exporting the RT-DETR model without post-processing is straightforward. First, modify the configuration file of the RT-DETR model. The configuration file path is:

.\PaddleDetection\configs\rtdetr\_base_\rtdetr_r50vd.yml

Add the statement exclude_post_process: True under the DETR section of the configuration file, as in the sketch below.

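A sketch of what the modified DETR section might look like (the surrounding keys are assumptions based on the PaddleDetection config layout; only the exclude_post_process line is the required addition):

DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess
  exclude_post_process: True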

Then re-run the model export command to obtain the model without post-processing:

python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True --output_dir=output_inference
After the model is exported, we can convert it to ONNX format and IR format. For details, please refer to the model conversion content in the previous article; a command sketch is given below for reference.
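A hedged sketch of the two conversion steps (the paddle2onnx flags follow its documented CLI, and mo is the OpenVINO™ Model Optimizer shipped with 2023.1; file names are assumptions based on the export step above):

paddle2onnx --model_dir output_inference/rtdetr_r50vd_6x_coco --model_filename model.pdmodel --params_filename model.pdiparams --save_file rtdetr_r50vd_6x_coco.onnx --opset_version 16
mo --input_model rtdetr_r50vd_6x_coco.onnx --compress_to_fp16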
4.2 Comparison of model information
We can see from the comparison below that the trimmed model contains only one input node, and its output nodes have also changed. The original model outputs the fully processed predictions; after trimming, the content of the output nodes changes significantly:

stack_7.tmp_0.slice_0: this node carries the bounding box information of the 300 prediction results;

stack_8.tmp_0.slice_0: this node carries the confidence scores of the 80 classes for each of the 300 prediction results; during subsequent processing, the final predicted class must be derived from these scores.
Model node information diagram (the left picture shows the model node information including post-processing, the right picture shows the model node information without post-processing)
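To verify these nodes on your own export, a minimal sketch that prints the input and output names and shapes with the OpenVINO™ C++ API (the model path is an example):

#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // Read the trimmed model; replace the path with your own export.
    std::shared_ptr<ov::Model> model = core.read_model("rtdetr_r50vd_6x_coco.onnx");
    for (const auto& input : model->inputs())
        std::cout << "Input:  " << input.get_any_name() << " " << input.get_partial_shape() << std::endl;
    for (const auto& output : model->outputs())
        std::cout << "Output: " << output.get_any_name() << " " << output.get_partial_shape() << std::endl;
    return 0;
}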
5. C++ code implementation

To implement the RT-DETR inference pipeline more systematically, we use C++ classes to encapsulate an RTDETRPredictor model inference class and an RTDETRProcess data processing class. Below we explain the key code in these two classes.

5.1 Model inference class implementation

The RTDETRPredictor model inference class defined in the C++ code is as follows:
class RTDETRPredictor
{
public:
    RTDETRPredictor(std::string model_path, std::string label_path,
        std::string device_name = "CPU", bool post_flag = true);
    cv::Mat predict(cv::Mat image);
private:
    void print_model_info(std::shared_ptr<ov::Model> model);
    void fill_tensor_data_image(ov::Tensor& input_tensor, const cv::Mat& input_image);
    void fill_tensor_data_float(ov::Tensor& input_tensor, float* input_data, int data_size);
private:
    RTDETRProcess rtdetr_process;
    bool post_flag;
    ov::Core core;
    std::shared_ptr<ov::Model> model;
    ov::CompiledModel compiled_model;
    ov::InferRequest infer_request;
};


1) Model inference class initialization

First, we need to initialize the model inference class and its related information:

RTDETRPredictor::RTDETRPredictor(std::string model_path, std::string label_path,
    std::string device_name, bool post_flag)
    : post_flag(post_flag) {
    INFO("Model path: " + model_path);
    INFO("Device name: " + device_name);
    model = core.read_model(model_path);
    print_model_info(model);
    compiled_model = core.compile_model(model, device_name);
    infer_request = compiled_model.create_infer_request();
    rtdetr_process = RTDETRProcess(cv::Size(640, 640), label_path, 0.5);
}
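The print_model_info helper called above is not listed in the article; a minimal sketch that produces output similar to the log in Section 6 might look like this (the INFO macro is assumed to print one line to the console; std::ostringstream requires <sstream>):

void RTDETRPredictor::print_model_info(std::shared_ptr<ov::Model> model) {
    INFO("Inference Model");
    INFO("Model name: " + model->get_friendly_name());
    INFO("Input:");
    for (const auto& input : model->inputs()) {
        std::ostringstream shape;
        shape << input.get_partial_shape();
        INFO("  name: " + input.get_any_name());
        INFO("  shape: " + shape.str());
    }
    INFO("Output:");
    for (const auto& output : model->outputs()) {
        std::ostringstream shape;
        shape << output.get_partial_shape();
        INFO("  name: " + output.get_any_name());
        INFO("  shape: " + shape.str());
    }
}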


This method mainly contains the following inputs:

model_path: inference model address;

label_path: model prediction category file;

device_name: inference device name;

post_flag: whether the model includes post-processing; when post_flag = true, post-processing is included, and when post_flag = false, it is not.
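As a usage sketch (the file paths here are assumptions), the inference class can be driven like this:

// Load the trimmed model (post_flag = false) and run a single image through it.
RTDETRPredictor predictor("rtdetr_r50vd_6x_coco.xml", "coco_label.txt", "CPU", false);
cv::Mat image = cv::imread("demo.jpg");
cv::Mat result = predictor.predict(image);
cv::imwrite("result.jpg", result);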

2) Image prediction API

This step predicts the input image and draws the model prediction results onto it. The following is the main code of this stage:

cv::Mat RTDETRPredictor::predict(cv::Mat image){
    cv::Mat blob_image = rtdetr_process.preprocess(image);
    if (post_flag) {
        ov::Tensor image_tensor = infer_request.get_tensor("image");
        ov::Tensor shape_tensor = infer_request.get_tensor("im_shape");
        ov::Tensor scale_tensor = infer_request.get_tensor("scale_factor");
        image_tensor.set_shape({ 1,3,640,640 });
        shape_tensor.set_shape({ 1,2 });
        scale_tensor.set_shape({ 1,2 });
        fill_tensor_data_image(image_tensor, blob_image);
        fill_tensor_data_float(shape_tensor, rtdetr_process.get_input_shape().data(), 2);
        fill_tensor_data_float(scale_tensor, rtdetr_process.get_scale_factor().data(), 2);
    } else {
        ov::Tensor image_tensor = infer_request.get_input_tensor();
        fill_tensor_data_image(image_tensor, blob_image);
    }
    infer_request.infer();
    ResultData results;
    if (post_flag) {
        ov::Tensor output_tensor = infer_request.get_tensor("reshape2_95.tmp_0");
        float result[6 * 300] = {0};
        for (int i = 0; i < 6 * 300; ++i) {
            result[i] = output_tensor.data<float>()[i];
        }
        results = rtdetr_process.postprocess(result, nullptr, true);
    } else {
        ov::Tensor score_tensor = infer_request.get_tensor(model->outputs()[1].get_any_name());
        ov::Tensor bbox_tensor = infer_request.get_tensor(model->outputs()[0].get_any_name());
        float score[300 * 80] = {0};
        float bbox[300 * 4] = {0};
        for (int i = 0; i < 300; ++i) {
            for (int j = 0; j < 80; ++j) {
                score[80 * i + j] = score_tensor.data<float>()[80 * i + j];
            }
            for (int j = 0; j < 4; ++j) {
                bbox[4 * i + j] = bbox_tensor.data<float>()[4 * i + j];
            }
        }
        results = rtdetr_process.postprocess(score, bbox, false);
    }
    return rtdetr_process.draw_box(image, results);
}


The main logic of the above code is as follows: first, the input image is passed to the data processing class, which converts it into the required input format; then the model input is configured according to the model's input nodes (if the model has dynamic inputs, the input shape must be set explicitly); next, model inference is performed; finally, the inference results are post-processed and drawn onto the input image.
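The fill_tensor_data_image and fill_tensor_data_float helpers are declared in the class but not listed in the article; below are minimal sketches consistent with their call sites, assuming the blob from preprocess is a 640×640 CV_32FC3 image in RGB order:

void RTDETRPredictor::fill_tensor_data_image(ov::Tensor& input_tensor, const cv::Mat& input_image) {
    const size_t channels = (size_t)input_image.channels();
    const size_t height = (size_t)input_image.rows;
    const size_t width = (size_t)input_image.cols;
    // Make the (possibly dynamic) input tensor match the blob, then repack HWC -> CHW.
    input_tensor.set_shape({ 1, channels, height, width });
    float* data = input_tensor.data<float>();
    for (size_t c = 0; c < channels; ++c)
        for (size_t h = 0; h < height; ++h)
            for (size_t w = 0; w < width; ++w)
                data[c * height * width + h * width + w] =
                    input_image.at<cv::Vec3f>((int)h, (int)w)[(int)c];
}

void RTDETRPredictor::fill_tensor_data_float(ov::Tensor& input_tensor, float* input_data, int data_size) {
    // Copies a flat float buffer (e.g. im_shape or scale_factor) into the tensor.
    float* data = input_tensor.data<float>();
    for (int i = 0; i < data_size; ++i)
        data[i] = input_data[i];
}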

5.2 Model data processing class RTDETRProcess

1) Define RTDETRProcess

class RTDETRProcess
{
public:
    RTDETRProcess() {}
    RTDETRProcess(cv::Size target_size, std::string label_path = "", float threshold = 0.5,
        cv::InterpolationFlags interpf = cv::INTER_LINEAR);
    cv::Mat preprocess(cv::Mat image);
    ResultData postprocess(float* score, float* bboxs, bool post_flag);
    std::vector<float> get_im_shape() { return im_shape; }
    std::vector<float> get_input_shape() { return { (float)target_size.width ,(float)target_size.height }; }
    std::vector<float> get_scale_factor() { return scale_factor; }
    cv::Mat draw_box(cv::Mat image, ResultData results);
private:
    void read_labels(std::string label_path);
    template<class T>
    float sigmoid(T data) { return 1.0f / (1 + std::exp(-data));}
    template<class T>
    int argmax(T* data, int length) {
        std::vector<T> arr(data, data + length);
        return (int)(std::max_element(arr.begin(), arr.end()) - arr.begin());
    }
private:
    cv::Size target_size; // The model input size.
    std::vector<std::string> labels; // The model classification label.
    float threshold; // The threshold parameter.
    cv::InterpolationFlags interpf; // The image scaling method.
    std::vector<float> im_shape;
    std::vector<float> scale_factor;
};

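The ResultData structure and the read_labels method are used above but not listed in the article; below are minimal sketches consistent with their usage (the field names follow the calls in postprocess; read_labels assumes one class name per line and requires <fstream>):

// Holds one batch of detections; index i across the vectors describes one object.
struct ResultData {
    std::vector<int> clsids;          // predicted class IDs
    std::vector<std::string> labels;  // predicted class names
    std::vector<cv::Rect> bboxs;      // bounding boxes in original-image pixels
    std::vector<float> scores;        // confidence scores
};

// Reads one class label per line from the label file.
void RTDETRProcess::read_labels(std::string label_path) {
    std::ifstream file(label_path);
    std::string line;
    while (std::getline(file, line)) {
        if (!line.empty())
            labels.push_back(line);
    }
}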

2) Input data processing method

cv::Mat RTDETRProcess::preprocess(cv::Mat image){
    im_shape = { (float)image.rows, (float)image.cols };
    scale_factor = { 640.0f / (float)image.rows, 640.0f / (float)image.cols};
    cv::Mat blob_image;
    cv::cvtColor(image, blob_image, cv::COLOR_BGR2RGB);
    cv::resize(blob_image, blob_image, target_size, 0, 0, interpf);
    std::vector<cv::Mat> rgb_channels(3);
    cv::split(blob_image, rgb_channels);
    for (size_t i = 0; i < rgb_channels.size(); ++i) {
        rgb_channels[i].convertTo(rgb_channels[i], CV_32FC1, 1.0 / 255.0);
    }
    cv::merge(rgb_channels, blob_image);
    return blob_image;
}


3) Prediction result data processing method

ResultData RTDETRProcess::postprocess(float* score, float* bbox, bool post_flag)
{
    ResultData result;
    if (post_flag) {
        for (int i = 0; i < 300; ++i) {
            if (score[6 * i + 1] > threshold) {
                result.clsids.push_back((int)score[6 * i]);
                result.labels.push_back(labels[(int)score[6 * i]]);
                result.bboxs.push_back(cv::Rect(score[6 * i + 2], score[6 * i + 3],
                    score[6 * i + 4] - score[6 * i + 2],
                    score[6 * i + 5] - score[6 * i + 3]));
                result.scores.push_back(score[6 * i + 1]);
            }
        }
    } else {
        for (int i = 0; i < 300; ++i) {
            float s[80];
            for (int j = 0; j < 80; ++j) {
                s[j] = score[80 * i + j];
            }
            int clsid = argmax<float>(s, 80);
            float max_score = sigmoid<float>(s[clsid]);
            if (max_score > threshold) {
                result.clsids.push_back(clsid);
                result.labels.push_back(labels[clsid]);
                float cx = bbox[4 * i] * 640.0 / scale_factor[1];
                float cy = bbox[4 * i + 1] * 640.0 / scale_factor[0];
                float w = bbox[4 * i + 2] * 640.0 / scale_factor[1];
                float h = bbox[4 * i + 3] * 640.0 / scale_factor[0];
                result.bboxs.push_back(cv::Rect((int)(cx - w / 2), (int)(cy - h / 2), w, h));
                result.scores.push_back(max_score);
            }
        }
    }
    return result;
}


Here is an explanation of the output processing. Since we support two model variants, two output processing paths are provided. The main difference is whether the prediction boxes must be restored and the predicted class extracted: the model with post-processing already outputs final boxes and class IDs, whereas the trimmed model outputs normalized cxcywh boxes and raw class scores, so the code applies sigmoid and argmax to the scores and rescales the boxes back to the original image size. For example, for a 1280×720 input image, scale_factor = {640/720, 640/1280}, so a normalized cx of 0.25 becomes 0.25 × 640 / (640/1280) = 320 pixels in the original image. The specific differences can be seen in the code above.


6. Display of prediction results

Finally, with the above code, we can directly implement inference deployment of the RT-DETR model. The RT-DETR pre-trained model uses the COCO dataset, and we obtain the predicted image results, as shown in the figure:

(Figure: RT-DETR model prediction results)

The figure above shows the RT-DETR model prediction results. At the same time, we print the key model information and the inference results:

[INFO] This is an RT-DETR model deployment case using C++ !
[INFO] Model path: E:\Model\RT-DETR\RTDETR_cropping\rtdetr_r50vd_6x_coco.onnx
[INFO] Device name: CPU
[INFO] Inference Model
[INFO] Model name: Model from PaddlePaddle.
[INFO] Input:
[INFO] name: image
[INFO] type: float
[INFO] shape: [?,3,640,640]
[INFO] Output:
[INFO] name: stack_7.tmp_0_slice_0
[INFO] type: float
[INFO] shape: [?,300,4]
[INFO] name: stack_8.tmp_0_slice_0
[INFO] type: float
[INFO] shape: [?,300,80]
[INFO] Infer result:
[INFO] class_id: 0, label: person, confidence: 0.928, left_top: [215, 327], right_bottom: [259, 468]
[INFO] class_id: 0, label: person, confidence: 0.923, left_top: [260, 343], right_bottom: [309, 460]
[INFO] class_id: 0, label: person, confidence: 0.893, left_top: [402, 346], right_bottom: [451, 478]
[INFO] class_id: 0, label: person, confidence: 0.796, left_top: [456, 369], right_bottom: [507, 479]
[INFO] class_id: 0, label: person, confidence: 0.830, left_top: [519, 360], right_bottom: [583, 479]
[INFO] class_id: 33, label: kite, confidence: 0.836, left_top: [323, 159], right_bottom: [465, 213]
[INFO] class_id: 33, label: kite, confidence: 0.805, left_top: [329, 64], right_bottom: [388, 85]
[INFO] class_id: 33, label: kite, confidence: 0.822, left_top: [282, 217], right_bottom: [419, 267]
[INFO] class_id: 0, label: person, confidence: 0.834, left_top: [294, 384], right_bottom: [354, 443]
[INFO] class_id: 33, label: kite, confidence: 0.793, left_top: [504, 195], right_bottom: [522, 214]
[INFO] class_id: 33, label: kite, confidence: 0.524, left_top: [233, 22], right_bottom: [242, 29]
[INFO] class_id: 33, label: kite, confidence: 0.763, left_top: [116, 178], right_bottom: [136, 190]
[INFO] class_id: 0, label: person, confidence: 0.601, left_top: [497, 380], right_bottom: [529, 479]
[INFO] class_id: 33, label: kite, confidence: 0.764, left_top: [460, 251], right_bottom: [478, 268]
[INFO] class_id: 33, label: kite, confidence: 0.605, left_top: [176, 236], right_bottom: [256, 257]
[INFO] class_id: 0, label: person, confidence: 0.732, left_top: [154, 380], right_bottom: [210, 420]
[INFO] class_id: 33, label: kite, confidence: 0.574, left_top: [221, 264], right_bottom: [342, 312]
[INFO] class_id: 33, label: kite, confidence: 0.588, left_top: [97, 316], right_bottom: [155, 359]
[INFO] class_id: 33, label: kite, confidence: 0.523, left_top: [171, 317], right_bottom: [227, 357]
[INFO] class_id: 33, label: kite, confidence: 0.657, left_top: [363, 120], right_bottom: [375, 129]
[INFO] class_id: 0, label: person, confidence: 0.698, left_top: [26, 341], right_bottom: [57, 425]
[INFO] class_id: 33, label: kite, confidence: 0.798, left_top: [242, 124], right_bottom: [263, 135]
[INFO] class_id: 33, label: kite, confidence: 0.528, left_top: [218, 178], right_bottom: [451, 241]
[INFO] class_id: 33, label: kite, confidence: 0.685, left_top: [430, 29], right_bottom: [449, 43]
[INFO] class_id: 33, label: kite, confidence: 0.640, left_top: [363, 120], right_bottom: [375, 129]
[INFO] class_id: 33, label: kite, confidence: 0.559, left_top: [161, 193], right_bottom: [171, 199]



7. Summary

In this project, we introduced a case of deploying the RT-DETR model without built-in post-processing using the OpenVINO™ C++ API, and encapsulated the model's processing methods into a complete code example, achieving accelerated inference of a deep learning model on Intel platforms with OpenVINO™. This will help you implement industrial applications of the RT-DETR model in the future.

In the next article, "Deploying RT-DETR Model Based on OpenVINO™ C# API", we will implement deployment of the RT-DETR model based on the C# API and, using the developed code, compare inference speeds across different platforms. If you are interested, you can follow this project's code repository to obtain the source code.


–END–
