Computing the FLOPs and parameter count (Params) of DETR and YOLO models

Foreword

An intuitive way to understand the computation amount (FLOPs) and the parameter count (Params) is that the computation amount corresponds to time complexity, while the parameter count corresponds to space complexity. In other words, the computation amount determines how long the network takes to run, and the parameter count determines how much GPU memory it occupies.

Computation amount: FLOPs. Here FLOPs is the total number of FLoating-point OPerations the model performs in a forward pass; the lowercase s is a plural, not "per second" (FLOPS with a capital S is the hardware throughput measure). It reflects the computation cost of a network model: the smaller, the better.

Parameter count: Params is the total number of trainable parameters in the network model: the smaller, the better.
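
Before reaching for any package, the parameter count alone can be checked with a few lines of plain PyTorch; a minimal sketch, using torchvision's AlexNet purely as a stand-in model:

import torchvision

# Count trainable parameters directly: sum element counts over all tensors that require gradients
model = torchvision.models.alexnet(pretrained=False)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('trainable params: %.2f M' % (n_params / 1e6))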

With these concepts in place, the next question is how to compute the two values.
A very common approach is the ptflops package.

# -*- coding: utf-8 -*-
import torchvision
from ptflops import get_model_complexity_info

model = torchvision.models.alexnet(pretrained=False)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)

This snippet is essentially plug and play.

DAB-DETR model

The blogger took the DAB-DETR model as an example and hit an error at runtime, caused by a mismatch between the weight file and the model configuration file.

The weight file does not match the model configuration

RuntimeError: Error(s) in loading state_dict for DABDeformableDETR:
size mismatch for input_proj.0.0.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for input_proj.1.0.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for input_proj.2.0.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for input_proj.3.0.weight: copying a param with shape torch.Size([256, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).

The fix is to change num_channels from the original [128, 256, 512] to [512, 1024, 2048], which matches the ResNet-50 backbone the checkpoint was trained with (its layer2-layer4 outputs have 512, 1024 and 2048 channels):

if return_interm_layers:
    # return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
    return_layers = {"layer2": "0", "layer3": "1", "layer4": "2"}
    self.strides = [8, 16, 32]
    self.num_channels = [512, 1024, 2048]

Inference code

The inference code is as follows; it works almost unchanged across DETR-like models.

import json
import os, sys
import torch
import numpy as np

from models import build_DABDETR
from models.dab_deformable_detr import build_dab_deformable_detr
from util.slconfig import SLConfig
from datasets import build_dataset
from util.visualizer import COCOVisualizer
from util import box_ops
model_config_path = "D:/graduate/others/DAB-DETR/config.json" # change the path of the model config file
model_checkpoint_path = "D:/graduate/others/DAB-DETR/checkpoint.pth" # change the path of the model checkpoint
# See our Model Zoo section in README.md for more details about our pretrained models.

args = SLConfig.fromfile(model_config_path)
model, criterion, postprocessors = build_DABDETR(args)
checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()
with open('util/coco_id2name.json') as f:
    id2name = json.load(f)
    id2name = {int(k): v for k, v in id2name.items()}
from PIL import Image
import datasets.transforms as T
image = Image.open("./figure/4.jpg").convert("RGB") # load image
# transform images
transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image, _ = transform(image, None)
from ptflops import get_model_complexity_info
model = model.to(args.device)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)
# predict images
with torch.no_grad():
    output = model.cuda()(image[None].cuda())
# visualize outputs
output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]
threshold = 0.5 # set a threshold
vslzr = COCOVisualizer()
scores = output['scores']
print(len(scores))
labels = output['labels']
boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
select_mask = scores > threshold

box_label = [id2name[int(item)] for item in labels[select_mask]]
pred_dict = {
      'boxes': boxes[select_mask],
      'size': torch.Tensor([image.shape[1], image.shape[2]]),
      'box_label': box_label
}

vslzr.visualize(image, pred_dict, savedir=None, dpi=120)

DN-DETR model

The DN-DETR inference code is similar to the DAB-DETR inference code, but the problems encountered are different.

Null value problem

indicator0 = torch.zeros([num_queries * num_patterns, 1]).cuda()
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

num_patterns is None here; simply assign num_patterns = 1 before it is used.
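
A minimal sketch of the fix, placed just before the failing line; the None check is illustrative and assumes num_patterns arrives unset from the config:

# Hypothetical guard: fall back to 1 when the config leaves num_patterns unset (None)
if num_patterns is None:
    num_patterns = 1
indicator0 = torch.zeros([num_queries * num_patterns, 1]).cuda()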

CPU and GPU computing problems

boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Some tensors are on the CPU and some on the GPU. Append .cuda() in the line boxes = boxes * scale_fct[:, None, :] so that both operands end up on the GPU, as sketched below.
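
A sketch of the patched line, assuming boxes already lives on the GPU and scale_fct is the tensor left on the CPU:

# Move scale_fct onto the GPU so both operands of the multiplication share a device
boxes = boxes * scale_fct[:, None, :].cuda()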

Tuple conversion problem

In addition, an error is reported when the outputs tuple is indexed with a string:

TypeError: tuple indices must be integers or slices, not str

Change the following line:

out_logits, out_bbox = outputs['pred_logits'], outputs['pred_boxes']

to:

out_logits=outputs[0]['pred_logits']
out_bbox = outputs[0]['pred_boxes']

Parameter calculation problem

At this point the DN-DETR inference code runs, but a problem appears when computing the parameter count:

File "D:\Anaconda\envs\deformable_detr\lib\site-packages\ptflops\pytorch_ops.py", line 162, in multihead_attention_counter_hook
    q, k, v = input
ValueError: not enough values to unpack (expected 3, got 2)

The error comes from ptflops' multihead_attention_counter_hook: it expects the attention module's input to unpack into three tensors, but only two are passed here. Find that line in the package and change q, k, v = input to:

q, k = input
v = k

GPU and CPU computing problems

Similarly, the same device-mismatch error appears here as well, and it can be fixed in the same way (append .cuda() to the offending line).

 File "E:\graduate\papers\DN-DETR\DN-DETR-main\models\DN_DAB_DETR\DABDETR.py", line 458, in forward
    boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

DN-DAB-Deformable-DETR model

Parameter calculation problem

Since DN-DAB-Deformable-DETR shares its codebase with DN-DAB-DETR, the edit made above now causes a problem here:

 q, k= input
ValueError: too many values to unpack (expected 2)

Checking the length of input shows that it now contains three values, so the original unpacking was actually correct; change the line back to its original form (a variant that handles both cases is sketched after this snippet):

q, k, v = input
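
To avoid flipping this line back and forth between the two models, here is a hedged sketch of a branch inside the hook that accepts either shape of input:

# Sketch: handle both the (q, k) and (q, k, v) cases seen across the DN-DETR variants
if len(input) == 3:
    q, k, v = input
else:
    q, k = input
    v = k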

A batch-size error may also appear, but it is easy to solve: since we are only running inference on a single image, the batch size simply needs to be set to 1 (a one-line sketch follows).
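
A minimal sketch, assuming the batch size lives on the parsed args object under the name batch_size (the attribute name may differ in your config):

# Hypothetical: single-image inference, so force the batch size to 1
args.batch_size = 1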

At this point, inference and the FLOPs and parameter-count calculations for the DETR-family models all work.

YOLO model calculation

Next is the YOLO model, whose complexity is computed in a similar way. The blogger initially reused the code above, but found that something went wrong: the reported parameter count was always 0, which was baffling.

The blogger therefore switched to another package, thop.

import torch
from thop import profile

print('==> Building model..')
# 'model' is the network already built in the surrounding inference script
input = torch.randn(1, 3, 224, 224)
input = input.cuda()
flops, params = profile(model, (input,))
print('flops: %.2f M, params: %.2f M' % (flops / 1e6, params / 1e6))

This works; just like with the DETR models, the snippet can be dropped directly into the model's inference code.
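
To avoid copy-pasting the snippet into every script, the same call can be wrapped in a small helper; report_complexity below is a hypothetical name introduced here for illustration, not part of thop:

import torch
from thop import profile

def report_complexity(model, input_size=(1, 3, 224, 224), device='cuda'):
    """Hypothetical helper: profile any nn.Module (DETR- or YOLO-style) with thop on a dummy input."""
    model = model.to(device).eval()
    dummy = torch.randn(*input_size, device=device)
    with torch.no_grad():
        flops, params = profile(model, (dummy,))
    print('flops: %.2f G, params: %.2f M' % (flops / 1e9, params / 1e6))
    return flops, params

For example, report_complexity(model, (1, 3, 640, 640)) covers the 640x640 input commonly used by YOLO models.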