Foreword
An intuitive way to understand the amount of computation (FLOPs) and the number of parameters (Params) is that the amount of computation corresponds to time complexity, and the number of parameters corresponds to space complexity. That is, the amount of computation determines how long the network takes to run, and the number of parameters determines how much video memory it occupies.
Calculation amount: FLOPs (lowercase s) is the plural of FLOP and refers to the total number of floating-point operations a network model performs, which is what measures the model's computational cost. It should not be confused with FLOPS (uppercase S), floating-point operations per second, which measures hardware speed. The smaller the better.
Parameter amount: Params refers to the total number of trainable parameters in the network model. The smaller the better.
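As a rough illustration of what these two numbers count, the sketch below works out the parameters and the multiply-accumulate operations (MACs) of a single convolution layer by hand. The function names are made up for this example, and it assumes a square kernel with no groups and no dilation. One MAC is commonly counted as two FLOPs (one multiply and one add), although tools differ on which of the two they report.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_params(c_in, c_out, kernel, bias=True):
    """Trainable parameters: one kernel per (input, output) channel pair, plus biases."""
    return c_out * c_in * kernel * kernel + (c_out if bias else 0)


def conv2d_macs(c_in, c_out, kernel, out_h, out_w):
    """Multiply-accumulates: every output element needs c_in * kernel * kernel of them."""
    return out_h * out_w * c_out * c_in * kernel * kernel


# Example layer: 3 -> 64 channels, 11x11 kernel, stride 4, padding 2, 224x224 input
out = conv2d_output_size(224, kernel=11, stride=4, padding=2)
params = conv2d_params(3, 64, 11)
macs = conv2d_macs(3, 64, 11, out, out)
print(out, params, macs)  # 55 23296 70276800
```

Note that the parameter count does not depend on the input resolution, while the MAC count grows with the output height and width.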
After understanding the above concepts, the next step is how to calculate these two values.
A very common way is through the ptflops package.
# -- coding: utf-8 --
import torchvision
from ptflops import get_model_complexity_info

model = torchvision.models.alexnet(pretrained=False)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)
This code is essentially plug and play.
DAB-DETR model
The blogger took the DAB-DETR model as an example. An error was reported at run time, caused by a mismatch between the weight file and the model configuration file.
The weight file does not match the model configuration
RuntimeError: Error(s) in loading state_dict for DABDeformableDETR:
    size mismatch for input_proj.0.0.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
    size mismatch for input_proj.1.0.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
    size mismatch for input_proj.2.0.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
    size mismatch for input_proj.3.0.weight: copying a param with shape torch.Size([256, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
Just modify the value of num_channels, which was originally [128, 256, 512], to [512, 1024, 2048] so that it matches the shapes stored in the checkpoint:
if return_interm_layers:
    # return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
    return_layers = {"layer2": "0", "layer3": "1", "layer4": "2"}
    self.strides = [8, 16, 32]
    self.num_channels = [512, 1024, 2048]
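Before editing the backbone, it can help to list exactly which tensors disagree. The sketch below is a hypothetical helper (not part of DAB-DETR or PyTorch) that compares two name-to-shape mappings. With a real model, such mappings could be built with {k: tuple(v.shape) for k, v in state_dict.items()} from the checkpoint and from model.state_dict(). The first two entries use shapes from the error message above; the third key is invented purely to show a matching entry.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


checkpoint_shapes = {
    "input_proj.0.0.weight": (256, 512, 1, 1),
    "input_proj.1.0.weight": (256, 1024, 1, 1),
    "example.matching.weight": (256, 256),  # made-up key for illustration
}
model_shapes = {
    "input_proj.0.0.weight": (256, 128, 1, 1),
    "input_proj.1.0.weight": (256, 256, 1, 1),
    "example.matching.weight": (256, 256),  # made-up key for illustration
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, ckpt, model in mismatches:
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

In this error, the second dimension of each input_proj weight is the number of input channels, which is why the checkpoint values 512, 1024 and 2048 point at num_channels.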
Inference code
The inference code is as follows. Almost all DETR-like models can share the same inference code.
import json
import os, sys
import torch
import numpy as np

from models import build_DABDETR
from models.dab_deformable_detr import build_dab_deformable_detr
from util.slconfig import SLConfig
from datasets import build_dataset
from util.visualizer import COCOVisualizer
from util import box_ops

model_config_path = "D:/graduate/others/DAB-DETR/config.json"  # change the path of the model config file
model_checkpoint_path = "D:/graduate/others/DAB-DETR/checkpoint.pth"  # change the path of the model checkpoint
# See our Model Zoo section in README.md for more details about our pretrained models.

# build the model and load the weights
args = SLConfig.fromfile(model_config_path)
model, criterion, postprocessors = build_DABDETR(args)
checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()

# map COCO category ids to names
with open('util/coco_id2name.json') as f:
    id2name = json.load(f)
    id2name = {int(k): v for k, v in id2name.items()}

from PIL import Image
import datasets.transforms as T

image = Image.open("./figure/4.jpg").convert("RGB")  # load image

# transform images
transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image, _ = transform(image, None)

# count FLOPs and parameters (ptflops builds its own dummy input of this shape)
from ptflops import get_model_complexity_info
model = model.to(args.device)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)

# predict images
with torch.no_grad():
    output = model.cuda()(image[None].cuda())

# visualize outputs
output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]
threshold = 0.5  # set a threshold
vslzr = COCOVisualizer()
scores = output['scores']
print(len(scores))
labels = output['labels']
boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
select_mask = scores > threshold  # was "thresholds", an undefined name
box_label = [id2name[int(item)] for item in labels[select_mask]]
pred_dict = {
    'boxes': boxes[select_mask],
    'size': torch.Tensor([image.shape[1], image.shape[2]]),
    'box_label': box_label
}
vslzr.visualize(image, pred_dict, savedir=None, dpi=120)
DN-DETR model
The DN-DETR model inference code is similar to the DAB-DETR model inference code, but the problems encountered are different.
Null value problem
indicator0 = torch.zeros([num_queries * num_patterns, 1]).cuda()
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
Here num_patterns is None, so the multiplication fails. Just assign num_patterns = 1.
CPU and GPU computing problems
boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
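Some tensors are on the CPU and some are on the GPU. In the line boxes = boxes * scale_fct[:, None, :], append .cuda() to the operand that is still on the CPU so that both are on the same device, for example scale_fct[:, None, :].cuda().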
Tuple conversion problem
In addition, an error about indexing a tuple is reported:
TypeError: tuple indices must be integers or slices, not str
Take the following code:
out_logits, out_bbox = outputs['pred_logits'], outputs['pred_boxes']
Change to:
out_logits = outputs[0]['pred_logits']
out_bbox = outputs[0]['pred_boxes']
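If the same post-processing code has to serve both models, one option is to accept either return type instead of hard-coding outputs[0]. The helper below is a hypothetical sketch, assuming only what the error suggests: that the model returns either the prediction dict itself or a tuple whose first element is that dict.

```python
def unwrap_outputs(outputs):
    """Return the prediction dict whether it arrives bare or wrapped in a tuple/list."""
    if isinstance(outputs, dict):
        return outputs
    if isinstance(outputs, (tuple, list)) and outputs and isinstance(outputs[0], dict):
        return outputs[0]
    raise TypeError(f"unexpected model output type: {type(outputs).__name__}")


# Placeholder strings stand in for the real tensors
plain = {"pred_logits": "logits", "pred_boxes": "boxes"}
wrapped = (plain, None)

for outputs in (plain, wrapped):
    preds = unwrap_outputs(outputs)
    out_logits, out_bbox = preds["pred_logits"], preds["pred_boxes"]
    print(out_logits, out_bbox)
```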
Parameter calculation problem
So far, the inference code of the DN-DETR model has been corrected, but there is a problem when calculating the parameter quantity:
File "D:\Anaconda\envs\deformable_detr\lib\site-packages\ptflops\pytorch_ops.py", line 162, in multihead_attention_counter_hook
    q, k, v = input
ValueError: not enough values to unpack (expected 3, got 2)
Here you can see that the hook received only two inputs where it expected three. We find that line in the ptflops source and change q, k, v = input
to:
q, k = input
v = k
GPU and CPU computing problems
Similarly, a device mismatch error is also reported here, and it can be handled in the same way.
File "E:\graduate\papers\DN-DETR\DN-DETR-main\models\DN_DAB_DETR\DABDETR.py", line 458, in forward
    boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
DN-DAB-Deformable-DETR model
Parameter calculation problem
Since DN-DAB-Deformable-DETR shares a set of code with DN-DAB-DETR, the change made for DN-DAB-DETR causes an error here:
q, k = input
ValueError: too many values to unpack (expected 2)
Checking the length of input shows that it contains three values here, so the original form was correct for this model. Just change it back:
q, k, v = input
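Switching that line back and forth for each model is error-prone. A possible alternative is to unpack by length so that both call patterns work. The function below is a hypothetical helper, not part of ptflops; it assumes, as in the earlier fix, that a missing value tensor can be treated as equal to the key.

```python
def unpack_qkv(inputs):
    """Unpack attention inputs given as (q, k, v) or as (q, k) with v defaulting to k."""
    if len(inputs) == 3:
        q, k, v = inputs
    elif len(inputs) == 2:
        q, k = inputs
        v = k  # assumption: value falls back to key when only two inputs arrive
    else:
        raise ValueError(f"expected 2 or 3 attention inputs, got {len(inputs)}")
    return q, k, v


# Placeholder strings stand in for the real tensors
print(unpack_qkv(("q", "k")))       # ('q', 'k', 'k')
print(unpack_qkv(("q", "k", "v")))  # ('q', 'k', 'v')
```

Inside the hook, the single unpacking line would then become q, k, v = unpack_qkv(input), which leaves the rest of the counting logic unchanged.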
The batch-size error is actually very easy to solve: because we are only running inference on a single picture, we only need to set the batch size to 1.
So far, inference, the amount of computation, and the parameter count for the DETR models have all been solved.
YOLO model calculation
Then there is the YOLO model, which has a similar calculation method. Originally, the blogger directly used the above code, but found that something went wrong.
The amount of parameters was always 0, which was baffling.
Then the blogger switched to another package, thop.
import torch
from thop import profile

# "model" is the YOLO model already built in the inference code
print('==> Building model..')
input = torch.randn(1, 3, 224, 224)
input = input.cuda()
flops, params = profile(model, (input,))
print('flops: %.2f M, params: %.2f M' % (flops / 1e6, params / 1e6))
This works. Just as with the DETR models, it can be placed directly in the model's inference code.
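Because thop returns raw counts, as the division by 1e6 above shows, a small formatter can keep the printed numbers readable for models of very different sizes. This is a made-up helper for illustration, not part of thop or ptflops.

```python
def format_count(value):
    """Format a raw operation or parameter count with a K, M or G suffix."""
    for unit, scale in (("G", 1e9), ("M", 1e6), ("K", 1e3)):
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.0f}"


print(format_count(714_000_000))    # 714.00 M
print(format_count(4_100_000_000))  # 4.10 G
print(format_count(512))            # 512
```

With it, the final print could read print('flops:', format_count(flops), 'params:', format_count(params)).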