Setting up a VoxelNeXt environment to run KITTI

GPU: A100

Install CUDA 11.3.1

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run

Install cuDNN

sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8

Create a virtual environment

conda create -n pcdet python==3.8

Install PyTorch

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
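
After the install it is worth confirming that the wheel that landed is really the cu113 build and that it can see the GPU. A minimal sketch, assuming nothing beyond the public torch attributes torch.__version__, torch.version.cuda and torch.cuda.is_available(); the function name describe_torch is just a name chosen for this example.

```python
import importlib.util


def describe_torch():
    """Return a small report about the installed torch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the script still runs without torch

    return {
        "installed": True,
        "torch": torch.__version__,            # expected for this setup: 1.11.0+cu113
        "built_for_cuda": torch.version.cuda,  # expected for this setup: 11.3
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

If cuda_available comes back False on the A100 machine, the driver or CUDA install is the first place to look before going any further.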

Install spconv

First make sure cumm and spconv are not already installed:
pip list | grep spconv
pip list | grep cumm
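
The same check can be done from Python, which avoids missing a package because of how its name is spelled in pip list. A small sketch using the standard library importlib.metadata; the list of distribution names is my assumption, based on spconv and cumm wheels carrying a CUDA suffix such as -cu113.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Check both the plain and the CUDA-suffixed names (the suffixed ones are assumed).
print(installed_versions(["spconv", "spconv-cu113", "cumm", "cumm-cu113"]))
```

Every value should be None before spconv-cu113 is installed.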

pip install spconv-cu113 -i https://pypi.mirrors.ustc.edu.cn/simple/

Install pcdet

python setup.py develop -i https://pypi.mirrors.ustc.edu.cn/simple/

Install cv2

pip install opencv-python -i https://pypi.mirrors.ustc.edu.cn/simple/

Generate the KITTI info files

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
vim /home/suwei/suwei_ws/OpenPCDet/pcdet/datasets/__init__.py
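
Before running create_kitti_infos it can save time to confirm the raw data is where the script will look for it. The sketch below checks for the sub-directories listed in the OpenPCDet getting-started guide as I remember it; the directory list and the data/kitti root are assumptions, so adjust them to your checkout.

```python
from pathlib import Path

# Assumed layout under data/kitti (taken from the OpenPCDet guide, not verified here).
EXPECTED = [
    "ImageSets",
    "training/calib",
    "training/velodyne",
    "training/label_2",
    "training/image_2",
    "testing/calib",
    "testing/velodyne",
    "testing/image_2",
]


def missing_kitti_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


print(missing_kitti_dirs("data/kitti"))
```

An empty list means all the expected folders are present.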

The result was only 76+.

Run voxelnext_ioubranch-maxpool.yaml

After reading the VoxelNeXt documentation carefully, I found that the IOU_branch and maxpool variants are set up to run on Waymo.
IOU_branch can only be configured for Waymo, and maxpool needs spconv-plus, which was developed by the VoxelNeXt authors.
So I modified the yaml file, mainly to switch the dataset from Waymo to KITTI.
Next, install spconv-plus.
I created a new environment by cloning the one from before, and uninstalled pccm, ccimport, cumm and spconv-cu113 in it.
Then download the spconv-plus source code:

git clone https://github.com/dvlab-research/spconv-plus.git

Then install pccm 0.3.4, ccimport 0.3.7 and cumm 0.2.8:

pip install pccm==0.3.4
pip install ccimport==0.3.7
pip install cumm==0.2.8

Then enter the spconv-plus directory and build the wheel:

python setup.py bdist_wheel

Then cd dist/ and install the wheel that was just built:

pip install xxx.whl
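
The wheel file name depends on the version and platform, which is why it is written as xxx.whl above. A small sketch for finding it instead of typing it; newest_wheel is a hypothetical helper, and it assumes it is run from the directory that contains dist/.

```python
from pathlib import Path


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl file in dist_dir, or None."""
    folder = Path(dist_dir)
    if not folder.is_dir():
        return None
    wheels = sorted(folder.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


wheel = newest_wheel()
print(f"pip install {wheel}" if wheel else "no wheel found in dist/")
```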

At this point spconv-plus is installed.

But training always fails with the error

spconv has no SparseModule

Then I found the following code in pcdet/utils/spconv_utils.py (with print calls added for debugging):

import spconv
if float(spconv.__version__[2:]) >= 2.2:
    spconv.constants.SPCONV_USE_DIRECT_TABLE = False
print(1)
try:
    import spconv.pytorch as spconv
    print(2)
except:
    import spconv as spconv
    print(3)
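
A side note on the version check at the top of that snippet: it slices off the first two characters of the version string and reads the rest as a decimal number. The sketch below only evaluates that expression for a few strings. Apart from 2.1.21, the version strings are illustrative, and the tuple comparison is my own contrast, not something taken from pcdet.

```python
def passes_check(version):
    """Replicate the comparison float(version[2:]) >= 2.2 from the snippet."""
    return float(version[2:]) >= 2.2


def version_tuple(version):
    """Parse 'major.minor.patch' into a tuple of ints for ordinary comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


for v in ["2.1.21", "2.2.0", "2.2.3", "2.3.6"]:
    # columns: version, sliced float, result of the check, tuple >= (2, 2, 0)
    print(v, float(v[2:]), passes_check(v), version_tuple(v) >= (2, 2, 0))
```

For 2.1.21 the sliced value is 1.21, so the branch is skipped and SPCONV_USE_DIRECT_TABLE is left untouched.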

Running the program, the output ends with 3, i.e. the except branch runs, which means import spconv.pytorch as spconv fails. That is the import form used from spconv 2.x onwards,
and the version I installed is spconv-plus 2.1.21, so it should have worked.
So I opened the Python command line and entered import spconv.pytorch as spconv, which reports this error:

import spconv.core_cc as _ext
ImportError: arg(): could not convert default argument 'timer: tv::CUDAKernelTimer' in method '<class 'spconv.core_cc.cumm.gemm.main.GemmParams'>.init' into a Python object (type not registered yet?)
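
The bare except in spconv_utils.py hides this message, which is why the first symptom was the unrelated-looking "spconv has no SparseModule". A minimal sketch of a fallback import that keeps the reason for each failure; try_import is a hypothetical helper, not part of pcdet.

```python
import importlib
import traceback


def try_import(module_names):
    """Try each module name in order; return (module, errors) for the first that loads."""
    errors = {}
    for name in module_names:
        try:
            return importlib.import_module(name), errors
        except Exception as exc:  # catch everything, but keep the reason
            errors[name] = "".join(
                traceback.format_exception_only(type(exc), exc)
            ).strip()
    return None, errors


module, errors = try_import(["spconv.pytorch", "spconv"])
print("loaded:", getattr(module, "__name__", None))
for name, reason in errors.items():
    print(f"{name} failed: {reason}")
```

With something like this, the ImportError above would have shown up on the first run instead of needing an interactive session.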

This looks like a CUDA/PyTorch version mismatch, but I did not want to reinstall,
so I looked for another way around it.
Running directly in the original environment (with the official spconv) reports an error at this line:

self.max_pool_list = [spconv.SparseMaxPool2d(k, 1, 1, subm=True, algo=ConvAlgo.Native, indice_key='max_pool_head%d'%i) for i, k in enumerate(kernel_size_list)]
The error says that SparseMaxPool2d has no subm parameter.

Looking at the maxpool code in the official spconv (spconv/pytorch/pool.py), the SparseMaxPool2d class indeed takes no subm parameter, but the SparseMaxPool class it inherits from does:

class SparseMaxPool(SparseModule):
    def __init__(self,
                 ndim,
                 kernel_size: Union[int, List[int], Tuple[int, ...]] = 3,
                 stride: Optional[Union[int, List[int], Tuple[int, ...]]] = 1,
                 padding: Union[int, List[int], Tuple[int, ...]] = 0,
                 dilation: Union[int, List[int], Tuple[int, ...]] = 1,
                 indice_key: Optional[str] = None,
                 subm: bool = False,  # here is subm
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):

The SparseMaxPool2d class in spconv-plus, however, does have the subm parameter, so I decided to patch the official spconv code directly.
Find envs/xxx/lib/python3.8/site-packages/spconv/pytorch/pool.py in the environment
and add the parameter subm=False to the SparseMaxPool2d class:

class SparseMaxPool2d(SparseMaxPool):
    def __init__(self,
                 kernel_size,
                 stride=None,
                 padding=0,
                 dilation=1,
                 indice_key=None,
                 subm=False,  # added parameter
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
        super(SparseMaxPool2d,
              self).__init__(2,
                             kernel_size,
                             stride,
                             padding,
                             dilation,
                             indice_key=indice_key,
                             # note: a literal False is passed here, so the subm
                             # argument above is accepted but not forwarded
                             subm=False,
                             algo=algo,
                             record_voxel_count=record_voxel_count,
                             name=name)
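
One thing to watch in the patch above: the constructor accepts subm, but the call to the parent class passes a literal False. The toy classes below are stand-ins (not spconv classes) that only model this one detail, to show the difference between hard-coding the value and forwarding it.

```python
class BasePool:
    """Stand-in for the parent class; only the subm handling is modelled."""

    def __init__(self, ndim, kernel_size, indice_key=None, subm=False):
        self.ndim = ndim
        self.kernel_size = kernel_size
        self.indice_key = indice_key
        self.subm = subm


class Pool2dHardCoded(BasePool):
    def __init__(self, kernel_size, indice_key=None, subm=False):
        # subm is accepted but then ignored: the parent always receives False
        super().__init__(2, kernel_size, indice_key=indice_key, subm=False)


class Pool2dForwarded(BasePool):
    def __init__(self, kernel_size, indice_key=None, subm=False):
        # the caller's value is passed through to the parent
        super().__init__(2, kernel_size, indice_key=indice_key, subm=subm)


print(Pool2dHardCoded(3, subm=True).subm)  # False
print(Pool2dForwarded(3, subm=True).subm)  # True
```

Since the VoxelNeXt head calls SparseMaxPool2d with subm=True, writing subm=subm in the patch is what would actually hand that value to SparseMaxPool.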

Then running train.py reports an error at
spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
It is a KeyError: forward_ret_dict has no key voxel_indices.
When the head is voxelnext_head.py there is no such problem. Comparing the two files, I found the missing line and added it to the forward method:

self.forward_ret_dict['voxel_indices'] = voxel_indices
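
The pattern behind this fix is simple: forward has to store everything that a later step reads back from forward_ret_dict. A toy sketch with plain lists instead of tensors; ToyHead is a hypothetical stand-in, not the VoxelNeXt class, and the list comprehension plays the role of the [:, 1:] slice, which drops the leading batch-index column of spconv-style indices.

```python
class ToyHead:
    """Hypothetical stand-in for a detection head; not the VoxelNeXt class."""

    def __init__(self):
        self.forward_ret_dict = {}

    def forward(self, voxel_indices):
        # Store what later steps will need, mirroring the one-line fix above.
        self.forward_ret_dict["voxel_indices"] = voxel_indices

    def spatial_indices(self):
        # Column 0 is the batch index; keep only the spatial coordinates.
        return [row[1:] for row in self.forward_ret_dict["voxel_indices"]]


head = ToyHead()
head.forward([[0, 10, 20], [1, 5, 7]])
print(head.spatial_indices())  # [[10, 20], [5, 7]]
```

Calling spatial_indices() before forward() raises the same kind of KeyError as in the training run.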

After that, training runs normally. I will report the results tomorrow.

Running the test then hits a bug:

 File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 526, in forward_test
    x_hm_max = max_pool(x_hm, True)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

The reason is that the two spconv code bases differ. In spconv-plus, the forward of SparseMaxPool (which SparseMaxPool2d inherits) takes one more parameter:

def forward(self, input, return_inverse=False):

The forward in the official spconv only takes input, so I changed line 526 in voxelnext_head_maxpool.py to

x_hm_max = max_pool(x_hm)
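
Instead of editing the call for one spconv flavour, the head could check which forward signature is installed. A sketch using the standard library inspect module; call_pool and the two forward functions are hypothetical and only mimic the one-argument and two-argument variants described above.

```python
import inspect


def call_pool(pool_forward, x, return_inverse=True):
    """Pass return_inverse only if pool_forward declares that parameter."""
    params = inspect.signature(pool_forward).parameters
    if "return_inverse" in params:
        return pool_forward(x, return_inverse)
    return pool_forward(x)


# Two hypothetical forwards that differ the same way as in the error above.
def forward_one_arg(input):
    return ("pooled", input)


def forward_two_args(input, return_inverse=False):
    return ("pooled", input, return_inverse)


print(call_pool(forward_one_arg, "x_hm"))   # ('pooled', 'x_hm')
print(call_pool(forward_two_args, "x_hm"))  # ('pooled', 'x_hm', True)
```

With a real module one would inspect max_pool.forward. Note that dropping the second argument only silences the TypeError; whatever return_inverse was meant to change in the spconv-plus version is lost.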

Another bug

Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 219, in main
    repeat_eval_ckpt(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/test.py", line 123, in repeat_eval_ckpt
    tb_dict = eval_utils.eval_one_epoch(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/eval_utils/eval_utils.py", line 65, in eval_one_epoch
    pred_dicts, ret_dict = model(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/voxelnext.py", line 13, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 610, in forward
    data_dict = forward_test(x, data_dict)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 528, in forward_test
    selected = (x_hm_max.features == x_hm.features).squeeze(-1)
RuntimeError: The size of tensor a (199419) must match the size of tensor b (712) at non-singleton dimension 0
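
The failing line compares the pooled features with the input features element by element, which only makes sense if both tensors hold one row for each of the same voxels. The toy 1D sketch below, in plain Python, shows what that comparison presumably relies on: a max pool that produces output only at the input's active positions. This is my reading of the head code, not spconv's implementation, and all names are made up for the example.

```python
def subm_max_pool_1d(scores, kernel_size=3):
    """Max over active neighbours; output exists only at the input's active sites."""
    half = kernel_size // 2
    pooled = {}
    for pos in scores:
        window = [
            scores[p] for p in range(pos - half, pos + half + 1) if p in scores
        ]
        pooled[pos] = max(window)
    return pooled


# Sparse heatmap: position -> score (inactive positions are simply absent).
heatmap = {2: 0.1, 3: 0.9, 4: 0.4, 10: 0.7}
pooled = subm_max_pool_1d(heatmap)

# Same keys on both sides, so an element-wise comparison is well defined.
peaks = [pos for pos in heatmap if pooled[pos] == heatmap[pos]]
print(peaks)  # [3, 10]
```

If the pooling creates outputs at positions that were not active in the input, the two sides no longer line up, which is the kind of size mismatch reported in the RuntimeError.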

It is still not working.
A reply I found says: "This is a bug of gcc (or pybind) that the dependency cumm is built in gcc 10 and spconv is built in gcc 9. I will change cumm build env to gcc 9, please update your cumm to v0.1.10 by pip install -U cumm-cu111 after an hour."
So the next thing to try is changing gcc.