Graphics card A100
Install CUDA 11.3.1
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run
Install cuDNN
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8
Create a virtual environment
conda create -n pcdet python==3.8
Install torch
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
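A quick sanity check after this step is to confirm that the installed wheel really is the cu113 build. The sketch below assumes only that torch.__version__ carries the build tag after a "+" (as in 1.11.0+cu113); the helper names split_build_tag and report_torch_build are made up for illustration.

```python
from typing import Optional, Tuple


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split '1.11.0+cu113' into ('1.11.0', 'cu113'); the tag is None if absent."""
    base, sep, tag = version.partition("+")
    return base, (tag if sep else None)


def report_torch_build(expected_tag: str = "cu113") -> str:
    """Describe the installed torch build without failing when torch is missing."""
    try:
        import torch  # imported lazily so split_build_tag works without torch
    except ImportError:
        return "torch is not installed in this environment"
    base, tag = split_build_tag(torch.__version__)
    status = "matches" if tag == expected_tag else "does NOT match"
    return (f"torch {base}, build tag {tag!r} {status} {expected_tag!r}; "
            f"CUDA available: {torch.cuda.is_available()}")


print(report_torch_build())
```

Run it inside the pcdet environment; a tag other than cu113 usually means pip picked a wheel from the default index instead of the extra index.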
Install spconv
Make sure cumm and spconv are not already installed:
pip list | grep spconv
pip list | grep cumm
pip install spconv-cu113 -i https://pypi.mirrors.ustc.edu.cn/simple/
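The same "is anything left over?" check can be done from Python, which is handy when several conda environments are in play. This is a minimal sketch using the standard-library importlib.metadata; the function name find_installed is hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable


def find_installed(prefixes: Iterable[str]) -> Dict[str, str]:
    """Return {distribution name: version} for installed packages whose
    name starts with any of the given prefixes (case-insensitive)."""
    wanted = tuple(p.lower() for p in prefixes)
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(wanted):
            found[name] = dist.version
    return found


leftovers = find_installed(["spconv", "cumm"])
print(leftovers or "no spconv/cumm packages found")
```

Prefix matching is used so that variants such as spconv-cu113 are caught as well, mirroring what grep does on the pip list output.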
Install pcdet
python setup.py develop -i https://pypi.mirrors.ustc.edu.cn/simple/
Install cv2
pip install opencv-python -i https://pypi.mirrors.ustc.edu.cn/simple/
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
vim /home/suwei/suwei_ws/OpenPCDet/pcdet/datasets/__init__.py
The result was only 76+.
Run voxelnext_ioubranch-maxpool.yaml
After reading the VoxelNeXt documentation carefully, I found that IOU_branch, maxpool and the other options are only configured for Waymo.
Of these, IOU_branch can only be set under Waymo, and maxpool requires spconv-plus, developed by the author, to be installed.
So modify the yaml file, mainly to switch the dataset from Waymo to KITTI.
Next, install spconv-plus.
First I created a new environment by cloning the one from above, then uninstalled pccm, ccimport, cumm and spconv-cu113.
Then download the spconv-plus source code:
git clone https://github.com/dvlab-research/spconv-plus.git
Then install pccm 0.3.4, ccimport 0.3.7 and cumm 0.2.8:
pip install pccm==0.3.4
pip install ccimport==0.3.7
pip install cumm==0.2.8
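Since the spconv-plus build is sensitive to these exact versions, it can be worth confirming the pins before building. A small sketch, assuming the three packages are registered under the names used in the pip commands; pin_mismatches and installed_version are illustrative helpers, not part of any of these libraries.

```python
from importlib import metadata
from typing import Dict, Optional

# Pins taken from the pip commands in this step.
PINS = {"pccm": "0.3.4", "ccimport": "0.3.7", "cumm": "0.2.8"}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins: Dict[str, str], lookup=installed_version) -> Dict[str, Optional[str]]:
    """Return {name: version actually found (or None)} for every unmet pin."""
    return {name: lookup(name) for name, want in pins.items() if lookup(name) != want}


print(pin_mismatches(PINS) or "all pins satisfied")
```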
Then enter the spconv-plus directory
python setup.py bdist_wheel
Then cd dist/
pip install xxx.whl
At this point the installation of spconv is complete.
But it always reports an error:
spconv has no SparseModule
Then I looked at pcdet/utils/spconv_utils.py, which contains this block:
import spconv
if float(spconv.__version__[2:]) >= 2.2:
    spconv.constants.SPCONV_USE_DIRECT_TABLE = False
    print(1)
try:
    import spconv.pytorch as spconv
    print(2)
except:
    import spconv as spconv
    print(3)
Running the program prints 2 and 3, which indicates that import spconv.pytorch as spconv fails. That statement is the import form used from spconv 2.x onward.
The version I installed is 2.1.21 (spconv-plus).
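It helps to see what the version check in spconv_utils.py evaluates to for this version. The expression drops the first two characters of the version string and parses the rest as a float, so 2.1.21 becomes 1.21 and the branch is skipped. The sketch below mirrors that expression and, as a hypothetical alternative that is not part of pcdet, compares integer components instead.

```python
from typing import Tuple


def pcdet_style_check(version: str) -> bool:
    """Mirror of the expression in spconv_utils.py: skip the first two
    characters, parse the remainder as a float, compare with 2.2."""
    return float(version[2:]) >= 2.2


def version_tuple(version: str) -> Tuple[int, ...]:
    """Hypothetical alternative: compare numeric components as integers.
    Assumes a plain 'major.minor.patch' string without suffixes."""
    return tuple(int(part) for part in version.split(".")[:3])


for v in ("2.1.21", "2.3.6"):
    print(v, "->", float(v[2:]), pcdet_style_check(v), version_tuple(v) >= (2, 2))
```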
So I opened a Python prompt and entered import spconv.pytorch as spconv, which reports an error:
import spconv.core_cc as _ext
ImportError: arg(): could not convert default argument 'timer: tv::CUDAKernelTimer' in method '<class 'spconv.core_cc.cumm.gemm.main.GemmParams'>.__init__' into a Python object (type not registered yet?)
This looks like a CUDA / PyTorch version mismatch, but I don't want to reinstall.
So I tried another way to solve it.
Running directly in the original environment reports an error:
self.max_pool_list = [spconv.SparseMaxPool2d(k, 1, 1, subm=True, algo=ConvAlgo.Native, indice_key='max_pool_head%d'%i) for i, k in enumerate(kernel_size_list)]
The error for this line says that there is no subm parameter.
Looking at the max-pool code in the official spconv (spconv/pytorch/pool.py), the SparseMaxPool2d class has no subm parameter, but the SparseMaxPool class it inherits from does:
class SparseMaxPool(SparseModule):
    def __init__(self,
                 ndim,
                 kernel_size: Union[int, List[int], Tuple[int, ...]] = 3,
                 stride: Optional[Union[int, List[int], Tuple[int, ...]]] = 1,
                 padding: Union[int, List[int], Tuple[int, ...]] = 0,
                 dilation: Union[int, List[int], Tuple[int, ...]] = 1,
                 indice_key: Optional[str] = None,
                 subm: bool = False,  # here is subm
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
The SparseMaxPool2d class in spconv-plus, however, does have the subm parameter, so I decided to modify the official spconv code directly, as follows.
Find envs/xxx/lib/python3.8/site-packages/spconv/pytorch/pool.py in the environment.
Modify the code by adding the parameter subm=False to the SparseMaxPool2d class:
class SparseMaxPool2d(SparseMaxPool):
    def __init__(self,
                 kernel_size,
                 stride=None,
                 padding=0,
                 dilation=1,
                 indice_key=None,
                 subm=False,
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
        super(SparseMaxPool2d, self).__init__(2,
                                              kernel_size,
                                              stride,
                                              padding,
                                              dilation,
                                              indice_key=indice_key,
                                              subm=subm,  # forward the caller's value
                                              algo=algo,
                                              record_voxel_count=record_voxel_count,
                                              name=name)
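When a keyword is added to a subclass like this, the value also has to reach the base class; accepting subm in the signature is not enough on its own. The sketch below uses plain stand-in classes (not spconv, all names invented) to show the difference between hard-coding the value and forwarding it.

```python
class BasePool:
    """Stand-in for a base class that already supports `subm`."""

    def __init__(self, ndim, kernel_size, subm=False):
        self.ndim = ndim
        self.kernel_size = kernel_size
        self.subm = subm


class Pool2dIgnoring(BasePool):
    """Accepts `subm` but hard-codes False when calling the base class."""

    def __init__(self, kernel_size, subm=False):
        super().__init__(2, kernel_size, subm=False)


class Pool2dForwarding(BasePool):
    """Accepts `subm` and passes the caller's value through."""

    def __init__(self, kernel_size, subm=False):
        super().__init__(2, kernel_size, subm=subm)


print(Pool2dIgnoring(3, subm=True).subm)    # False: the argument was dropped
print(Pool2dForwarding(3, subm=True).subm)  # True
```

With the hard-coded variant the constructor call no longer raises, but the layer silently behaves as if subm=True had never been requested.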
Then run train.py, which reports an error on this line:
spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
It raises a KeyError: forward_ret_dict has no key voxel_indices.
When the head is voxelnext_head.py there is no such problem. Comparing the two files shows the missing line, so add it to the forward method:
self.forward_ret_dict['voxel_indices'] = voxel_indices
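The pattern behind this fix is that forward stores intermediate values in a dict and a later step reads them back, so a missing store only surfaces later as a KeyError. A toy sketch of that pattern follows; ToyHead is a stand-in written for illustration, not the real VoxelNeXt head, and plain lists replace tensors.

```python
class ToyHead:
    """Stand-in for a dense head that caches values between forward and loss."""

    def __init__(self, store_indices: bool):
        self.store_indices = store_indices
        self.forward_ret_dict = {}

    def forward(self, voxel_indices):
        if self.store_indices:
            # The line that was missing in the max-pool head.
            self.forward_ret_dict['voxel_indices'] = voxel_indices
        return voxel_indices

    def spatial_indices(self):
        # Rows are [batch_idx, coords...]; drop the batch column, like [:, 1:].
        return [row[1:] for row in self.forward_ret_dict['voxel_indices']]


indices = [[0, 5, 7], [1, 2, 3]]

fixed = ToyHead(store_indices=True)
fixed.forward(indices)
print(fixed.spatial_indices())

broken = ToyHead(store_indices=False)
broken.forward(indices)
try:
    broken.spatial_indices()
except KeyError as err:
    print("KeyError:", err)
```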
Then training runs normally; I will report the results tomorrow.
There is a bug when running the test:
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 526, in forward_test
    x_hm_max = max_pool(x_hm, True)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given
The reason is that the two spconv code bases differ: in spconv-plus, the forward of SparseMaxPool (which SparseMaxPool2d inherits) takes one more parameter:
def forward(self, input, return_inverse=False):
So I changed line 526 in voxelnext_head_maxpool.py to
x_hm_max = max_pool(x_hm)
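An alternative to editing the call site is a small wrapper that checks whether the installed forward accepts the extra argument. This is only a sketch: call_pool, PlainPool and PlusPool are invented stand-ins, and the only fact taken from above is that one forward has a return_inverse parameter and the other does not.

```python
import inspect


def call_pool(pool, x, return_inverse=False):
    """Call pool.forward(x, return_inverse) only if that parameter exists."""
    params = inspect.signature(pool.forward).parameters
    if "return_inverse" in params:
        return pool.forward(x, return_inverse)
    if return_inverse:
        raise TypeError("this pool implementation does not support return_inverse")
    return pool.forward(x)


class PlainPool:
    """Stand-in for a pool whose forward takes only the input."""

    def forward(self, input):
        return ("plain", input)


class PlusPool:
    """Stand-in for a pool whose forward takes an extra flag."""

    def forward(self, input, return_inverse=False):
        return ("plus", input, return_inverse)


print(call_pool(PlainPool(), "x"))
print(call_pool(PlusPool(), "x", True))
```

Raising when return_inverse=True is requested but unsupported is a design choice in this sketch. Dropping the argument, as done above, silences the TypeError but also discards whatever the caller wanted from that flag.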
Another bug:
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 219, in main
    repeat_eval_ckpt(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/test.py", line 123, in repeat_eval_ckpt
    tb_dict = eval_utils.eval_one_epoch(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/eval_utils/eval_utils.py", line 65, in eval_one_epoch
    pred_dicts, ret_dict = model(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/voxelnext.py", line 13, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 610, in forward
    data_dict = forward_test(x, data_dict)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 528, in forward_test
    selected = (x_hm_max.features == x_hm.features).squeeze(-1)
RuntimeError: The size of tensor a (199419) must match the size of tensor b (712) at non-singleton dimension 0
Still not working.
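One possible explanation for the size mismatch, offered as a guess rather than a confirmed diagnosis: the comparison on line 528 assumes the pooled tensor has exactly the same active voxels as its input. That holds for submanifold pooling (subm=True) but not for regular sparse pooling, which also creates outputs next to active inputs. The toy 1D sketch below is plain Python, not spconv; the function name and data are made up and spatial bounds are ignored.

```python
from typing import Dict


def sparse_max_pool_1d(active: Dict[int, float], kernel: int, subm: bool) -> Dict[int, float]:
    """Toy 1D sparse max pool, stride 1, window centred on each output site.

    `active` maps coordinate -> feature for the non-empty sites only.
    """
    half = kernel // 2
    if subm:
        # Submanifold: outputs exist only where inputs exist.
        out_coords = set(active)
    else:
        # Regular: every site whose window touches an active input is an output.
        out_coords = {c + d for c in active for d in range(-half, half + 1)}
    pooled = {}
    for c in sorted(out_coords):
        window = [active[n] for n in range(c - half, c + half + 1) if n in active]
        pooled[c] = max(window)
    return pooled


x = {3: 0.2, 4: 0.9, 10: 0.5}
x_subm = sparse_max_pool_1d(x, kernel=3, subm=True)
x_full = sparse_max_pool_1d(x, kernel=3, subm=False)

print(len(x), len(x_subm), len(x_full))
# Local-maximum selection only makes sense when the coordinates line up.
print([c for c in x if x_subm[c] == x[c]])
```

If the patched SparseMaxPool2d ends up pooling with subm=False, the two feature tensors would differ in length in the same way, so printing x_hm.features.shape and x_hm_max.features.shape just before line 528 is a cheap way to test this guess.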
This is a bug of gcc (or pybind) that the dependency cumm is built in gcc 10 and spconv is built in gcc 9. I will change cumm build env to gcc 9, please update your cumm to v0.1.10 by pip install -U cumm-cu111 after an hour.
Next, try changing the gcc version.
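Before switching compilers it helps to know which gcc is currently active in the environment. A minimal sketch, assuming gcc is on PATH and supports the -dumpversion option; the helper names are made up here.

```python
import subprocess
from typing import Optional


def major_version(dumpversion_output: str) -> int:
    """Turn '9.4.0' or '10' into the integer major version."""
    return int(dumpversion_output.strip().split(".")[0])


def installed_gcc_major(binary: str = "gcc") -> Optional[int]:
    """Return the major version of `binary`, or None if it cannot be queried."""
    try:
        result = subprocess.run([binary, "-dumpversion"],
                                capture_output=True, text=True, check=True)
        return major_version(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


print("gcc major version:", installed_gcc_major())
```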