Building MipNerf from scratch to train on your own dataset

Prerequisites

This tutorial assumes only that Anaconda is installed on your computer. There is no need to install GPU-related packages such as CUDA system-wide, because they can be installed inside the virtual environment instead.

Computer configuration:
This article was written on Ubuntu 18.04 with an RTX 3080.

Nerf dataset download

The original paper provides a bash download script, but it is inaccessible to many users and slow besides. Download from a network drive instead: network disk link

If you only use Nerf, download llff_data; if you use MipNerf, download synthetic, a folder that also includes the camera parameters.

After the download succeeds, place the dataset under the data folder inside the MipNerf directory.

Running and using MipNerf

MipNerf paper: MipNerf
MipNerf's GitHub: code address
However, because the original MipNerf installation has many problems, I later switched to another implementation found through other references: a new code address. The environment configuration and modifications below all follow this new code.

Environment installation

The environment installation is described in detail in the MipNerf repository, but following it step by step produces an error indicating that the NVIDIA driver installed on the system is too old for the current PyTorch version. Since I cannot upgrade the server's NVIDIA driver myself, I downgrade PyTorch instead.
Run:

nvcc -V

The output shows that the CUDA version is 11.4:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

According to the compatibility table on the pytorch_lightning 1.5 official page, the corresponding PyTorch version should be ≥ 1.7 and ≤ 1.10.

Therefore, install a matching PyTorch build; the installation commands become:

conda create -n mipnerf -y python=3.8
conda activate mipnerf
python -m pip install --upgrade pip
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

After successful installation, run the command:

python train.py --out_dir out_fox --data_path ./data/ima --dataset_name blender --exp_name exp

The startup log shows the GPU flag as True, indicating that the environment can correctly use the GPU for training.
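To double-check by hand, here is a minimal sketch using standard PyTorch calls (this snippet is mine, not from the repository):

import torch

# The version string should end in +cu111, and CUDA should be available
print(torch.__version__)          # expected: 1.9.1+cu111
print(torch.version.cuda)         # the CUDA build PyTorch was compiled against
print(torch.cuda.is_available())  # should print True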

At this point, the environment is installed correctly.

Problem 1

After the installation is successful, the following error occurs during the running process:

AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'

This is because ANTIALIAS was removed in Pillow 10. Check your Pillow version, and if it is 10 or higher, downgrade it manually:

pip install Pillow==9.5.0

Rerun the training program and it works.
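Alternatively, instead of pinning an older Pillow, the removed constant can be patched back near the top of the script that crashes. A minimal sketch using the Pillow ≥ 9.1 Resampling API (this edit is mine, not part of the repository):

import PIL.Image

# Pillow 10 removed Image.ANTIALIAS; Image.Resampling.LANCZOS is the same filter.
# Restoring the old name keeps unmodified NeRF code working.
if not hasattr(PIL.Image, "ANTIALIAS"):
    PIL.Image.ANTIALIAS = PIL.Image.Resampling.LANCZOS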

Problem 2

RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 11.77 GiB total capacity; 7.96 GiB already allocated; 163.31 MiB free; 8.03 GiB reserved in total by PyTorch)

This happens because GPU memory is insufficient: the 3080 has only 12 GB, while a MipNerf training run can use 20-30 GB. Reduce the batch settings in ./configs/lego.yaml:

seed: 4
num_gpus: 1
exp_name: 'lego'
train:
  batch_size: 1024
  batch_type: 'all_images' # single_image: feed one full image; all_images: feed batch_size rays sampled across images
  num_work: 4
  randomized: True
  white_bkgd: True
val:
  batch_size: 1
  batch_type: 'single_image' # for 'single_image' the batch size must be 1
  num_work: 4
  randomized: False
  white_bkgd: True
  check_interval: 10000
  chunk_size: 8192 # number of input rays per forward pass
  sample_num: 4 # number of images evaluated per validation run
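If memory errors persist, halve batch_size and chunk_size again. For reference, standard PyTorch calls report how much memory the current process holds on GPU 0 (a quick sketch of mine, not repository code):

import torch

# Compare allocated vs. reserved memory on GPU 0 inside the training process
print(torch.cuda.memory_allocated(0) / 2**30, "GiB allocated")
print(torch.cuda.memory_reserved(0) / 2**30, "GiB reserved")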

Problem 3

ValueError: val_check_interval (10000) must be less than or equal to the number of the training batches (2083). If you want to disable validation set limit_val_batches to 0.0 instead.

This means the validation check interval exceeds the number of training batches per epoch (10000 > 2083), so check_interval must be lowered to at most 2083. Modify ./configs/lego.yaml:

seed: 4
num_gpus: 1
exp_name: 'lego'
train:
  batch_size: 1024
  batch_type: 'all_images' # single_image: feed one full image; all_images: feed batch_size rays sampled across images
  num_work: 4
  randomized: True
  white_bkgd: True
val:
  batch_size: 1
  batch_type: 'single_image' # for 'single_image' the batch size must be 1
  num_work: 4
  randomized: False
  white_bkgd: True
  check_interval: 2000
  chunk_size: 8192 # number of input rays per forward pass
  sample_num: 4 # number of images evaluated per validation run


Then it runs normally, and the insufficient-memory problems are solved. (Still, with four 3080s the unmodified source code should have fit; I did not understand why it kept reporting that GPU 0 alone was out of memory. That remaining issue is resolved next.)

I found the cause. The configs/lego.yaml file contains the field

num_gpus: 1

which sets the maximum number of GPUs to use. Changing this 1 to the number of GPUs available on the server enables multi-GPU training. (Note that this modification only applies to train.py.)
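Before editing the config, you can check how many GPUs PyTorch can actually see (standard PyTorch call; snippet is mine):

import torch

# num_gpus in configs/lego.yaml should be at most this value
print(torch.cuda.device_count())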

Problem 4

After installing a package with pip, it does not show up in conda list.
First check which pip is being used:

which pip

/home/XXX/.local/bin/pip

Clearly, this is the global (user-level) pip installed under base, not the one in the virtual environment, so packages end up outside the environment. A simple fix is to call pip through the environment's own interpreter:

python -m pip install package_name

This installs the package into the active virtual environment; for later installs, you can also fix pip's default path so it resolves to the environment's pip.
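To confirm which interpreter and site-packages directory the active environment actually resolves to, here is a quick sketch using only the standard library:

import sys, site

# Both paths should point inside the conda env, e.g. .../envs/mipnerf/...
print(sys.executable)
print(site.getsitepackages())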

Using your own dataset

This chapter draws heavily on the blogger "3D Diamondback"'s article nerf training own data, which records the process in great detail.

Refer to that article for the full process; this article only covers fixes for the errors encountered.

Obtaining the JSON file

First, download the COLMAP installation package from the Internet: download link.

Download and install it. The second step is to download the instant-ngp source code, which is used here to generate the JSON file MipNerf requires: instant-ngp provides a script that produces the JSON file from COLMAP output, as explained below.

instant-ngp source code. After downloading it, enter the following commands in the terminal:

conda create -n ngp -y python=3.9
conda activate ngp
python -m pip install --upgrade pip
pip install -r requirements.txt

After configuring the ngp environment, use the script colmap2nerf.py, which can take videos or images directly and generate the JSON file:

python scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap --aabb_scale 16 --images [image path]

Finally, you will find a transforms.json file in the folder; this is the camera file for the dataset.
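To verify that COLMAP registered all your images, the file can be inspected with a few lines of standard-library Python (a sketch of mine; run it in the folder containing transforms.json):

import json

with open("transforms.json") as f:
    meta = json.load(f)

# One entry per registered image; far fewer than your input count means
# COLMAP failed to register some views
print(len(meta["frames"]), "registered frames")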

Modifications to MipNerf

Two places in the MipNerf code need to be modified: one for train.py and one for render_video.py.

train.py

The train command is:

python train.py --out_dir OUT_DIR --data_path UZIP_DATA_DIR --dataset_name blender --exp_name EXP_NAME

train.py loads the ./configs/XXX.yaml file, which specifies the number of GPUs to use, the number of training epochs, and the batch size. To finish training faster, I adjusted those values as shown in the configs in Problems 2 and 3 above.

Only this file needs to be modified; nothing else in train.py has to change.

render_video.py

The render script was originally set to use a single GPU; I modified it to use multiple GPUs. The modified run_render function is:

def run_render(args):
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model = MipNeRFSystem.load_from_checkpoint(args.ckpt).to(device).eval()

    # Wrap the model in DataParallel to spread inference over several GPUs
    # (the ids in device_ids must exist on your machine)
    if torch.cuda.device_count() > 1:
        print("Using", torch.cuda.device_count(), "GPUs!")
        model = torch.nn.DataParallel(model, device_ids=[3, 4])
    # DataParallel hides the wrapped module, so access hparams through it
    if isinstance(model, torch.nn.DataParallel):
        underlying_model = model.module
        hparams = underlying_model.hparams
    else:
        hparams = model.hparams
    exp_name = hparams['exp_name']

In addition, the camera parameter camera_angle_x needs to be modified; this value comes from the camera_angle_x field in the JSON file.

parser.add_argument('--camera_angle_x', help='camera_angle_x in source dataset',
                    type=float, default=0.9156173236398449)

The JSON file holds camera_angle_x at the top level, alongside a frames list that stores each image's path and transform matrix.
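To pull the value for --camera_angle_x out of your own transforms.json rather than typing it by hand, a minimal standard-library sketch:

import json

with open("transforms.json") as f:
    meta = json.load(f)

# Pass this value to render_video.py via --camera_angle_x
print(meta["camera_angle_x"])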

Finally, running the script generates the video file. The .mov file needs to be converted to .mp4 before it can be played on Windows.

python render_video.py --ckpt OUT/ckpt/exp/epoch=0-step=9999.ckpt --out_dir OUT_video --scale 4
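For reference, the .mov to .mp4 conversion can be scripted as below. This assumes ffmpeg is installed and on PATH, and the file names are placeholders for whatever your run actually produced:

import subprocess

# Re-encode to H.264 in an .mp4 container so Windows players can open it
subprocess.run(
    ["ffmpeg", "-i", "render.mov",
     "-c:v", "libx264", "-pix_fmt", "yuv420p",
     "render.mp4"],
    check=True,
)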