[3D image segmentation] 3D image segmentation based on Pytorch 7 (data preprocessing: resample data to 1mm unified scale)

In the previous section: [3D Image Segmentation] VNet 3D Image Segmentation 6 (data preprocessing) based on Pytorch, we have obtained the same seriesUID name as the mhd image mask nrrd data file, it can be said that there is a one-to-one correspondence.

Moreover, the files of mask are divided into 4 folders according to how many people marked the nodule at the same time. They were marked one, two, three and four times respectively, for a total of 4 Doctors participated in the annotation.

Coupled with the officially compiled lung parenchyma segmentation documents, we obtained the following data:

ctImage data;
Lung parenchyma segmentation data;
Contains mask data for nodule locations.

So what does this article do? See the introduction below.

1. Introduction

The above-mentioned ones meet our needs, and they are all corresponding one-to-one. It is very convenient for subsequent data preprocessing or use for training.

However, for the original ct data, the layer thickness on the Z axis is different. This can be seen in the dicom file. You can also query the layer thickness information in the mhd file. At this point, the differences between different sequences are very large. It is manifested in a 3-dimensional array of nodules, which appear to be squashed and elongated in this dimension.

In the x and y directions, there is actually a spacing difference, but this difference is not as different as the z axis So exaggerated, you can choose to process or not process it here (some papers are processed, some are not. The default size is 512x512, and it will become smaller after resample).

At this point, the purpose of this article is very clear. It is to do the following things:

Extract lung parenchyma from the original image, and crop the parts outside the lung area, or change them to fixed pixel values;
Perform resample operations on images and nodules mask. In this article, zyx performs resample operations to 1mm. .

2. Specific implementation

The how-to part is divided into three parts:

Lung parenchyma cropping
image and nodule mask perform resample operations
Get the coordinates and radius of the nodule center point

Let’s expand one by one below

2.1, Main function part

Since the amount of data in this part is relatively large, the multi-process mode is adopted in the main function part to speed up the processing. The data that needs to be read in has been processed in the previous chapter and can be used directly here.

The following is the main function

import sys
import numpy as np
import scipy.ndimage
from skimage import measure, morphology
import SimpleITK as sitk
from multiprocessing import Pool
import os
import nrrd

################################################ #############################
# Add the marked mask and CT original image to the left and right lung area segmentation images to generate the CT and nodule mask that removes noise and leaves the lung area.
################################################ #############################
def main():
    n_consensus = 4
    do_resample = True
    img_dir = './LUNA16/image_combined'
    lung_mask_dir = './LUNA16/seg-lungs-LUNA16'
    nod_mask_dir = os.path.join('./LUNA16/nodule_masks', str(n_consensus))

    save_dir = os.path.join('./LUNA16/preprocessed', str(n_consensus))
    os.makedirs(save_dir, exist_ok=True)

    params_lists = []
    #Multi-process processing
    for nrrd_name in os.listdir(nod_mask_dir):
        # seg-lungs-LUNA16, masks_test/3, seg-lungs-LUNA16, preprocessed_test/3, True
        pid = nrrd_name[:-5]
        params_lists.append([pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample])

    pool = Pool(processes=4)
    pool.map(cropResample_process, params_lists)

    pool.close()
    pool.join()

    pool = Pool(processes=4)
    pool.map(generateBBoxes_label, params_lists)

    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

There are two parts,

cropResample_process: As the name suggests, perform crop and resample operations of lung parenchyma;
generateBBoxes_label: mask the processed nodule to obtain the coordinates and radius of the nodule center.

2.2. Lung parenchyma cutting

The steps in this small piece are roughly as follows:

First, it is data reading. For a detailed introduction to this part, you can refer to my previous article: [Medical Imaging Data Processing] Data format conversion of nii, npz, npy, dcm, mhd, and summary of multi-objective segmentation processing
Secondly, it is to convert the hu value into the value of 0-255, which is the function HU2uint8(). For this part, you can refer to < Introduction to the visual part of how the code>hu value is converted into 0-255: [Medical Imaging Data Processing] Summary of Dicom file format processing
In addition, the lung area mask is applied to the image, and the lung parenchyma is supplemented by pad valud
Finally, store the processed image, mask and related parameters locally.

The code is as follows. Comment out the parts that should be explained. I believe it can be easily understood.

def load_itk_image(filename):
    """Return img array and [z,y,x]-ordered origin and spacing
    """
    itkimage = sitk.ReadImage(filename)
    numpyImage = sitk.GetArrayFromImage(itkimage)

    numpyOrigin = np.array(list(reversed(itkimage.GetOrigin())))
    numpySpacing = np.array(list(reversed(itkimage.GetSpacing())))

    return numpyImage, numpyOrigin, numpySpacing

def HU2uint8(image, HU_min=-1200.0, HU_max=600.0, HU_nan=-2000.0):
    """
    Convert HU unit into uint8 values. First bound HU values by predfined min
    and max, and then normalize
    image: 3D numpy array of raw HU values from CT series in [z, y, x] order.
    HU_min: float, min HU value.
    HU_max: float, max HU value.
    HU_nan: float, value for nan in the raw CT image.
    """
    image_new = np.array(image)
    image_new[np.isnan(image_new)] = HU_nan

    # normalize to [0, 1]
    image_new = (image_new - HU_min) / (HU_max - HU_min)
    image_new = np.clip(image_new, 0, 1)
    image_new = (image_new * 255).astype('uint8')

    return image_new

def convex_hull_dilate(binary_mask, dilate_factor=1.5, iterations=10):
    """
    Replace each slice with convex hull of it then dilate. Convex hulls used
    only if it does not increase area by dilate_factor. This applies mainly to
    the inferior slices because inferior surface of lungs is concave.
    binary_mask: 3D binary numpy array with the same shape of the image,
        that only region of interest is True. One side of the lung in this
        specific case.
    dilate_factor: float, factor of increased area after dilation
    iterations: int, number of iterations for dilation
    return: 3D binary numpy array with the same shape of the image,
        that only region of interest is True. Each binary mask is ROI of one
        side of the lung.
    """
    binary_mask_dilated = np.array(binary_mask)
    for i in range(binary_mask.shape[0]):
        slice_binary = binary_mask[i]

        if np.sum(slice_binary) > 0:
            slice_convex = morphology.convex_hull_image(slice_binary)

            if np.sum(slice_convex) <= dilate_factor * np.sum(slice_binary):
                binary_mask_dilated[i] = slice_convex

    struct = scipy.ndimage.generate_binary_structure(3, 1)
    binary_mask_dilated = scipy.ndimage.binary_dilation(
        binary_mask_dilated, structure=struct, iterations=10)

    return binary_mask_dilated

def apply_lung_mask(image, binary_mask1, binary_mask2, pad_value=170,
                    bone_thred=210, remove_bone=False):
    """
    Apply the binary mask of each lung to the image. Regions out of interest
    are replaced with pad_value.
    image: 3D uint8 numpy array with the same shape of the image.
    binary_mask1: 3D binary numpy array with the same shape of the image,
        that only one side of lung is True.
    binary_mask2: 3D binary numpy array with the same shape of the image,
        that only the other side of lung is True.
    pad_value: int, uint8 value for padding image regions that is not
        interested.
    bone_thred: int, uint8 threahold value for determining parts of image is
        bone.
    return: D uint8 numpy array with the same shape of the image after
        applying the lung mask.
    """
    binary_mask = binary_mask1 + binary_mask2
    binary_mask1_dilated = convex_hull_dilate(binary_mask1)
    binary_mask2_dilated = convex_hull_dilate(binary_mask2)
    binary_mask_dilated = binary_mask1_dilated + binary_mask2_dilated
    binary_mask_extra = binary_mask_dilated ^ binary_mask

    # replace image values outside binary_mask_dilated as pad value
    image_new = image * binary_mask_dilated + pad_value * (1 - binary_mask_dilated).astype('uint8')

    # set bones in extra mask to 170 (ie convert HU > 482 to HU 0;
    #water).
    if remove_bone:
        image_new[image_new * binary_mask_extra > bone_thred] = pad_value

    return image_new

def cropResample_process(params):
    # seg-lungs-LUNA16, masks_test/3, seg-lungs-LUNA16, preprocessed_test/3, True
    pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample = params

    print('Preprocessing %s...' % (pid))

    img_org, origin, spacing = load_itk_image(os.path.join(img_dir, '%s.mhd' % (pid)))
    lung_mask, _, _ = load_itk_image(os.path.join(lung_mask_dir, '%s.mhd' % (pid)))
    nodule_mask, _ = nrrd.read(os.path.join(nod_mask_dir, '%s.nrrd' % (pid)))

    # 4-right lung 3-left lung 5-trachea
    binary_mask_r = lung_mask == 4
    binary_mask_l = lung_mask == 3
    binary_mask = binary_mask_r + binary_mask_l

    img_org = HU2uint8(img_org)
    img_lungRL = apply_lung_mask(img_org, binary_mask_r, binary_mask_l)

There is one point that has never been explained before, that is the officially provided lung mask array, which is briefly recorded here:

The number 3 represents the left lung
The number 4 represents the right lung
The number 5 represents the trachea

This is the first time I saw this bitwise XOR operator (^), and I briefly studied it:

The bitwise exclusive OR operator (^) is used to perform a logical exclusive OR operation on each corresponding bit of the two operands. If the two corresponding bits have the same value, the result is 0, otherwise it is 1. The essence of XOR is addition without carry.

Use the truth table to look at the operation logic of XOR:

It seems very intuitive and straightforward. But what happens if we replace the XOR between boolean values with the XOR between numbers? ? ? Let’s try it out:

0 ^ 0
0^1
1^0
1^1

>>>
0
1
1
0

The results tell us: The XOR value of numbers the same is 0; the XOR value of numbers not the same is 1.

Let’s try numbers other than 0 and 1 again:

5 ^ 3
>>>6

Damn it? Why is the answer 6? Shouldn’t it be 1? ? What’s going on behind the scenes?

XOR is the result of bitwise XOR based on binary, 5 ^ 3. The process actually converts 5 and 3 into binary respectively:

5 = 0101(b)

3 = 0011(b)

Bitwise XOR:

0^0 ->0

1^0->1

0^1 ->1

1^1->0

The arrangement is 0110(b) converted to decimal: 6. For more information on XOR, please refer here: A brief discussion of Python logical operators XOR

Execute the XOR operation between the dilate expanded binary mask and the original binary mask. The values of the corresponding bits are the same, and the result is 0 , otherwise 1. Then, the result obtained is the expanded part, which is bone. This part is used in the bone removal stage.

You may have this question: Why not directly multiply image and lung mask to get a left after dividing the lung parenchyma? What about image? Instead, we need to use convex hull optimization. Is it unnecessary?

Answer: In lung mask, there is an error in the segmentation of lung parenchyma. That is to say, the segmentation of the lung parenchyma is along the edge of the lung area, but the location of some nodules is just on the boundary of the lung area and is very dense. Then mask will appear in a concave state. If you use the above method, the nodules will be picked out. Using convex hull optimization, you can slightly expand the edge of the lung parenchyma to achieve the effect of leaving more lung area.

However, for diseases with large lesions such as tuberculosis, the above method of removing lung parenchyma is not feasible. The main reason is that the range of pulmonary tuberculosis is relatively large. Even though convex hull optimization is used, a large part of the lung area will eventually be removed, leaving the lung area incomplete and causing some losses.

The following is an example officially given by skimage.morphology.convex_hull_image, as follows: Click here to go directly
is applied to our project. The cut appearance is as follows:

2.3, resample operation

In the operation of resample in this article, all dimensions of zyx are evenly exposed to rain and dew, and all are adjusted to the state of 1mm. In this way, a Pixel size represents the physical size and will not cause deformation in any dimension.

The code looks like this:

def resample(image, spacing, new_spacing=[1.0, 1.0, 1.0], order=1):
    """
    Resample image from the original spacing to new_spacing, e.g. 1x1x1
    image: 3D numpy array of raw HU values from CT series in [z, y, x] order.
    spacing: float * 3, raw CT spacing in [z, y, x] order.
    new_spacing: float * 3, new spacing used for resample, typically 1x1x1,
        which means standardizing the raw CT with different spacing all into
        1x1x1mm.
    order: int, order for resample function scipy.ndimage.interpolation.zoom
    return: 3D binary numpy array with the same shape of the image after,
        resampling. The actual resampling spacing is also returned.
    """
    # shape can only be int, so has to be rounded.
    new_shape = np.round(image.shape * spacing / new_spacing)

    # the actual spacing to resample.
    resample_spacing = spacing * image.shape / new_shape

    resize_factor = new_shape/image.shape

    image_new = scipy.ndimage.zoom(image, resize_factor, mode='nearest', order=order)

    return (image_new, resample_spacing)

if do_resample:
    print('Resampling...')
    img_lungRL, resampled_spacing = resample(img_lungRL, spacing, order=3)
    seg_nod_mask = np.zeros(img_lungRL.shape, dtype=np.uint8)
    for i in range(int(nodule_mask.max())):
        # A nodule, a resample of a nodule
        mask = (nodule_mask == (i + 1)).astype(np.uint8)
        mask, _ = resample(mask, spacing, order=3)
        seg_nod_mask[mask > 0.5] = i + 1

Among them, in the resample function, the scipy.ndimage.zoom operation is used to directly transfer the original data, zoom to the new shape.

scipy.ndimage.zoom(input, zoom, output=None, order=3, mode='constant', cval=0.0, prefilter=True, *, grid_mode=False)[source]

In the function:

input：The input array
zoom: The zoom factor along the axes

The following is an official case, showing the changes before and after zoom. You can refer to: Click the link to go directly

from scipy import ndimage, datasets
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(121) # left side
ax2 = fig.add_subplot(122) # right side
ascent = datasets.ascent()
result = ndimage.zoom(ascent, 3.0)
ax1.imshow(ascent, vmin=0, vmax=255)
ax2.imshow(result, vmin=0, vmax=255)
plt.show()

The changes before and after zoom are as follows:

I found that this scipy library is really useful, and I will find time to comprehensively supplement this knowledge in the future.

2.4. Store locally

This part is relatively simple. It mainly talks about some new aspects of array storage:

The npy file stores some simple arrays, such as spacing below, coordinates, etc.;
The nrrd file stores multi-dimensional arrays, such as the following image and mask array images, the size is 240x320x320;
I used to like to use nii to store files, but now I find it is not easy to use. nrrd can also store arrays and header headers.

Here is the code:

lung_box = get_lung_box(binary_mask, img_lungRL.shape) # Get the outer contour of the lung segmentation

z_min, z_max = lung_box[0]
y_min, y_max = lung_box[1]
x_min, x_max = lung_box[2]

# Cropping operation, remove outside the lung area
img_lungRL = img_lungRL[z_min:z_max, y_min:y_max, x_min:x_max]
if do_resample:
    seg_nod_mask = seg_nod_mask[z_min:z_max, y_min:y_max, x_min:x_max]
else:
    seg_nod_mask = nodule_mask[z_min:z_max, y_min:y_max, x_min:x_max]

np.save(os.path.join(save_dir, '%s_origin.npy' % (pid)), origin) # origin (3,) Record the origin coordinate information of the three-dimensional image
if do_resample:
    np.save(os.path.join(save_dir, '%s_spacing.npy' % (pid)), resampled_spacing) # Record the scale coefficients of the three dimensions of x\y\z before and after resample
np.save(os.path.join(save_dir, '%s_ebox_origin.npy' % (pid)), np.array((z_min, y_min, x_min)))

nrrd.write(os.path.join(save_dir, '%s_clean.nrrd' % (pid)), img_lungRL) # CT image after removing non-lung areas
nrrd.write(os.path.join(save_dir, '%s_mask.nrrd' % (pid)), seg_nod_mask) # Nodule MASK image after removing non-lung areas

2.5. Obtain the coordinates and radius of the nodule center point

The center point coordinates and radius of the marked nodule are obtained here. The purpose is to be able to directly take it from the obtained nodule and perform crop directly during operations such as cropping patch >Operation.

The steps in this step are similar to the previous get_lung_box. The only difference is that the center point is saved instead of the maximum and minimum boundary coordinates above.

code show as below:

def generateBBoxes_label(params):
    pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample = params
    masks, _ = nrrd.read(os.path.join(save_dir, '%s_mask.nrrd' % (pid)))

    bboxes = []
    instance_nums = [num for num in np.unique(masks) if num]
    for i in instance_nums:
        mask = (masks == i).astype(np.uint8)
        zz, yy, xx = np.where(mask)
        d = max(zz.max() - zz.min() + 1, yy.max() - yy.min() + 1, xx.max() - xx.min() + 1)
        bboxes.append(np.array([(zz.max() + zz.min())) / 2., (yy.max() + yy.min()) / 2., (xx.max() + xx.min()) / 2., d]))

    bboxes = np.array(bboxes)
    if not len(bboxes):
        print('%s does not have any nodules!!!' % (pid))

    print('Finished masks to bboxes %s' % (pid))
    np.save(os.path.join(save_dir, '%s_bboxes.npy' % (pid)), bboxes)

3. Summary

At this point, the content of this article, combined with the content of the previous article, we have basically completed the data processing of Luna16, and also completed the content we originally hoped to get:

images and mask arrays have one-to-one correspondence between file names;
resample operates to 1mm;
The portion outside the lung parenchyma was discarded;

The two chapters 6 and 7 are supplements to the data part of the previous chapters. You can refer to these two chapters for data processing, or refer to other data processing. As long as the final data form is the same. .