In the previous section: [3D Image Segmentation] VNet 3D Image Segmentation 6 (data preprocessing) based on Pytorch, we have obtained the same seriesUID
name as the mhd
image mask nrrd
data file, it can be said that there is a one-to-one correspondence.
Moreover, the files of mask
are divided into 4 folders according to how many people marked the nodule at the same time. They were marked one, two, three and four times respectively, for a total of 4
Doctors participated in the annotation.
Coupled with the officially compiled lung parenchyma segmentation documents, we obtained the following data:
ct
Image data;- Lung parenchyma segmentation data;
- Contains
mask
data for nodule locations.
So what does this article do? See the introduction below.
1. Introduction
The above-mentioned ones meet our needs, and they are all corresponding one-to-one. It is very convenient for subsequent data preprocessing or use for training.
However, for the original ct
data, the layer thickness on the Z
axis is different. This can be seen in the dicom
file. You can also query the layer thickness information in the mhd
file. At this point, the differences between different sequences are very large. It is manifested in a 3-dimensional array of nodules, which appear to be squashed and elongated in this dimension.
In the x
and y
directions, there is actually a spacing
difference, but this difference is not as different as the z
axis So exaggerated, you can choose to process or not process it here (some papers are processed, some are not. The default size is 512x512
, and it will become smaller after resample
).
At this point, the purpose of this article is very clear. It is to do the following things:
- Extract lung parenchyma from the original image, and crop the parts outside the lung area, or change them to fixed pixel values;
- Perform
resample
operations on images and nodulesmask
. In this article,zyx
performsresample
operations to1mm.
.
2. Specific implementation
The how-to part is divided into three parts:
- Lung parenchyma cropping
image
andnodule mask
performresample
operations- Get the coordinates and radius of the nodule center point
Let’s expand one by one below
2.1, Main function part
Since the amount of data in this part is relatively large, the multi-process mode is adopted in the main function part to speed up the processing. The data that needs to be read in has been processed in the previous chapter and can be used directly here.
The following is the main function
import sys import numpy as np import scipy.ndimage from skimage import measure, morphology import SimpleITK as sitk from multiprocessing import Pool import os import nrrd ################################################ ############################# # Add the marked mask and CT original image to the left and right lung area segmentation images to generate the CT and nodule mask that removes noise and leaves the lung area. ################################################ ############################# def main(): n_consensus = 4 do_resample = True img_dir = './LUNA16/image_combined' lung_mask_dir = './LUNA16/seg-lungs-LUNA16' nod_mask_dir = os.path.join('./LUNA16/nodule_masks', str(n_consensus)) save_dir = os.path.join('./LUNA16/preprocessed', str(n_consensus)) os.makedirs(save_dir, exist_ok=True) params_lists = [] #Multi-process processing for nrrd_name in os.listdir(nod_mask_dir): # seg-lungs-LUNA16, masks_test/3, seg-lungs-LUNA16, preprocessed_test/3, True pid = nrrd_name[:-5] params_lists.append([pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample]) pool = Pool(processes=4) pool.map(cropResample_process, params_lists) pool.close() pool.join() pool = Pool(processes=4) pool.map(generateBBoxes_label, params_lists) pool.close() pool.join() if __name__ == '__main__': main()
There are two parts,
cropResample_process
: As the name suggests, performcrop
andresample
operations of lung parenchyma;generateBBoxes_label
:mask
the processed nodule to obtain the coordinates and radius of the nodule center.
2.2. Lung parenchyma cutting
The steps in this small piece are roughly as follows:
- First, it is data reading. For a detailed introduction to this part, you can refer to my previous article: [Medical Imaging Data Processing] Data format conversion of nii, npz, npy, dcm, mhd, and summary of multi-objective segmentation processing
- Secondly, it is to convert the
hu
value into the value of0-255
, which is the functionHU2uint8()
. For this part, you can refer to < Introduction to the visual part of how the code>hu value is converted into0-255
: [Medical Imaging Data Processing] Summary of Dicom file format processing - In addition, the lung area
mask
is applied to the image, and the lung parenchyma is supplemented bypad valud
- Finally, store the processed
image, mask
and related parameters locally.
The code is as follows. Comment out the parts that should be explained. I believe it can be easily understood.
def load_itk_image(filename): """Return img array and [z,y,x]-ordered origin and spacing """ itkimage = sitk.ReadImage(filename) numpyImage = sitk.GetArrayFromImage(itkimage) numpyOrigin = np.array(list(reversed(itkimage.GetOrigin()))) numpySpacing = np.array(list(reversed(itkimage.GetSpacing()))) return numpyImage, numpyOrigin, numpySpacing def HU2uint8(image, HU_min=-1200.0, HU_max=600.0, HU_nan=-2000.0): """ Convert HU unit into uint8 values. First bound HU values by predfined min and max, and then normalize image: 3D numpy array of raw HU values from CT series in [z, y, x] order. HU_min: float, min HU value. HU_max: float, max HU value. HU_nan: float, value for nan in the raw CT image. """ image_new = np.array(image) image_new[np.isnan(image_new)] = HU_nan # normalize to [0, 1] image_new = (image_new - HU_min) / (HU_max - HU_min) image_new = np.clip(image_new, 0, 1) image_new = (image_new * 255).astype('uint8') return image_new def convex_hull_dilate(binary_mask, dilate_factor=1.5, iterations=10): """ Replace each slice with convex hull of it then dilate. Convex hulls used only if it does not increase area by dilate_factor. This applies mainly to the inferior slices because inferior surface of lungs is concave. binary_mask: 3D binary numpy array with the same shape of the image, that only region of interest is True. One side of the lung in this specific case. dilate_factor: float, factor of increased area after dilation iterations: int, number of iterations for dilation return: 3D binary numpy array with the same shape of the image, that only region of interest is True. Each binary mask is ROI of one side of the lung. """ binary_mask_dilated = np.array(binary_mask) for i in range(binary_mask.shape[0]): slice_binary = binary_mask[i] if np.sum(slice_binary) > 0: slice_convex = morphology.convex_hull_image(slice_binary) if np.sum(slice_convex) <= dilate_factor * np.sum(slice_binary): binary_mask_dilated[i] = slice_convex struct = scipy.ndimage.generate_binary_structure(3, 1) binary_mask_dilated = scipy.ndimage.binary_dilation( binary_mask_dilated, structure=struct, iterations=10) return binary_mask_dilated def apply_lung_mask(image, binary_mask1, binary_mask2, pad_value=170, bone_thred=210, remove_bone=False): """ Apply the binary mask of each lung to the image. Regions out of interest are replaced with pad_value. image: 3D uint8 numpy array with the same shape of the image. binary_mask1: 3D binary numpy array with the same shape of the image, that only one side of lung is True. binary_mask2: 3D binary numpy array with the same shape of the image, that only the other side of lung is True. pad_value: int, uint8 value for padding image regions that is not interested. bone_thred: int, uint8 threahold value for determining parts of image is bone. return: D uint8 numpy array with the same shape of the image after applying the lung mask. """ binary_mask = binary_mask1 + binary_mask2 binary_mask1_dilated = convex_hull_dilate(binary_mask1) binary_mask2_dilated = convex_hull_dilate(binary_mask2) binary_mask_dilated = binary_mask1_dilated + binary_mask2_dilated binary_mask_extra = binary_mask_dilated ^ binary_mask # replace image values outside binary_mask_dilated as pad value image_new = image * binary_mask_dilated + pad_value * (1 - binary_mask_dilated).astype('uint8') # set bones in extra mask to 170 (ie convert HU > 482 to HU 0; #water). if remove_bone: image_new[image_new * binary_mask_extra > bone_thred] = pad_value return image_new def cropResample_process(params): # seg-lungs-LUNA16, masks_test/3, seg-lungs-LUNA16, preprocessed_test/3, True pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample = params print('Preprocessing %s...' % (pid)) img_org, origin, spacing = load_itk_image(os.path.join(img_dir, '%s.mhd' % (pid))) lung_mask, _, _ = load_itk_image(os.path.join(lung_mask_dir, '%s.mhd' % (pid))) nodule_mask, _ = nrrd.read(os.path.join(nod_mask_dir, '%s.nrrd' % (pid))) # 4-right lung 3-left lung 5-trachea binary_mask_r = lung_mask == 4 binary_mask_l = lung_mask == 3 binary_mask = binary_mask_r + binary_mask_l img_org = HU2uint8(img_org) img_lungRL = apply_lung_mask(img_org, binary_mask_r, binary_mask_l)
There is one point that has never been explained before, that is the officially provided lung mask
array, which is briefly recorded here:
- The number 3 represents the left lung
- The number 4 represents the right lung
- The number 5 represents the trachea
This is the first time I saw this bitwise XOR operator (^
), and I briefly studied it:
The bitwise exclusive OR operator (^) is used to perform a logical exclusive OR operation on each corresponding bit of the two operands. If the two corresponding bits have the same value, the result is 0, otherwise it is 1. The essence of XOR is addition without carry.
Use the truth table to look at the operation logic of XOR:
It seems very intuitive and straightforward. But what happens if we replace the XOR between boolean values with the XOR between numbers? ? ? Let’s try it out:
0 ^ 0 0^1 1^0 1^1 >>> 0 1 1 0
The results tell us: The XOR value of numbers the same is 0; the XOR value of numbers not the same is 1.
Let’s try numbers other than 0 and 1 again:
5 ^ 3 >>>6
Damn it? Why is the answer 6? Shouldn’t it be 1? ? What’s going on behind the scenes?
XOR is the result of bitwise XOR based on binary, 5 ^ 3. The process actually converts 5 and 3 into binary respectively:
5 = 0101(b) 3 = 0011(b)
Bitwise XOR:
0^0 ->0 1^0->1 0^1 ->1 1^1->0
The arrangement is 0110(b)
converted to decimal: 6. For more information on XOR, please refer here: A brief discussion of Python logical operators XOR
Execute the XOR operation between the dilate
expanded binary mask
and the original binary mask
. The values of the corresponding bits are the same, and the result is 0
, otherwise 1
. Then, the result obtained is the expanded part, which is bone
. This part is used in the bone
removal stage.
You may have this question: Why not directly multiply image
and lung mask
to get a left after dividing the lung parenchyma? What about image
? Instead, we need to use convex hull optimization. Is it unnecessary?
Answer: In lung mask
, there is an error in the segmentation of lung parenchyma. That is to say, the segmentation of the lung parenchyma is along the edge of the lung area, but the location of some nodules is just on the boundary of the lung area and is very dense. Then mask
will appear in a concave state. If you use the above method, the nodules will be picked out. Using convex hull optimization, you can slightly expand the edge of the lung parenchyma to achieve the effect of leaving more lung area.
However, for diseases with large lesions such as tuberculosis, the above method of removing lung parenchyma is not feasible. The main reason is that the range of pulmonary tuberculosis is relatively large. Even though convex hull optimization is used, a large part of the lung area will eventually be removed, leaving the lung area incomplete and causing some losses.
The following is an example officially given by skimage.morphology.convex_hull_image
, as follows: Click here to go directly
is applied to our project. The cut appearance is as follows:
2.3, resample operation
In the operation of resample
in this article, all dimensions of zyx
are evenly exposed to rain and dew, and all are adjusted to the state of 1mm
. In this way, a Pixel size represents the physical size and will not cause deformation in any dimension.
The code looks like this:
def resample(image, spacing, new_spacing=[1.0, 1.0, 1.0], order=1): """ Resample image from the original spacing to new_spacing, e.g. 1x1x1 image: 3D numpy array of raw HU values from CT series in [z, y, x] order. spacing: float * 3, raw CT spacing in [z, y, x] order. new_spacing: float * 3, new spacing used for resample, typically 1x1x1, which means standardizing the raw CT with different spacing all into 1x1x1mm. order: int, order for resample function scipy.ndimage.interpolation.zoom return: 3D binary numpy array with the same shape of the image after, resampling. The actual resampling spacing is also returned. """ # shape can only be int, so has to be rounded. new_shape = np.round(image.shape * spacing / new_spacing) # the actual spacing to resample. resample_spacing = spacing * image.shape / new_shape resize_factor = new_shape/image.shape image_new = scipy.ndimage.zoom(image, resize_factor, mode='nearest', order=order) return (image_new, resample_spacing) if do_resample: print('Resampling...') img_lungRL, resampled_spacing = resample(img_lungRL, spacing, order=3) seg_nod_mask = np.zeros(img_lungRL.shape, dtype=np.uint8) for i in range(int(nodule_mask.max())): # A nodule, a resample of a nodule mask = (nodule_mask == (i + 1)).astype(np.uint8) mask, _ = resample(mask, spacing, order=3) seg_nod_mask[mask > 0.5] = i + 1
Among them, in the resample
function, the scipy.ndimage.zoom
operation is used to directly transfer the original data, zoom
to the new shape
.
scipy.ndimage.zoom(input, zoom, output=None, order=3, mode='constant', cval=0.0, prefilter=True, *, grid_mode=False)[source]
In the function:
input
:The input arrayzoom
: The zoom factor along the axes
The following is an official case, showing the changes before and after zoom
. You can refer to: Click the link to go directly
from scipy import ndimage, datasets import matplotlib.pyplot as plt fig = plt.figure() ax1 = fig.add_subplot(121) # left side ax2 = fig.add_subplot(122) # right side ascent = datasets.ascent() result = ndimage.zoom(ascent, 3.0) ax1.imshow(ascent, vmin=0, vmax=255) ax2.imshow(result, vmin=0, vmax=255) plt.show()
The changes before and after zoom
are as follows:
I found that this scipy
library is really useful, and I will find time to comprehensively supplement this knowledge in the future.
2.4. Store locally
This part is relatively simple. It mainly talks about some new aspects of array storage:
- The
npy
file stores some simple arrays, such asspacing
below, coordinates, etc.; - The
nrrd
file stores multi-dimensional arrays, such as the followingimage
andmask
array images, the size is240x320x320
;
I used to like to usenii
to store files, but now I find it is not easy to use.nrrd
can also store arrays andheader
headers.
Here is the code:
lung_box = get_lung_box(binary_mask, img_lungRL.shape) # Get the outer contour of the lung segmentation z_min, z_max = lung_box[0] y_min, y_max = lung_box[1] x_min, x_max = lung_box[2] # Cropping operation, remove outside the lung area img_lungRL = img_lungRL[z_min:z_max, y_min:y_max, x_min:x_max] if do_resample: seg_nod_mask = seg_nod_mask[z_min:z_max, y_min:y_max, x_min:x_max] else: seg_nod_mask = nodule_mask[z_min:z_max, y_min:y_max, x_min:x_max] np.save(os.path.join(save_dir, '%s_origin.npy' % (pid)), origin) # origin (3,) Record the origin coordinate information of the three-dimensional image if do_resample: np.save(os.path.join(save_dir, '%s_spacing.npy' % (pid)), resampled_spacing) # Record the scale coefficients of the three dimensions of x\y\z before and after resample np.save(os.path.join(save_dir, '%s_ebox_origin.npy' % (pid)), np.array((z_min, y_min, x_min))) nrrd.write(os.path.join(save_dir, '%s_clean.nrrd' % (pid)), img_lungRL) # CT image after removing non-lung areas nrrd.write(os.path.join(save_dir, '%s_mask.nrrd' % (pid)), seg_nod_mask) # Nodule MASK image after removing non-lung areas
2.5. Obtain the coordinates and radius of the nodule center point
The center point coordinates and radius of the marked nodule are obtained here. The purpose is to be able to directly take it from the obtained nodule and perform crop
directly during operations such as cropping patch
>Operation.
The steps in this step are similar to the previous get_lung_box
. The only difference is that the center point is saved instead of the maximum and minimum boundary coordinates above.
code show as below:
def generateBBoxes_label(params): pid, lung_mask_dir, nod_mask_dir, img_dir, save_dir, do_resample = params masks, _ = nrrd.read(os.path.join(save_dir, '%s_mask.nrrd' % (pid))) bboxes = [] instance_nums = [num for num in np.unique(masks) if num] for i in instance_nums: mask = (masks == i).astype(np.uint8) zz, yy, xx = np.where(mask) d = max(zz.max() - zz.min() + 1, yy.max() - yy.min() + 1, xx.max() - xx.min() + 1) bboxes.append(np.array([(zz.max() + zz.min())) / 2., (yy.max() + yy.min()) / 2., (xx.max() + xx.min()) / 2., d])) bboxes = np.array(bboxes) if not len(bboxes): print('%s does not have any nodules!!!' % (pid)) print('Finished masks to bboxes %s' % (pid)) np.save(os.path.join(save_dir, '%s_bboxes.npy' % (pid)), bboxes)
3. Summary
At this point, the content of this article, combined with the content of the previous article, we have basically completed the data processing of Luna16
, and also completed the content we originally hoped to get:
images
andmask
arrays have one-to-one correspondence between file names;resample
operates to1mm
;- The portion outside the lung parenchyma was discarded;
The two chapters 6 and 7 are supplements to the data part of the previous chapters. You can refer to these two chapters for data processing, or refer to other data processing. As long as the final data form is the same. .