The previous LUNA16 data processing method, compiled from a Bilibili (B站) author's code, was too cumbersome, so this article reorganizes the LUNA16 data from scratch. The final data and format are similar; the main difference is that the code logic is simpler and easier to understand.
For background on the LUNA16 data set, see: [3D Image Classification] Pytorch-based 3D stereoscopic image classification 3 (LIDC-IDRI pulmonary nodule XML feature tag PKL dump)
The main steps covered in this series are:
- masks generation: extract the nodule contour coordinates for the corresponding series from the xml file (a nodule may be marked several times by different readers) and generate the corresponding mask array file, whose size matches the image array;
- Lung parenchyma extraction: multiply the lung-region segmentation with the original image and with the mask image, then fill or remove the non-lung regions;
- resample operation: resample according to spacing. This can be done along all three zyx dimensions, or only along the z direction to 1mm (I have seen something similar in a paper);
- According to the mask, obtain the zyx center point coordinates and radius of each nodule.
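As a rough illustration of the resample step, the sketch below resamples a volume to a target spacing with a nearest-neighbour lookup, using NumPy only. The function name `resample_nearest` and the example spacings are assumptions made for illustration and are not taken from the LUNA16 code; a real pipeline would more likely interpolate the image and reserve nearest-neighbour lookup for the mask.

```python
import numpy as np


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a (z, y, x) array to new_spacing by nearest-neighbour lookup."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    old_shape = np.array(volume.shape)
    # The physical extent stays the same, so the voxel count scales with spacing.
    new_shape = np.maximum(np.rint(old_shape * spacing / new_spacing), 1).astype(int)
    # For every target index, pick the closest source index along each axis.
    index = [
        np.clip(np.rint(np.arange(n) * ns / s).astype(int), 0, dim - 1)
        for n, ns, s, dim in zip(new_shape, new_spacing, spacing, old_shape)
    ]
    return volume[np.ix_(*index)]


# Toy volume: 4 slices that are 2.5 mm apart, with 0.5 mm pixels (made-up values).
volume = np.arange(4 * 6 * 6).reshape(4, 6, 6)
spacing = (2.5, 0.5, 0.5)

iso = resample_nearest(volume, spacing)                      # 1 mm along z, y and x
z_only = resample_nearest(volume, spacing, (1.0, 0.5, 0.5))  # 1 mm along z only
print(iso.shape, z_only.shape)
```

Passing the original in-plane spacing as the target, as in `z_only`, leaves y and x untouched and only changes the number of slices, which corresponds to the z-only variant mentioned above.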
At this point, we will have the following files:
- The ct image data;
- The corresponding mask data;
- A file that records the zyx center point coordinates and radius of each nodule.
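To make the third file more concrete, here is a minimal sketch of how a zyx center and a radius could be derived from a labelled mask. It assumes that each nodule carries its own integer label and that the radius is half of the largest bounding-box extent in voxels; both the function name `nodule_centers` and that radius definition are assumptions for illustration, not the method used later in this series.

```python
import numpy as np


def nodule_centers(mask):
    """Return (label, z, y, x, radius) for every non-zero label in the mask."""
    records = []
    for label in np.unique(mask):
        if label == 0:  # 0 is background
            continue
        zyx = np.argwhere(mask == label)       # voxel coordinates of this nodule
        center = zyx.mean(axis=0)              # centroid in voxel units
        extent = zyx.max(axis=0) - zyx.min(axis=0) + 1
        radius = extent.max() / 2.0            # assumed definition: half the largest extent
        records.append((int(label), *center.tolist(), float(radius)))
    return records


# Toy mask with two labelled nodules (made-up positions).
mask = np.zeros((5, 10, 10), dtype=np.uint8)
mask[2, 3:6, 4:7] = 1
mask[4, 0:2, 0:2] = 2

records = nodule_centers(mask)
for rec in records:
    print(rec)
```

The values are in voxel units; multiplying by the spacing would turn them into millimetres.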
Compared with the data format provided by luna16, this layout is easier to understand and inspect, and it is more intuitive for visualization as well as for subsequent processing and training. Each part will be expanded on later.
Since there is a fair amount of code, many details to handle, and many files involved, the content is spread over several articles. This article first processes the xml files and dumps their content into a form that is easy to view. The format and handling of xml files are covered in a separate article: [Medical Imaging Data Processing] XML file format processing summary
1. xml file dump
1.1. Understanding the annotation file xml
For an introduction to what each field of the xml files in the LIDC-IDRI data set means, see my other article: [LIDC-IDRI] CT Pulmonary nodule XML tag characteristics benign and malignant tag PKL dump (1)
That article focuses on the structure of the data and on what the tag of each record in the xml means. After reading it, you should have a better understanding of how this data set is processed.
Most of the code here is the same as in the article linked above; see also this GitHub repository: NoduleNet – utils -LIDC
Some content was not covered there, so a short supplement follows.
ResponseHeader: This is the header part, which records the information of this case (that is, the CT image of a single patient).
To make xml files easier to view and study (see [Medical Imaging Data Processing] Summary of XML file format processing), we convert the xml to a dictionary. The comparison before and after the conversion is shown below.
A short excerpt of the original xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <LidcReadMessage uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1308168927505.0" xmlns="http://www.nih.gov" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd">
      <ResponseHeader>
        <Version>1.7</Version>
        <MessageId>1148851</MessageId>
        <DateRequest>2005-11-03</DateRequest>
        <TimeRequest>12:25:10</TimeRequest>
        <RequestingSite>removed</RequestingSite>
        <ServicingSite>removed</ServicingSite>
        <TaskDescription>Second unblinded read</TaskDescription>
        <CtImageFile>removed</CtImageFile>
        <SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.131939324905446238286154504249</SeriesInstanceUid>
        <StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.303241414168367763244410429787</StudyInstanceUID>
        <DateService>2005-11-03</DateService>
        <TimeService>12:25:40</TimeService>
        <ResponseDescription>1 - Reading complete</ResponseDescription>
        <ResponseComments></ResponseComments>
      </ResponseHeader>
The same excerpt converted to dictionary form (easier to view):
    {
      "LidcReadMessage": {
        "@uid": "1.3.6.1.4.1.14519.5.2.1.6279.6001.1308168927505.0",
        "@xmlns": "http://www.nih.gov",
        "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "@xsi:schemaLocation": "http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd",
        "ResponseHeader": {
          "Version": "1.7",
          "MessageId": "1148851",
          "DateRequest": "2005-11-03",
          "TimeRequest": "12:25:10",
          "RequestingSite": "removed",
          "ServicingSite": "removed",
          "TaskDescription": "Second unblinded read",
          "CtImageFile": "removed",
          "SeriesInstanceUid": "1.3.6.1.4.1.14519.5.2.1.6279.6001.131939324905446238286154504249",
          "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6279.6001.303241414168367763244410429787",
          "DateService": "2005-11-03",
          "TimeService": "12:25:40",
          "ResponseDescription": "1 - Reading complete",
          "ResponseComments": null
        },
      }
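A conversion of this kind can be sketched with the Python standard library alone. The helper names `strip_ns` and `element_to_dict` below are hypothetical, the sample document is a shortened stand-in with a made-up uid, and the converter is deliberately simplified: it keeps attributes only on elements that have children and does not reproduce the xmlns entries, so its output is close to, but not identical with, the dictionary shown above.

```python
import xml.etree.ElementTree as ET


def strip_ns(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split('}', 1)[-1]


def element_to_dict(elem):
    """Recursively convert an element into nested dicts, lists and strings."""
    children = list(elem)
    if not children:
        text = (elem.text or '').strip()
        return text or None                      # empty elements become None
    result = {'@' + strip_ns(k): v for k, v in elem.attrib.items()}
    for child in children:
        key = strip_ns(child.tag)
        value = element_to_dict(child)
        if key in result:                        # repeated tag: collect into a list
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


# Shortened sample with a made-up uid, not a real LIDC-IDRI file.
sample = """<LidcReadMessage uid="1.2.3" xmlns="http://www.nih.gov">
  <ResponseHeader>
    <Version>1.7</Version>
    <ResponseComments></ResponseComments>
  </ResponseHeader>
</LidcReadMessage>"""

root = ET.fromstring(sample)
doc = {strip_ns(root.tag): element_to_dict(root)}
print(doc)
```

Repeated child tags are gathered into a list, which matters for the annotation part of the file, where one reading session can contain many nodules.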
1.2. Convert xml comprehensive records to series-based npy files
LIDC-IDRI contains 1018 examinations (CT scans). The annotation folder tcia-lidc-xml contains 6 subfolders holding 1318 xml files in total. Moreover, the names of these xml files do not correspond one-to-one with the series names of the images.
Therefore, the annotation information in the xml files needs to be reorganized into a form that is easy to understand. If each annotation file corresponds one-to-one with an image file, subsequent processing becomes much easier.
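One way to see how annotation files relate to series is to read the SeriesInstanceUid from each file and group the file names by it. The sketch below does this on in-memory strings; the function names, file names and UIDs are made up for illustration, and how the real files are distributed across series is something to check on the actual data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def series_uid_of(xml_text):
    """Return the text of the first SeriesInstanceUid element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.split('}', 1)[-1] == 'SeriesInstanceUid':
            return (elem.text or '').strip()
    return None


def group_by_series(xml_texts):
    """Map each series UID to the list of xml file names that mention it."""
    groups = defaultdict(list)
    for name, text in xml_texts.items():
        groups[series_uid_of(text)].append(name)
    return dict(groups)


TEMPLATE = (
    '<LidcReadMessage xmlns="http://www.nih.gov"><ResponseHeader>'
    '<SeriesInstanceUid>{uid}</SeriesInstanceUid>'
    '</ResponseHeader></LidcReadMessage>'
)

# Made-up file names and UIDs, standing in for files read from disk.
xml_texts = {
    '068.xml': TEMPLATE.format(uid='1.2.3'),
    '069.xml': TEMPLATE.format(uid='1.2.4'),
    '070.xml': TEMPLATE.format(uid='1.2.3'),
}

index = group_by_series(xml_texts)
print(index)
```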
This section extracts the content we care about from the xml files and sets the rest aside for now.
The processing code follows; its main steps are:
- Traverse all xml files and process them one by one;
- For a single xml file, parse out the seriesuid and the annotated nodule coordinates;
- Store the result in an npy file named after the seriesuid; its content is the contour coordinates of each nodule.
The complete code is as follows:
    import os

    import numpy as np
    from tqdm import tqdm

    from pylung.utils import find_all_files
    from pylung.annotation import parse


    def xml2mask(xml_file):
        """Parse one xml file and return its series UID and nodule contours."""
        header, annos = parse(xml_file)  # header and per-reader annotations

        ctr_arrs = []
        for i, reader in enumerate(annos):
            for j, nodule in enumerate(reader.nodules):
                ctr_arr = []
                for k, roi in enumerate(nodule.rois):
                    z = roi.z
                    for roi_xy in roi.roi_xy:
                        ctr_arr.append([z, roi_xy[1], roi_xy[0]])  # [[z, y, x], [z, y, x], ...]
                ctr_arrs.append(ctr_arr)

        seriesuid = header.series_instance_uid
        return seriesuid, ctr_arrs


    def annotation2masks(annos_dir, save_dir):
        # Collect the paths of all xml files
        files = find_all_files(annos_dir, '.xml')
        for f in tqdm(files, total=len(files)):
            print(f)
            try:
                seriesuid, masks = xml2mask(f)
                # Contours have different lengths, so store them in an object array;
                # recent NumPy versions refuse to build ragged arrays implicitly.
                contours = np.empty(len(masks), dtype=object)
                for idx, contour in enumerate(masks):
                    contours[idx] = contour
                # Save the 3D contour coordinates [[z, y, x], [z, y, x], ...] per nodule
                np.save(os.path.join(save_dir, '%s' % (seriesuid)), contours)
            except Exception as err:
                print("Unexpected error:", repr(err))


    if __name__ == '__main__':
        annos_dir = './LUNA16/annotation/LIDC-XML-only/tcia-lidc-xml'  # .xml files
        ctr_arr_save_dir = './LUNA16/annotation/noduleCoor'  # Where to save the intermediate nodule contours parsed from each annotator
        os.makedirs(ctr_arr_save_dir, exist_ok=True)

        # Dump the xml information to npy (temporary files)
        annotation2masks(annos_dir, ctr_arr_save_dir)
Next, open one npy file and look at it. The recorded content, shown below, is the polygon coordinate points of all nodules marked by all doctors in this series:
[list([[-299.8, 206, 42], [-299.8, 207, 41], [-299.8, 208, 41], [-299.8, 209, 40], [-299.8, 210, 40 ], [-299.8, 211, 41], [-299.8, 212, 41], [-299.8, 213, 42], [-299.8, 214, 42], [-299.8, 215, 43], [-299.8 , 216, 44], [-299.8, 216, 45], [-299.8, 215, 46], [-299.8, 215, 47], [-299.8, 215, 48], [-299.8, 214, 49] , [-299.8, 213, 49], [-299.8, 212, 49], [-299.8, 211, 49], [-299.8, 210, 49], [-299.8, 209, 49], [-299.8, 208, 48], [-299.8, 207, 47], [-299.8, 207, 46], [-299.8, 206, 45], [-299.8, 206, 44], [-299.8, 206, 43], [-299.8, 206, 42], [-298.0, 206, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 208, 43], [-298.0, 209 , 42], [-298.0, 209, 41], [-298.0, 210, 40], [-298.0, 211, 40], [-298.0, 212, 39], [-298.0, 213, 40], [ -298.0, 214, 41], [-298.0, 215, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216, 45], [-298.0, 216, 46], [-298.0, 216, 47], [-298.0, 215, 48], [-298.0, 214, 48], [-298.0, 213, 48], [-298.0, 212, 48], [- 298.0, 211, 48], [-298.0, 210, 48], [-298.0, 209, 48], [-298.0, 208, 48], [-298.0, 207, 47], [-298.0, 206, 46 ], [-296.2, 209, 42], [-296.2, 210, 41], [-296.2, 211, 40], [-296.2, 212, 40], [-296.2, 213, 41], [-296.2 , 214, 42], [-296.2, 215, 43], [-296.2, 216, 44], [-296.2, 216, 45], [-296.2, 216, 46], [-296.2, 216, 47] , [-296.2, 216, 48], [-296.2, 215, 49], [-296.2, 214, 49], [-296.2, 213, 49], [-296.2, 212, 49], [-296.2, 211, 48], [-296.2, 210, 47], [-296.2, 209, 46], [-296.2, 209, 45], [-296.2, 209, 44], [-296.2, 209, 43], [-296.2, 209, 42]]) list([[-227.8, 151, 405], [-227.8, 152, 404], [-227.8, 153, 403], [-227.8, 154, 402], [-227.8, 155, 402], [- 227.8, 156, 402], [-227.8, 157, 403], [-227.8, 157, 404], [-227.8, 157, 405], [-227.8, 158, 406], [-227.8, 158, 407 ], [-227.8, 158, 408], [-227.8, 157, 409], [-227.8, 156, 409], [-227.8, 155, 409], [-227.8, 154, 408], [-227.8 , 153, 408], [-227.8, 152, 407], [-227.8, 151, 406], [-227.8, 151, 405], [-226.0, 152, 405], [-226.0, 153, 404] , [-226.0, 154, 404], [-226.0, 155, 403], [-226.0, 
156, 404], [-226.0, 157, 405], [-226.0, 157, 406], [-226.0, 157, 407], [-226.0, 156, 408], [-226.0, 155, 408], [-226.0, 154, 408], [-226.0, 153, 408], [-226.0, 152, 407], [-226.0, 152, 406], [-226.0, 152, 405]]) list([[-226.0, 158, 407], [-226.0, 157, 408], [-226.0, 156, 409], [-226.0, 155, 409], [-226.0, 154, 409], [- 226.0, 153, 409], [-226.0, 152, 408], [-226.0, 151, 407], [-226.0, 152, 406], [-226.0, 153, 405], [-226.0, 153, 404 ], [-226.0, 154, 403], [-226.0, 155, 402], [-226.0, 156, 402], [-226.0, 157, 403], [-226.0, 158, 404], [-226.0 , 158, 405], [-226.0, 158, 406], [-226.0, 158, 407], [-227.8, 159, 407], [-227.8, 158, 408], [-227.8, 157, 409] , [-227.8, 156, 410], [-227.8, 155, 410], [-227.8, 154, 410], [-227.8, 153, 409], [-227.8, 152, 408], [-227.8, 151, 407], [-227.8, 151, 406], [-227.8, 151, 405], [-227.8, 152, 404], [-227.8, 153, 403], [-227.8, 154, 402], [-227.8, 155, 402], [-227.8, 156, 402], [-227.8, 157, 403], [-227.8, 158, 404], [-227.8, 158, 405], [-227.8, 158 , 406], [-227.8, 159, 407]]) list([[-296.2, 214, 46], [-296.2, 213, 47], [-296.2, 212, 47], [-296.2, 211, 47], [-296.2, 210, 46], [- 296.2, 209, 45], [-296.2, 208, 44], [-296.2, 208, 43], [-296.2, 208, 42], [-296.2, 209, 41], [-296.2, 210, 42 ], [-296.2, 211, 42], [-296.2, 212, 43], [-296.2, 213, 44], [-296.2, 214, 45], [-296.2, 214, 46], [-298.0 , 216, 47], [-298.0, 215, 48], [-298.0, 214, 49], [-298.0, 213, 49], [-298.0, 212, 49], [-298.0, 211, 49] , [-298.0, 210, 49], [-298.0, 209, 48], [-298.0, 208, 47], [-298.0, 207, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 208, 43], [-298.0, 208, 42], [-298.0, 209, 41], [-298.0, 210, 41], [-298.0, 211, 41], [-298.0, 212, 41], [-298.0, 213, 41], [-298.0, 214, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216 , 45], [-298.0, 216, 46], [-298.0, 216, 47], [-299.8, 216, 50], [-299.8, 215, 51], [-299.8, 214, 51], [ -299.8, 213, 50], [-299.8, 212, 50], [-299.8, 211, 50], [-299.8, 210, 49], [-299.8, 209, 48], [-299.8, 208, 47], [-299.8, 207, 
46], [-299.8, 207, 45], [-299.8, 207, 44], [-299.8, 208, 43], [-299.8, 209, 42], [- 299.8, 210, 42], [-299.8, 211, 41], [-299.8, 212, 41], [-299.8, 213, 42], [-299.8, 214, 42], [-299.8, 215, 43 ], [-299.8, 216, 44], [-299.8, 216, 45], [-299.8, 216, 46], [-299.8, 216, 47], [-299.8, 216, 48], [-299.8 , 216, 49], [-299.8, 216, 50]]) list([[-226.0, 158, 407], [-226.0, 157, 408], [-226.0, 156, 409], [-226.0, 155, 409], [-226.0, 154, 409], [- 226.0, 153, 409], [-226.0, 152, 409], [-226.0, 151, 409], [-226.0, 151, 408], [-226.0, 151, 407], [-226.0, 151, 406 ], [-226.0, 151, 405], [-226.0, 152, 404], [-226.0, 152, 403], [-226.0, 153, 403], [-226.0, 154, 402], [-226.0 , 154, 401], [-226.0, 155, 401], [-226.0, 156, 401], [-226.0, 157, 401], [-226.0, 157, 402], [-226.0, 158, 403] , [-226.0, 158, 404], [-226.0, 158, 405], [-226.0, 158, 406], [-226.0, 158, 407], [-227.8, 159, 407], [-227.8, 158, 408], [-227.8, 158, 409], [-227.8, 157, 409], [-227.8, 156, 410], [-227.8, 155, 410], [-227.8, 154, 409], [-227.8, 153, 409], [-227.8, 152, 409], [-227.8, 151, 408], [-227.8, 151, 407], [-227.8, 151, 406], [-227.8, 151 , 405], [-227.8, 151, 404], [-227.8, 152, 403], [-227.8, 152, 402], [-227.8, 153, 401], [-227.8, 154, 401], [ -227.8, 155, 401], [-227.8, 156, 401], [-227.8, 157, 401], [-227.8, 158, 402], [-227.8, 158, 403], [-227.8, 159, 404], [-227.8, 159, 405], [-227.8, 159, 406], [-227.8, 159, 407]]) list([[-296.2, 215, 47], [-296.2, 214, 48], [-296.2, 213, 48], [-296.2, 212, 48], [-296.2, 211, 48], [- 296.2, 210, 47], [-296.2, 209, 47], [-296.2, 208, 46], [-296.2, 208, 45], [-296.2, 207, 44], [-296.2, 207, 43 ], [-296.2, 208, 42], [-296.2, 209, 42], [-296.2, 210, 42], [-296.2, 211, 42], [-296.2, 212, 43], [-296.2 , 213, 43], [-296.2, 214, 44], [-296.2, 215, 45], [-296.2, 215, 46], [-296.2, 215, 47], [-298.0, 216, 47] , [-298.0, 215, 48], [-298.0, 214, 49], [-298.0, 214, 50], [-298.0, 213, 50], [-298.0, 212, 50], [-298.0, 211, 49], [-298.0, 210, 49], [-298.0, 209, 48], [-298.0, 208, 
48], [-298.0, 207, 47], [-298.0, 207, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 207, 43], [-298.0, 207, 42], [-298.0, 207, 41], [-298.0, 208 , 41], [-298.0, 209, 41], [-298.0, 210, 41], [-298.0, 211, 41], [-298.0, 212, 41], [-298.0, 213, 41], [ -298.0, 214, 41], [-298.0, 215, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216, 45], [-298.0, 216, 46], [-298.0, 216, 47], [-299.8, 217, 46], [-299.8, 216, 47], [-299.8, 216, 48], [-299.8, 215, 49], [- 299.8, 214, 50], [-299.8, 213, 50], [-299.8, 212, 50], [-299.8, 211, 50], [-299.8, 210, 50], [-299.8, 209, 49 ], [-299.8, 208, 48], [-299.8, 208, 47], [-299.8, 207, 46], [-299.8, 207, 45], [-299.8, 207, 44], [-299.8 , 208, 43], [-299.8, 209, 42], [-299.8, 209, 41], [-299.8, 210, 41], [-299.8, 211, 41], [-299.8, 212, 41] , [-299.8, 213, 41], [-299.8, 214, 42], [-299.8, 215, 42], [-299.8, 215, 43], [-299.8, 216, 44], [-299.8, 217, 45], [-299.8, 217, 46], [-301.6, 214, 45], [-301.6, 213, 46], [-301.6, 212, 47], [-301.6, 211, 47], [-301.6, 210, 46], [-301.6, 209, 45], [-301.6, 210, 44], [-301.6, 211, 43], [-301.6, 212, 43], [-301.6, 213 , 44], [-301.6, 214, 45]]) list([[-296.2, 209, 43], [-296.2, 209, 44], [-296.2, 210, 45], [-296.2, 211, 46], [-296.2, 212, 47], [- 296.2, 212, 48], [-296.2, 213, 48], [-296.2, 214, 48], [-296.2, 215, 47], [-296.2, 215, 46], [-296.2, 215, 45 ], [-296.2, 214, 44], [-296.2, 213, 43], [-296.2, 212, 43], [-296.2, 211, 43], [-296.2, 210, 43], [-296.2 , 209, 43], [-298.0, 208, 42], [-298.0, 208, 43], [-298.0, 208, 44], [-298.0, 208, 45], [-298.0, 208, 46] , [-298.0, 208, 47], [-298.0, 209, 47], [-298.0, 210, 48], [-298.0, 211, 48], [-298.0, 211, 49], [-298.0, 212, 49], [-298.0, 213, 48], [-298.0, 214, 48], [-298.0, 215, 47], [-298.0, 216, 46], [-298.0, 216, 45], [-298.0, 216, 44], [-298.0, 215, 43], [-298.0, 214, 43], [-298.0, 213, 42], [-298.0, 212, 42], [-298.0, 212 , 41], [-298.0, 211, 41], [-298.0, 210, 41], [-298.0, 209, 42], [-298.0, 208, 42], [-299.8, 210, 43], [ -299.8, 209, 
43], [-299.8, 208, 44], [-299.8, 207, 44], [-299.8, 207, 45], [-299.8, 207, 46], [-299.8, 208, 47], [-299.8, 209, 48], [-299.8, 210, 49], [-299.8, 211, 49], [-299.8, 212, 49], [-299.8, 213, 50], [- 299.8, 214, 49], [-299.8, 215, 48], [-299.8, 215, 47], [-299.8, 216, 46], [-299.8, 216, 45], [-299.8, 215, 44 ], [-299.8, 215, 43], [-299.8, 214, 43], [-299.8, 214, 42], [-299.8, 213, 42], [-299.8, 212, 41], [-299.8 , 211, 41], [-299.8, 210, 42], [-299.8, 210, 43]])] <class 'numpy.ndarray'>
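A dump like the one above is hard to read by eye, so a short summary per contour is often more useful. The sketch below writes a tiny stand-in file using a few points copied from the dump and then summarizes it; the function name `summarize_contours` is hypothetical. Note that `allow_pickle=True` is required because the file stores an object array of Python lists.

```python
import os
import tempfile

import numpy as np


def summarize_contours(npy_path):
    """Return (number of points, sorted z values) for every stored contour."""
    ctr_arrs = np.load(npy_path, allow_pickle=True)
    summary = []
    for contour in ctr_arrs:
        pts = np.asarray(contour, dtype=float)   # rows are [z, y, x]
        summary.append((len(pts), sorted(set(pts[:, 0].tolist()))))
    return summary


# Stand-in data: a few points taken from the first two contours shown above.
contours = np.empty(2, dtype=object)
contours[0] = [[-299.8, 206, 42], [-299.8, 207, 41], [-298.0, 206, 46]]
contours[1] = [[-227.8, 151, 405], [-227.8, 152, 404]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_series.npy')
    np.save(path, contours)
    summary = summarize_contours(path)

for n_points, z_values in summary:
    print(n_points, z_values)
```

In the real files the z values are still physical positions at this stage, which is why a conversion to slice indices is needed before a mask can be drawn.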
2. Mark times and mask array generation
The npy files are not the final form of the annotation information, for two reasons:
- The nodule coordinates in the xml files are marked independently by several doctors, so annotations overlap: the same nodule may be marked repeatedly by different doctors, many of whom worked blind, without knowing what the others had labeled. The annotations from multiple readers therefore need to be merged to obtain the final nodule coordinates;
- The npy file contains only coordinate points, whereas we need a mask file that has the same shape as the image and corresponds to it voxel by voxel.
For these reasons, generating the final mask file requires the following steps:
- Convert the z value of each annotated contour point from its physical z coordinate (written as hu z in the code comments) to the instanceNum, that is, the slice index in the corresponding image;
- Merge the nodules marked by multiple doctors according to an iou overlap rule and keep the final nodules;
- Draw the remaining nodule contours on the mask and store it.
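The first step can be looked at in isolation. The sketch below converts physical z positions to slice indices from the z origin and z spacing of the volume. The function name `world_z_to_slice` is hypothetical, the origin and spacing in the example are made-up values chosen to reproduce the `[-50, -40, -30] --> [2, 3, 4]` illustration used in the code comments, and the sketch rounds to the nearest slice, which is an assumption on my part rather than a rule stated in this article.

```python
import numpy as np


def world_z_to_slice(z_mm, z_origin, z_spacing):
    """Convert physical z positions to integer slice indices."""
    z_mm = np.asarray(z_mm, dtype=float)
    # Distance from the first slice, measured in units of slice spacing.
    index = np.abs(z_mm - z_origin) / z_spacing
    return np.rint(index).astype(np.int32)       # round to the nearest slice


# Made-up origin and spacing: first slice at -70, slices 10 apart.
slices = world_z_to_slice([-50, -40, -30], z_origin=-70.0, z_spacing=10.0)
print(slices.tolist())
```

Rounding avoids an off-by-one slice when floating-point division lands just below an integer, for example 2.9999 instead of 3.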
The implementation code is as follows:
    import os

    import cv2
    import nrrd
    import numpy as np
    import SimpleITK as sitk


    def load_itk_image(filename):
        """Return img array and [z,y,x]-ordered origin and spacing"""
        # sitk.ReadImage returns an image indexed (x, y, z);
        # GetArrayFromImage returns an array ordered (z, y, x)
        itkimage = sitk.ReadImage(filename)
        numpyImage = sitk.GetArrayFromImage(itkimage)

        numpyOrigin = np.array(list(reversed(itkimage.GetOrigin())))
        numpySpacing = np.array(list(reversed(itkimage.GetSpacing())))

        return numpyImage, numpyOrigin, numpySpacing


    def arrs2mask(img_dir, ctr_arr_dir, save_dir):
        cnt = 0
        consensus = {1: 0, 2: 0, 3: 0, 4: 0}  # number of nodules per consensus level

        # Create one output folder per consensus level
        for k in consensus.keys():
            if not os.path.exists(os.path.join(save_dir, str(k))):
                os.makedirs(os.path.join(save_dir, str(k)))

        for f in os.listdir(img_dir):
            if f.endswith('.mhd'):
                pid = f[:-4]
                print('pid:', pid)

                # ct image
                img, origin, spacing = load_itk_image(os.path.join(img_dir, '%s.mhd' % (pid)))
                # contour coordinates dumped from the xml
                ctr_arrs = np.load(os.path.join(ctr_arr_dir, '%s.npy' % (pid)), allow_pickle=True)
                cnt += len(ctr_arrs)

                nodule_masks = []
                # Process the annotated nodules one by one
                for ctr_arr in ctr_arrs:
                    z_origin = origin[0]
                    z_spacing = spacing[0]
                    ctr_arr = np.array(ctr_arr)
                    # ctr_arr[:, 0] holds the z values; convert them from physical z
                    # to instanceNum, e.g. [-50, -40, -30] --> [2, 3, 4]
                    ctr_arr[:, 0] = np.absolute(ctr_arr[:, 0] - z_origin) / z_spacing
                    ctr_arr = ctr_arr.astype(np.int32)
                    print(ctr_arr)

                    # Each annotated nodule temporarily gets its own mask
                    # with the same size as img
                    mask = np.zeros(img.shape)

                    # Traverse the annotated slices along the z axis
                    # (np.unique removes duplicates and sorts in ascending order)
                    for z in np.unique(ctr_arr[:, 0]):
                        ctr = ctr_arr[ctr_arr[:, 0] == z][:, [2, 1]]  # (x, y) order for OpenCV
                        ctr = np.array([ctr], dtype=np.int32)
                        mask[z] = cv2.fillPoly(mask[z], ctr, color=(1,))
                    nodule_masks.append(mask)

                i = 0
                visited = []
                d = {}
                masks = []
                while i < len(nodule_masks):
                    # If matched before, there is no need to create a new mask
                    if i in visited:
                        i += 1
                        continue
                    same_nodules = []
                    mask1 = nodule_masks[i]
                    same_nodules.append(mask1)
                    d[i] = {}
                    d[i]['count'] = 1
                    d[i]['iou'] = []

                    # Find annotations pointing to the same nodule: compare the
                    # current mask with every later mask and compute the iou
                    for j in range(i + 1, len(nodule_masks)):
                        # Skip masks already merged into an earlier nodule
                        if j in visited:
                            continue
                        mask2 = nodule_masks[j]
                        iou = float(np.logical_and(mask1, mask2).sum()) / np.logical_or(mask1, mask2).sum()

                        # If the iou exceeds the threshold, mask j is a repeated
                        # annotation of nodule i
                        if iou > 0.4:
                            visited.append(j)
                            same_nodules.append(mask2)
                            d[i]['count'] += 1
                            d[i]['iou'].append(iou)

                    masks.append(same_nodules)
                    i += 1

                # There are at most 4 readers, so cap the count at 4
                for k, v in d.items():
                    if v['count'] > 4:
                        print('WARNING: %s: %dth nodule, iou: %s' % (pid, k, str(v['iou'])))
                        v['count'] = 4
                    consensus[v['count']] += 1

                # Number of annotations that agree on each nodule
                num = np.array([len(m) for m in masks])
                num[num > 4] = 4  # more than 4 repeated annotations are counted as 4

                if len(num) == 0:
                    continue
                # Iterate from the nodules with the most consensus
                for n in range(num.max(), 0, -1):
                    mask = np.zeros(img.shape, dtype=np.uint8)

                    for i, index in enumerate(np.where(num >= n)[0]):
                        same_nodules = masks[index]
                        m = np.logical_or.reduce(same_nodules)
                        # Give every nodule its own increasing value so that nodules
                        # can be told apart (for plain segmentation they can all be
                        # set to 1, either here or at the end)
                        mask[m] = i + 1
                    nrrd.write(os.path.join(save_dir, str(n), pid + '.nrrd'), mask)  # mask

        print(consensus)
        print(cnt)


    if __name__ == '__main__':
        img_dir = r'./LUNA16/image_combined'  # image data
        ctr_arr_save_dir = r'./LUNA16/annotation/noduleCoor'  # Where the intermediate nodule contours parsed from each annotator are saved
        noduleMask_save_dir = r'./LUNA16/nodule_masks'  # Folder to save the merged nodule masks

        # Generate masks from the dumped temporary files
        arrs2mask(img_dir, ctr_arr_save_dir, noduleMask_save_dir)
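The merging rule in the listing above is easier to follow on a toy example. The sketch below isolates the same greedy idea: each mask that has not been merged yet starts a group, and every later mask whose iou with it exceeds 0.4 joins that group. The function names and the small 2D masks are made up for illustration; unlike the listing, the `iou` helper returns 0 when both masks are empty instead of dividing by zero.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum()) / union


def group_by_iou(masks, threshold=0.4):
    """Greedily group mask indices that overlap the group's first mask."""
    groups, visited = [], set()
    for i, first in enumerate(masks):
        if i in visited:
            continue
        group = [i]
        for j in range(i + 1, len(masks)):
            if j not in visited and iou(first, masks[j]) > threshold:
                visited.add(j)
                group.append(j)
        groups.append(group)
    return groups


# Three toy 2D annotations: a and b overlap strongly, c is somewhere else.
a = np.zeros((10, 10), dtype=bool)
b = np.zeros((10, 10), dtype=bool)
c = np.zeros((10, 10), dtype=bool)
a[2:6, 2:6] = True
b[2:6, 3:7] = True
c[7:9, 7:9] = True

groups = group_by_iou([a, b, c])
print(groups)                     # the group size is the consensus count
print([len(g) for g in groups])
```

Because every candidate is compared only with the first mask of its group, the result can depend on the order of the annotations; that is a property of this greedy scheme, not of iou itself.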
At this point, a mask with the same shape as the image has been generated. Next, use itk-snap to open and view the processed result, as shown below:
Here the image and the mask are both opened as nrrd files. The image is originally in mhd format; to convert it to nrrd, you can refer to the following code:
    import os

    import itk
    import nrrd

    mhd_path = os.path.join(r'./LUNA16/image_combined', '1.3.6.1.4.1.14519.5.2.1.6279.6001.184412674007117333405073397832.mhd')
    image = itk.array_from_image(itk.imread(mhd_path))  # read the mhd/raw pair as a NumPy array
    nrrd.write(r'./image.nrrd', image)
3. Summary
The data formats in the lidc-idri data set are ones we do not often encounter, especially the mhd files with their accompanying raw files: two separate files that together represent a single volume, which is also rarely seen.
For beginners, this data form is still somewhat unfamiliar; I hope this series makes it understandable. In this article the masks are stored as nrrd files, which is my preferred array storage format because it is simple and easy to understand.
At this point, you have annotation masks that correspond one-to-one with the images, which is much easier to understand than reading the xml files. In the next article, we will combine the image and mask obtained here with the lung-region segmentation for further refinement, and use the resample operation to bring the data to a unified scale.