ENVI IDL: How to parse XML files (taking the Landsat9-MTL.xml file as an example)

01 Foreword

We originally planned to perform radiometric calibration on Landsat 9 files, but the calibration parameters are stored in the MTL file. You can look the parameters up in the file and copy them into IDL by hand, but that approach breaks down for batch calibration of many Landsat 9 scenes. We therefore need to read the relevant parameters from the MTL file automatically. For each band there are only two such parameters: a scale factor (the multiplicative coefficient) and an offset (the additive term).
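To make the role of the two parameters concrete, here is a minimal sketch of the linear rescaling they describe. The scale, offset, and pixel values are assumptions chosen for illustration; the real ones must be read from the MTL file of your scene.

```idl
; Illustrative values only (assumed, not read from any file)
scale = 2.75d-5                   ; assumed multiplicative scale factor
offset = -0.2d                    ; assumed additive offset
dn = [0us, 7273us, 20000us]       ; made-up stored pixel values

; Convert to double, mask 0 as a fill value, then apply value = dn * scale + offset
calibrated = double(dn)
calibrated[where(dn eq 0, /null)] = !values.d_nan
calibrated = calibrated * scale + offset
print, calibrated
```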

For Landsat 9, the MTL metadata is provided in three formats: plain text (.txt), XML (.xml), and JSON (.json).

Here we only discuss parsing and extracting values from the text and XML files.

02 Obtain calibration parameters through XML file

You need the IDL IDLffXMLDOMDocument class and the methods GetElementsByTagName, GetFirstChild, and GetNodeValue. IDL is case-insensitive, so the lowercase spellings used in the code below are equivalent.

GetElementsByTagName returns all elements with the given tag name as a list-like object (an IDLffXMLDOMNodeList);
GetFirstChild gets the first child node of a node;
GetNodeValue gets the value of a node.

The text inside an element is stored as a child text node, which is why a value is read with GetFirstChild followed by GetNodeValue.

Because GetElementsByTagName returns a list-like object, even a tag name that is unique in the XML file comes back as a list containing a single element. Use .item(0) to retrieve that first element (which is itself still an object).
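The calls described above can be tried on a tiny in-memory document. This is only a sketch: the XML text, tag names, and value are made up, and it assumes the document is built from a string with the STRING keyword instead of from a file.

```idl
; Hypothetical two-element document; tag names and values are made up
xml_text = '<ROOT><A>1.5</A><B>text</B></ROOT>'
doc = IDLffXMLDOMDocument(string=xml_text)

nodes = doc.getelementsbytagname('A')   ; an IDLffXMLDOMNodeList
print, nodes.getlength()                ; number of matching elements
node = nodes.item(0)                    ; first match, still an object
text_node = node.getfirstchild()        ; the text node inside <A>
print, text_node.getnodevalue()         ; the element text, returned as a string

obj_destroy, doc                        ; destroying the document releases its nodes
```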

The scaling parameters we want are stored as child elements of LEVEL2_SURFACE_REFLECTANCE_PARAMETERS, for example REFLECTANCE_MULT_BAND_1 (scale factor) and REFLECTANCE_ADD_BAND_1 (offset).

But please note that elements with the same names also appear under another parent element, the one that holds the Level 1 parameters.

So the file contains two sets of radiometric calibration parameters. The first belongs to Level 2: applying it yields surface reflectance or surface temperature (this is the one we use). The second belongs to Level 1 (L1), where radiometric calibration converts the raw digital numbers recorded by the sensor into radiance values. We therefore need to call GetElementsByTagName twice: first on the document, to obtain the LEVEL2_SURFACE_REFLECTANCE_PARAMETERS node, and then on that node, to retrieve the child nodes we need (the scale factor and offset nodes for each band).

Finally, read the value from each child node obtained this way.
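The difference between searching the whole document and searching from one parent can be seen on a small made-up document. The names GROUP_A, GROUP_B, and MULT_BAND_1 and the values are hypothetical stand-ins, not the real MTL contents.

```idl
; Hypothetical document: two parent groups contain a child with the same name
xml_text = '<META>' + $
    '<GROUP_A><MULT_BAND_1>2.0</MULT_BAND_1></GROUP_A>' + $
    '<GROUP_B><MULT_BAND_1>3.0</MULT_BAND_1></GROUP_B>' + $
    '</META>'
doc = IDLffXMLDOMDocument(string=xml_text)

; Searching the whole document matches the child in both groups
all_matches = doc.getelementsbytagname('MULT_BAND_1')
print, all_matches.getlength()

; Searching from one parent element only matches its own descendants
group_b = (doc.getelementsbytagname('GROUP_B')).item(0)
scoped = group_b.getelementsbytagname('MULT_BAND_1')
print, scoped.getlength()
print, double(((scoped.item(0)).getfirstchild()).getnodevalue())

obj_destroy, doc
```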

So our code should be written like this:

pro L9_C2_calibration
    ; Prepare: parse the MTL XML file into a DOM document
    xml_path = 'D:\Objects\JuniorFallTerm\IDLProgram\Experiments\ExperimentalData\Week8\LC09_L2SP_130039_20220311_20220314_02_T1_MTL.xml'
    xml = IDLffXMLDOMDocument(filename=xml_path)
    
    ; Get the Level 2 parameter node (first match in the document)
    level2 = xml.getelementsbytagname('LEVEL2_SURFACE_REFLECTANCE_PARAMETERS')
    level2 = level2.item(0)
    ; Search inside that node only, so the Level 1 element of the same name is skipped
    b1 = level2.getelementsbytagname('REFLECTANCE_MULT_BAND_1')
    b1 = b1.item(0)
    ; The value is held by the child text node and is returned as a string
    print, double((b1.getfirstchild()).getnodevalue())
    
    ; Destroy objects
    obj_destroy, b1
    obj_destroy, level2
    obj_destroy, xml
end

(PS: To be honest, IDL's XML objects are hard to use. They are quite low-level and less convenient than Python's. The advantage is that you are free to write higher-level functions that wrap exactly the methods you want.)

Wrapped into a function, it looks like this:

;+
; Function usage:
;   Get the value of the node at the specified path
; Function parameters:
;   xml_path: path to the XML file
;   tags_name: the names of the nodes (string array), in parent-to-child order
;-
function xml_get_value, xml_path, tags_name
    xml = idlffxmldomdocument(filename=xml_path)  ; Parse the XML file
    
    ; Walk down the path, taking the first match at each level
    cur_tag = xml
    foreach tag_name, tags_name do begin
        cur_tag = cur_tag.getelementsbytagname(tag_name)
        cur_tag = cur_tag.item(0)
    endforeach
    
    ; Read the text first, then destroy the document so repeated calls do not leak objects
    value = (cur_tag.getfirstchild()).getnodevalue()
    obj_destroy, xml
    return, value
end

If the relative path of your node is:

LEVEL2_SURFACE_REFLECTANCE_PARAMETERS\REFLECTANCE_MULT_BAND_1

then get the value as follows:

a = xml_get_value(xml_path, ['LEVEL2_SURFACE_REFLECTANCE_PARAMETERS', 'REFLECTANCE_MULT_BAND_1'])
print, a

Note, however, that the function has no error handling. If the path is wrong, it returns a null value or simply fails with an error. It also assumes that every node is unique within its parent, that is, it does not handle a parent that has several child nodes with the same name. Make sure your relative path is unambiguous as well: if you pass only ['REFLECTANCE_MULT_BAND_1'] instead of the full path used in the call above, then, as discussed earlier, the name exists under more than one parent and the function silently takes the first match.
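One possible way to add the missing checks is sketched below. The name xml_get_value_checked is hypothetical and not part of the original code; the sketch stops with an error when a tag is missing and only prints a note when a tag matches more than one node.

```idl
; xml_get_value_checked is a hypothetical name used for this sketch
function xml_get_value_checked, xml_path, tags_name
    compile_opt idl2
    if ~file_test(xml_path) then message, 'File not found: ' + xml_path

    xml = idlffxmldomdocument(filename=xml_path)
    cur_tag = xml
    foreach tag_name, tags_name do begin
        matches = cur_tag.getelementsbytagname(tag_name)
        n_matches = matches.getlength()
        ; A missing tag is an error: clean up, then stop
        if n_matches eq 0 then begin
            obj_destroy, xml
            message, 'Tag not found: ' + tag_name
        endif
        ; An ambiguous tag is reported, and the first match is used
        if n_matches gt 1 then $
            message, 'Tag ' + tag_name + ' matches ' + strtrim(n_matches, 2) + $
                ' nodes; using the first', /informational
        cur_tag = matches.item(0)
    endforeach

    ; An element without text has no child node; return an empty string then
    text_node = cur_tag.getfirstchild()
    value = obj_valid(text_node) ? text_node.getnodevalue() : ''
    obj_destroy, xml
    return, value
end
```

It is called exactly like xml_get_value, with the file path and the array of tag names in parent-to-child order.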

03 Obtain calibration parameters through text files

This approach extracts the value with string operations such as pattern matching and splitting. The general idea is the same as before: first locate the Level 2 parameter group, then find the band entry inside it. The code is given here:

    ; Prepare
    txt_path = 'D:\Objects\JuniorFallTerm\IDLProgram\Experiments\ExperimentalData\Week8\LC09_L2SP_130039_20220311_20220314_02_T1_MTL.txt'
    
    ; Read the whole file into a string array, one element per line
    txt_content = strarr(file_lines(txt_path))
    openr, lun, txt_path, /get_lun
    readf, lun, txt_content
    free_lun, lun
    
    ; The group name appears on the lines that open and close the group
    level2_pos = where(strmatch(txt_content, '*LEVEL2_SURFACE_REFLECTANCE_PARAMETERS*'))
    calibration_content = txt_content[level2_pos[0]:level2_pos[1]]
    ; Inside the group, take the text after '=' on the matching line
    band_sc_pos = where(strmatch(calibration_content, '*REFLECTANCE_MULT_BAND_1*'))
    band_sc = strtrim((strsplit(calibration_content[band_sc_pos[0]], '=', /extract))[-1], 2)
    print, band_sc

Note that both methods return the value as a string, which needs to be converted to a numeric type such as double.
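As a small illustration of that conversion, the sketch below splits one made-up line of the form KEY = VALUE and converts the value to double. The key name and the number are hypothetical.

```idl
; Made-up line in KEY = VALUE form; the key and value are for illustration only
line = '    SOME_SCALE_FACTOR = 2.5E-05'

; Split at '=' and trim blanks on both sides of each part
parts = strtrim(strsplit(line, '=', /extract), 2)
key = parts[0]
value = double(parts[1])    ; string to double-precision number
help, key, value
```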

Of course, there are other approaches, such as calling Python's built-in xml module from IDL through the IDL-Python bridge, provided a compatible Python interpreter is installed and configured. The code is posted here as well:

ET = python.import('xml.etree.ElementTree')
tree = ET.parse(xml_path)
root = tree.getroot()
finds = root.find('./LEVEL2_SURFACE_REFLECTANCE_PARAMETERS/REFLECTANCE_MULT_BAND_1')
print, finds.text

Finally, here is the complete code for the radiometric calibration of each band of Landsat 9 (the calibration parameters are read with method 1, the XML approach):

; @Author : ChaoQiezi
; @Time : November 11, 2023 - 10:24:06 am
; @Email: [email protected]

; This program performs radiometric calibration on a Landsat 9 C2 (Collection 2) Level-2 product and writes each band to a TIFF file

;+
; Function usage:
;   Get the value of the node at the specified path
; Function parameters:
;   xml_path: path to the XML file
;   tags_name: the names of the nodes (string array), in parent-to-child order
;   double: set this keyword to return the value as a double instead of a string
;-
function xml_get_value, xml_path, tags_name, double=as_double
    ; The DOUBLE keyword is bound to as_double so the local variable cannot shadow the DOUBLE() function
    xml = idlffxmldomdocument(filename=xml_path)  ; Parse the XML file
    
    cur_tag = xml
    foreach tag_name, tags_name do begin
        cur_tag = cur_tag.getelementsbytagname(tag_name)
        cur_tag = cur_tag.item(0)
    endforeach
    
    value = (cur_tag.getfirstchild()).getnodevalue()
    obj_destroy, xml  ; Release the document; this function is called twice per band
    if keyword_set(as_double) then return, double(value)
    
    return, value
end

pro L9_C2_calibration
    ; Prepare (filepath inserts the path separator that plain string concatenation would omit)
    in_dir = 'D:\Objects\JuniorFallTerm\IDLProgram\Experiments\ExperimentalData\Week8'
    out_dir = filepath('out_me', root_dir=in_dir)
    if ~file_test(out_dir, /directory) then file_mkdir, out_dir
    xml_path = filepath('LC09_L2SP_130039_20220311_20220314_02_T1_MTL.xml', root_dir=in_dir)
    level2_name = 'LEVEL2_SURFACE_REFLECTANCE_PARAMETERS'
    mult_name = 'REFLECTANCE_MULT_BAND_'
    add_name = 'REFLECTANCE_ADD_BAND_'
    img_wildcard = '*T1_SR_B'
    
    for band_ix = 1, 7 do begin
        cur_mult_name = mult_name + strtrim(band_ix, 1)
        cur_add_name = add_name + strtrim(band_ix, 1)
        cur_img_name = img_wildcard + strtrim(band_ix, 1) + '.tif'
        scale = xml_get_value(xml_path, [level2_name, cur_mult_name], /double)
        add = xml_get_value(xml_path, [level2_name, cur_add_name], /double)
        
        ; Read the image file and calibrate; pixels equal to 0 are set to NaN
        cur_img_path = (file_search(filepath(cur_img_name, root_dir=in_dir)))[0]
        cur_img = double(read_tiff(cur_img_path, geotiff=geo_info))
        cur_img[where(cur_img eq 0.0, /null)] = !values.d_nan
        cur_img = cur_img * scale + add
        
        ; Output
        cur_out_path = filepath(file_basename(cur_img_path), root_dir=out_dir)
        write_tiff, cur_out_path, cur_img, geotiff=geo_info, /double
    endfor
end