PostgreSQL complex nested json data correction case

Directory

  • Reference link
  • Problem Description
    • The first problem encountered: How to determine whether there are elements that meet the requirements in the nested json object array in the json field of each record
    • The second problem encountered: the function is written while debugging, and it is necessary to output logs to observe the debugging results.
    • The third problem encountered: updating some (multiple) elements of the json object array
    • Encountered the fourth problem: the temporary function should be deleted in time after execution
  • Summarize

Reference link

  1. PostgreSql 11 version json operation function documentation

Problem description

Due to the impact of new requirements, our system needs to perform batch corrections on a json type field of historical data. We need to write a sql script to perform the correction.

Let’s take a look at the data structure of that json field (omitting irrelevant parts):

{<!-- -->
    "config": [
        {<!-- -->
            "inputStyle": "INTEGER",
            "strVal": "59",
            "name": "Failed"
        },
        {<!-- -->
            "inputStyle": "INTEGER",
            "strVal": "60",
            "name": "pass"
        },
        {<!-- -->
            "inputStyle": "INTEGER",
            "strVal": "70",
            "name": "good"
        },
        {<!-- -->
            "inputStyle": "INTEGER",
            "strVal": "90",
            "name": "Excellent"
        }
    ]
}

The purpose of our script is to convert the input type to numeric type (inputStyle is INTEGER or DECIMAL) and strVal is not empty and intValue is not equal to strVal. Make a data transfer: copy and paste the value of strVal to intValue

The first problem encountered: How to determine whether there are elements that meet the requirements in the nested json object array in the json field of each record

The postgresql version used by our system is 11. On the premise that new functions cannot be used, we can only customize functions to call.

-- Query all indicators in the record that are of numeric type and have strVal value but intValue is empty
-- Parameter description: jsonarr is an array of json objects
create or replace function jsonb_path_match_20231013(jsonarr jsonb)
    -- The function returns true when it determines that there is any element that meets the conditions in the json object array, otherwise it returns false
    returns boolean as
$$
declare
    total integer := 0;
    declare ar RECORD;
    declare is_contain boolean := false;
begin
    -- Parameter type verification
    if 'array' <> jsonb_typeof(jsonarr) then
        RAISE exception 'The parameter is not an array type and cannot be processed';
    end if;
    -- Parameter content verification: empty array is judged as false
    total := jsonb_array_length(jsonarr);
    if total < 1 then
        return false;
    end if;
    
    -- Traverse the array
    for ar in select jsonb_array_elements(jsonarr) as ele
        loop
            -- If the array element appears to meet the condition (true) during the traversal process, break out of the loop
            exit when is_contain;
            -- Determine whether the array element whose input type is numeric and strVal is not empty and intValue is not equal to strVal exists. If it exists, assign is_contain to true.
                -- If the input type is numeric
            if ('DECIMAL' = (ar.ele ->> 'inputStyle') or 'INTEGER' = (ar.ele ->> 'inputStyle'))
                -- and strVal is not empty
            and ar.ele -> 'strVal' is not null
            and length(trim(ar.ele ->> 'strVal')) > 0
                -- and intValue is not equal to strVal
            and (ar.ele -> 'intValue' is null or
                    (ar.ele ->> 'intValue')::decimal <> (trim(ar.ele ->> 'strVal'))::decimal)
            then
                is_contain := true;
            end if;
        end loop;
    return is_contain;
end
$$
    language 'plpgsql';

The second problem encountered: the function is debugged and written at the same time, and logs need to be output to observe the debugging results

Use the following syntax to output logs, and view the output logs on the console:

RAISE NOTICE 'The value of a variable: %' val;

The third problem encountered: updating some (multiple) elements of the json object array

Under normal circumstances, to update the json structure field, you need to use the jsonb_set function, but you need a clear json path to be updated, but it does not apply to updating some elements in an array.

Implementation idea: Execute update once for each array element.

Implemented using stored procedures

DO
$$
    -- rec is the query result record that meets the conditions
    declare
        rec RECORD;
    BEGIN
        -- Query the records whose input type is numeric type and has strVal value but intValue is not equal to strVal among the elements of nested arrays in all json fields
        FOR rec IN select
                       ID,
                       config,
                       json_array_length(config -> 'config') as length
                   from
                       test_json_array_table
                   where
                         config is not null
                     and config -> 'config' is not null
                     and config #> '{config,0}' is not null
                     -- Call the function written above to determine the json object array
                     and jsonb_path_match_20231013((config -> 'config')::jsonb)
            LOOP
                -- Traverse the "config.config[*]" json array in each record, i is the array index, starting from 0
                for i in 0..(rec.length - 1)
                    loop
                                -- If the input type is numeric
                        if ('DECIMAL' = (rec.config::jsonb -> 'config' -> i ->> 'inputStyle')
                            OR 'INTEGER' = (rec.config::jsonb -> 'config' -> i ->> 'inputStyle'))
                            -- and strVal is not empty
                        and rec.config::jsonb -> 'config' -> i -> 'strVal' is not null
                        and length(trim(rec.config::jsonb -> 'config' -> i ->> 'strVal')) > 0
                            -- and intValue is not equal to strVal
                        and (rec.config::jsonb -> 'config' -> i -> 'intValue' is null
                            or (rec.config::jsonb -> 'config' -> i ->> 'intValue')::decimal
                                <> (trim(rec.config::jsonb -> 'config' -> i ->> 'strVal'))::decimal)
                        then
                            --Update each intValue in each record in batches
                            update test_json_array_table
                            set config = jsonb_set(config::jsonb, array ['config', i::text, 'intValue'],
                                                (config::jsonb -> 'config' -> i ->> 'strVal')::decimal::varchar::jsonb,
                                                true)
                            where
                                id = rec.id;
                        end if;
                    end loop;
            end loop;
    END
$$;

Encountered the fourth problem: temporary functions should be deleted in time after execution

--Delete-can only be executed once
DROP FUNCTION if exists jsonb_path_match_20231013;

Summary

Key knowledge:

  1. Output console log: raise notice 'The value of a certain variable: %',val;
  2. Check the json field data type: jsonb_typeof()
  3. Check the json array length: json_array_length()
  4. Get JSON object: ->
  5. Get the value of the JSON object: ->>
  6. Get the JSON object at the specified path: #>
  7. Expand the json array into a collection of objects to facilitate traversal: jsonb_array_elements()
  8. Interrupt loop: exit when ;
  9. Clear the leading and trailing spaces of a string: trim()
  10. Variable assignment: :=
  11. Type conversion: ::
  12. json object setting value: jsonb_set()