To do storage business, I parsed the source code of a project

Recently, I am doing storage-related business, more specifically, storage-related research and development, so I checked the relevant information on the Internet, and after careful consideration, I decided to start researching from the simplest Json data exchange format.

JSON is a data exchange format independent of programming languages. Almost all languages related to network development have JSON function libraries, for example: cJSON in C language, package json in Golang language, jackson in Java language, and Python language in pyson et al.

1. Introduction to JSON

2. cJSON source code analysis

3. How to design the “key-value” data structure

4. How to design JSON node operation functions

5. How to design functions for parsing JSON strings

6. How to design a JSON node to convert a JSON string

1. Introduction to JSON

There may be some students who are still studying and don’t know what JSON is and what is the agreement. Here is a general introduction. If you already know it, you can skip this part directly.

To develop a JSON codec, of course you need to understand what is JSON

The JSON basic data format adopts “key-value pairs”, that is, “key: value” form, where key can only be a string, and there are many data types of value; commas are used between multiple “key-value pairs” , separated, the key and the value are separated by a colon :

The value basic data type of JSON:

  1. Value (number): decimal number, cannot have leading 0, can be negative, and can have a decimal part. Does not distinguish between integers and floating point numbers

  2. String (string): zero or more Unicode code points enclosed in double quotes "". Escape character sequences starting with a backslash are supported

  3. Object (object): composed of unordered “key-value pairs”, wrapped in curly braces {}

  4. Array (array): ordered zero or more values, each value can be of any type, wrapped in square brackets, such as: [value, value]

  5. true: boolean true

  6. false: boolean false

  7. null: empty value

Four specific characters are considered whitespace: space (‘ ‘), horizontal tab (‘\t’), carriage return (‘\r’), newline (\ ‘\\
‘)

In the JSON string, meaningless whitespace is allowed before and after the element (will be ignored)

Example JSON string

{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "children": [
    "Catherine",
    "Thomas",
    "Trevor"
  ],
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "spouse": null
}

2. cJSON source code analysis

Here I choose cJSON for further understanding and analysis, because C language is the mother of programming languages and my first language.

013b9037753f420e64e493f899046da2.png

cJSON repository

cJSON is a warehouse with 8.8k stars on github. It is a project that many C/C++ practitioners will start to choose when they start to learn source code knowledge.

This article will give an introduction to the source code of cJSON. Due to the space, I can’t write too much. Here I only write some macro introductions in the source code parsing process. Even so, this article is almost 5900 words long.

3f96fe5e3e3ef41ecf944af53f30e111.png

So for specific implementation details, please refer to the source code repository with detailed notes: https://github.com/MingWangSong/MycJSON

After we understand the conventions of JSON data, we can analyze how cJSON is implemented while raising design requirements.

The JSON codec toolkit mainly provides the two major functions of JSON serialization and deserialization.

Among them, serialization refers to converting programming language structured data into JSON string; deserialization refers to the opposite operation.

To put it more vividly, JSON serialization is to output or print JSON strings, and JSON deserialization is to parse JSON strings.

97b418738d63bd3797c9a1c177b96592.png

JSON serialization and deserialization

JSON data as a whole is an object type of data, which can be imagined as the default root node of a key; a “key-value” is regarded as the basic structure in JSON, and “key-value” is everywhere in the entire JSON (this sentence Linux-flavored).

Therefore, in order to implement JSON serialization and deserialization, it is necessary to define a “key-value” data structure, which is called a JSON node.

3. How to design the “key-value” data structure

  1. The design of the “key-value” data structure needs to consider two things: Single “key-value” data representation and Relationship representation between multiple “key-value”

  2. For the relationship representation between multiple “key-values”, cJSON’s design idea is to connect JSON nodes in series through a linked list. Specifically, use single linked list to realize nested JSON relationships, and use Doubly linked list implements JSON peer relationship.

    Therefore, all subsequent JSON node operations are linked list related. The following is an example drawing description:

    {
      "firstName": "John",
      "lastName": "Smith",
      "children": [
        "Catherine",
        "Thomas",
        "Trevor"
      ],
      "address": {
        "streetAddress": "2nd-Street",
        "city": "New York"
      }
    }

    The structural diagram of cJSON processing the above JSON data is as follows:

    755e9435f2ea3817a42495a6b4cfd867.png

    In the first column of the figure above, the yellow block is the JSON root node. The key of this node is default, and the value corresponds to several JSON nodes wrapped by the outermost {} of the JSON string. These nodes are the root nodes. Child node, the red arrow in the figure represents the pointer to the child node;

    The second column represents the first-level JSON child nodes under the root node. Nodes at the same level form a double-linked list. The black pointers in the figure are the predecessor and successor pointers of the double-linked list.

    It is worth noting that when cJSON processes array objects, it treats each element in the array as a JSON node for storage, and also stores it in the form of a double-linked list. The value of an array-type JSON node has no value Instead, the array address is stored through pointers to child nodes, so many subsequent operations of object type nodes and array type nodes can be reused, which is very wonderful!

  3. The cJSON node data structure is designed as follows:

    typedef struct cJSON {
        struct cJSON *next; // the successor pointer of the same level node
        struct cJSON *prev; // predecessor pointer of peer node
        struct cJSON *child; // child node pointer
        int type; // The type of "value", there are 9 types in total
        char *valuestring; // "value" type is a storage variable corresponding to a string
        int valueint; // "value" type is a storage variable corresponding to an integer (writing valueint is deprecated)
        double valuedouble; // "value" type is a storage variable corresponding to a floating point number
        char *string; // "key" value in the key-value pair
    } cJSON;

    Among them, next, prev, child represent the predecessor node, successor node and child node respectively, and are used to handle the hierarchical relationship between JSON data nodes; type represents the value type of the JSON node. In particular, the variable string represents the key (this naming makes me very uncomfortable).

    Because the type definition distinguishes two types of Boolean values, true and false, the JSON node of the Boolean type does not actually need other variables to store the value. The specific types are defined as follows:

    #define cJSON_Invalid (0) // invalid type
    #define cJSON_False (1 << 0)
    #define cJSON_True (1 << 1)
    #define cJSON_NULL (1 << 2)
    #define cJSON_Number (1 << 3)
    #define cJSON_String (1 << 4)
    #define cJSON_Array (1 << 5)
    #define cJSON_Object (1 << 6)
    #define cJSON_Raw (1 << 7) // raw json
    #define cJSON_IsReference (1 << 8)
    #define cJSON_StringIsConst (1 << 9)

4. How to design JSON node operation functions

Whether it is JSON serialization or deserialization, operations on JSON nodes cannot be avoided, so first understand the design of JSON node operations in cJSON

  1. Create JSON nodes

    4ca9d10430e2b8e10b2a72284c7f27ec.png

    The figure above is the call flow chart for creating JSON nodes in cJSON. According to the implementation of cJSON source code, functions for creating JSON nodes can be roughly divided into three categories: creating non-array type JSON nodes, creating array type JSON nodes, and creating JSON node references. Because similar functions implement similar logic.

    There are three steps to create a JSON node: open up space, set the node type, and set the node value.

    Note that does not set the key value when creating a node, so when adding a node, it does not call this type of function directly, but provides the corresponding Add function, which will be introduced in detail later.

  • Create a non-array type JSON node

    When creating a non-array type node, first call cJSON_New_Item() to apply for space for the cJSON node and initialize it with memset, and then set the node type and value. If you want to add the specified node to the specified object or array, you need to set the next or child pointer, the source code is as follows:

    cJSON * cJSON_CreateObjectReference(const cJSON *child){
        cJSON *item = cJSON_New_Item( & global_hooks);
        if (item != NULL) {
            item->type = cJSON_Object | cJSON_IsReference;
            item->child = (cJSON*)cast_away_const(child);
        }
        return item;
    }
    cJSON * cJSON_CreateArrayReference(const cJSON *child) {
        cJSON *item = cJSON_New_Item( & global_hooks);
        if (item != NULL) {
            item->type = cJSON_Array | cJSON_IsReference;
            item->child = (cJSON*)cast_away_const(child);
        }
        return item;
    }
  • Create an array type JSON node

    When creating an array type node, first call cJSON_New_Item() to apply for space for the cJSON node and initialize it with memset, and then set the node type. Then, you need to build a child node list based on the array array. The child node list is bidirectional Linked list, and the head node predecessor pointer points to the tail node, so that the tail node can be quickly located when the tail is inserted.

Add JSON node

93728fc430e6624660ceb13fc63e7f83.png

Since cJSON uses a linked list structure to store nodes, adding a JSON node involves an insertion operation in the linked list. Because in cJSON, the array elements in value are also stored by JSON nodes, so adding elements to the array is also the same linked list operation.

As can be seen from the figure above, the core function of adding a JSON node is add_item_to_array(), this method inserts the newly created JSON node in the form of tail insertion, in order to quickly find the tail node, each time you insert The predecessor pointer of the head node will be updated to point to the tail node. The specific insertion process is shown in the figure below.

cJSON also provides cJSON_InsertItemInArray(), which implements inserting elements at any position in the array.

01ba8db6f1a7587463d225cf18cc3de6.png

Replace JSON node

9b4b8736d309d0ae733cc7cbeae06eaa.png

The core function for replacing JSON nodes is cJSON_ReplaceItemViaPointer, which is mainly for replacing elements in double-linked lists.

delete JSON node

89570931f49215011e9e461b55f8a463.png

The core functions for deleting JSON nodes are cJSON_Delete and cJSON_DetachItemViaPointer, which recursively delete the specified node and all its child nodes, but the nodes of type cJSON_IsReference will not be deleted.

Among them, cJSON_DetachItemViaPointer mainly deals with the relevant pointers of the nodes to be deleted; cJSON_Delete is mainly used to release the storage space of the deleted nodes.

Find JSON nodes

ac8cc87a07c0e1370f2a2a84cc52bb08.png

The core function for finding JSON nodes is get_object_item, which can only be searched through because it is a linked list storage structure.

5. How to design parsing JSON string functions

4bec8bc81a55cb57262c31232ff3b514.png

When using cJSON to parse JSON nodes, call the cJSON_Parse() function directly. The above figure shows the calling relationship of this function. The core functions are parse_value and parse_object >. Among them, parse_value implements the parsing of different types of value values. Boolean types and null types are directly processed in this function. Numbers, characters Strings, arrays, and object types are processed separately through encapsulation functions; parse_object mainly implements parsing for JSON nodes, by first parsing key values with parse_string , and then parse the value value through parse_value.

Especially, because the data structure design of nested objects and arrays in cJSON is the same, so the implementation of parse_object and parse_array is almost the same, the difference The main reason is that parse_array has no key value to be parsed.

6. How to design JSON node to JSON string function

32592c519dadaffbea46aee8fcbca48b.png

Comparing the function call graph of parsing the JSON string function, it can be found that the design logic of the JSON node to JSON string function is similar, and it also includes two core functions print_value and print_object, here No further analysis will be done.

I hope that today’s source code analysis can be helpful to everyone, and more source code-level project analysis will be brought in the future.

Reference link

  • DaveGamble/cJSON: Ultralightweight JSON parser in ANSI C (github.com): https://github.com/DaveGamble/cJSON

  • JSON – Wikipedia, the Free Encyclopedia (wikipedia.org): https://en.wikipedia.org/en-us/JSON

  • JSON: https://www.json.org/json-zh.html

  • Detailed explanation of cJSON source code and analysis process

  • How to parse JSON data and how to use memory hooks-Tencent Cloud Developer Community-Tencent Cloud (tencent.com):https://cloud.tencent.com/developer/beta/article/1662821

  • cJSON replaces the default malloc function – chilkings – Blog Park (cnblogs.com):https://www.cnblogs.com/chilkings/p/15821140.html

  • The use of cjson library and source code reading

  • Analysis of .h and .c files in C language (very exciting)-Tencent Cloud Developer Community-Tencent Cloud (tencent.com):https://cloud.tencent.com/developer/article/1690858

  • MingWangSong/MycJSON (github.com): https://github.com/MingWangSong/MycJSON

↓↓Recommended↓↓↓