[Network] Custom protocol | Serialization and deserialization | Take tcpServer as an example

This article was first published in Mu Xue’s humble house

This article implements a custom protocol, using a calculator service built on tcpServer as the example

Before reading this article, please read the tcpServer article first

For the complete code of this article, see Gitee

1. Revisiting TCP

Note that the description of TCP here is simplified for ease of understanding; the TCP protocol will be covered in more depth later

1.1 Connections

We know that TCP is connection-oriented: the client and the server must establish a connection before they can start communicating

  • To establish a connection, TCP uses a three-way handshake
  • To close a connection, TCP uses a four-way wave (four-way termination)

Here is an everyday example to help understand the three-way handshake and the four-way wave

image-20230211103933999

1.2 Sending information

What if we now need to send structured data?

We know that TCP is byte-stream-oriented, so it can send arbitrary data, including the raw binary of a C structure;

  • But does being able to send it mean we should?
  • The answer is naturally no!

Different platforms use different structure alignment rules and different endianness, so the same byte stream is interpreted differently on each of them. If we communicate by sending raw structure data directly, portability is extremely poor, and our client and server are tied to the current system environment;

Moreover, even on the same system, the environment (compiler, alignment settings, byte-order configuration) may change later, and at that point our code may no longer work!
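To make the alignment and endianness problem concrete, here is a minimal sketch. The names Sample, isLittleEndian and putUint32BigEndian are made up for this illustration and are not part of the article's code. It shows that the size of a structure can include padding, that the in-memory byte order depends on the host, and that writing bytes in an agreed order removes that dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical structure, used only to illustrate padding.
struct Sample
{
    char tag;      // 1 byte
    int32_t value; // 4 bytes, usually aligned to a 4-byte boundary
};

// Returns true if this machine stores the least significant byte first.
bool isLittleEndian()
{
    uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1); // look at the byte at the lowest address
    return first == 1;
}

// Writes v into out[0..3] in a fixed (big-endian) order, whatever the host uses.
void putUint32BigEndian(uint32_t v, unsigned char out[4])
{
    assert(out);
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}
```

On many platforms sizeof(Sample) is 8 rather than 5 because of padding, but that is exactly the kind of detail that is not guaranteed, which is why the raw structure bytes are a poor wire format.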

For the same reason, when writing a C address book program, we should not write structure data directly to a file. Later code upgrades and environment changes may make the data in our stored files unreadable, which is definitely something we don't want to see.

Therefore, to solve this problem, we should serialize the data before sending it. After the receiver gets the data, it deserializes and parses it!

2. Serialization and deserialization

2.1 Introduction

Serialization converts structured data (for now, think of a C structure) into a string that can be sent out

struct date
{
    int year;
    int month;
    int day;
};

For example, to serialize the date structure above, we can simply join its fields into a string (serialization)

year-month-day

After the receiver gets this string, it can locate the separator -, take out the three fields, convert them to int, and store them back into the structure (deserialization)

In this way, merely by agreeing on a serialization and deserialization method, we have defined a simple protocol!
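The year-month-day idea can be sketched in a few lines. serializeDate and deserializeDate are hypothetical helper names chosen for this example; the sketch does not zero-pad the fields and does not validate that the parts are really numbers.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

struct date
{
    int year;
    int month;
    int day;
};

// Joins the three fields with '-' (serialization).
std::string serializeDate(const date &d)
{
    return std::to_string(d.year) + "-" + std::to_string(d.month) + "-" + std::to_string(d.day);
}

// Splits on the two '-' separators and converts each part back to int (deserialization).
bool deserializeDate(const std::string &in, date *out)
{
    assert(out);
    size_t first = in.find('-');
    if (first == std::string::npos)
        return false;
    size_t second = in.find('-', first + 1);
    if (second == std::string::npos)
        return false;
    out->year = atoi(in.substr(0, first).c_str());
    out->month = atoi(in.substr(first + 1, second - first - 1).c_str());
    out->day = atoi(in.substr(second + 1).c_str());
    return true;
}
```

Because both sides only ever exchange the text form, neither needs to know how the other lays out the structure in memory.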

2.2 Codec

This raises another problem: how do I know that I have read one complete piece of serialized data?

2000-12-10
10000-01-01

As shown above, suppose that one day the year becomes a five-digit number; how does the server then know whether it has read a complete piece of serialized data?

This requires a rule: use the first n bytes to carry the length. After receiving data, first take out the first n bytes to read the length m of this message, then read the following m bytes, and the complete string has been extracted;

  • This process can be called encoding and decoding

To separate the length field from the actual serialized content, we can add the separator \t; this also requires that the transmitted data itself never contains \t, otherwise the message cannot be split correctly

10\t2000-12-10\t
11\t10000-01-01\t

All of the above is part of protocol customization! We have specified a serialization and deserialization method for the server and the client, so that communication between the two is no longer restricted by the platform. After all, every platform decodes the string to the same data!
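The length-prefix rule can be tried out on its own. In the sketch below, kSep, frame and takeFrames are names invented for this illustration; the code assumes the payload never contains the separator and that the length field holds decimal digits. It shows that several messages arriving back to back in one buffer can still be separated, and that an incomplete tail is left in the buffer for later.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

const std::string kSep = "\t"; // hypothetical name for the separator

// Wraps a payload as "length\tpayload\t".
std::string frame(const std::string &payload)
{
    return std::to_string(payload.size()) + kSep + payload + kSep;
}

// Removes every complete message from the front of buffer and returns the payloads.
// Bytes of an incomplete message stay in buffer.
std::vector<std::string> takeFrames(std::string &buffer)
{
    std::vector<std::string> payloads;
    while (true)
    {
        size_t pos = buffer.find(kSep);
        if (pos == std::string::npos)
            break; // the length field has not fully arrived
        size_t len = std::strtoul(buffer.substr(0, pos).c_str(), nullptr, 10);
        size_t start = pos + kSep.size(); // index of the first payload byte
        size_t rest = buffer.size() - start;
        if (rest < len || rest - len < kSep.size())
            break; // payload or trailing separator is still missing
        payloads.push_back(buffer.substr(start, len));
        buffer.erase(0, start + len + kSep.size());
    }
    return payloads;
}
```

Note that the five-digit year is no longer a problem: the receiver never guesses where a message ends, it reads the length first.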

Let’s use a calculator service to demonstrate it

3. Calculator service

Because the focus of this article is demonstrating protocol customization, the calculator here does not handle chained operators

3.1 Protocol customization

To implement a calculator, we must first work out which fields a calculation needs

x + y
x/y
x*y
...

In general, a calculation needs only 3 fields: two operands and one operator. So we need to combine these three fields into a string to achieve serialization;

For example, we stipulate that the serialized data looks as follows, with a space between each operand and the operator

a + b

Then add the data length at the beginning

Data length\tFormula\t

7\t10 + 20\t
8\t100 / 30\t
9\t300 - 200\t

For the server, we need to return two fields: the exit status and the result

exit status result

If the exit status is not 0, an error occurred and the result is invalid; the result is valid only when the exit status is 0.

Similarly, the length of the data must also be added to the server's serialized string

data length\texit status result\t

In this way, a custom protocol for the calculator is complete;
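As a quick check of the format, the two wire strings can be produced directly. makeRequestWire and makeResponseWire are hypothetical names used only in this sketch; the real code in the following sections splits this work into serialization and encoding.

```cpp
#include <cassert>
#include <string>

// Builds "length\tx op y\t", for example "7\t10 + 20\t".
std::string makeRequestWire(int x, char op, int y)
{
    std::string body = std::to_string(x);
    body += ' ';
    body += op;
    body += ' ';
    body += std::to_string(y);
    return std::to_string(body.size()) + "\t" + body + "\t";
}

// Builds "length\texitCode result\t", for example "4\t0 30\t".
std::string makeResponseWire(int exitCode, int result)
{
    std::string body = std::to_string(exitCode) + " " + std::to_string(result);
    return std::to_string(body.size()) + "\t" + body + "\t";
}
```

The length counts only the characters between the two separators, so a negative exit status such as -2 makes the body one character longer.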

3.2 Members

According to the above protocol, first write the member variables of the request and the response

class Request
{
public:
    int _x;
    int _y;
    char _ops;
};
class Response
{
public:
    int _exitCode; // exit code of the calculation service
    int _result;   // result
};

These member variables are all public, which makes them convenient to use in the task (otherwise we would need to write getter functions, which is tedious)

It is also better to define the separators in the protocol header, so that they can be used consistently and changed in one place

#define CRLF "\t" // delimiter
#define CRLF_LEN strlen(CRLF) // delimiter length
#define SPACE " " // space
#define SPACE_LEN strlen(SPACE) // space length

#define OPS "+-*/%" // operators

3.3 Codec

For requests and responses, the encoding and decoding operations are the same. Encoding adds the length and the separators around the serialized string

length\tserialized string\t

Decoding removes the length and the separators, leaving only the serialized string

serialized string

The whole process of encoding and decoding is explained in the comments. So that both requests and responses can use them, the two functions are placed outside the classes rather than encapsulated in one

// Decodes one message from the front of in.
// len is an output parameter: it receives the length of the decoded string,
// or 0 if in does not hold a complete message
std::string decode(std::string &in, size_t *len)
{
    assert(len); // the output pointer must not be null
    // 1. Confirm that the length field of in is complete (look for the delimiter)
    *len = 0;
    size_t pos = in.find(CRLF); // find the delimiter
    // not found, err
    if (pos == std::string::npos)
    {
        return ""; // return an empty string
    }
    // 2. There is a delimiter; check whether the whole message has arrived
    // pos is exactly the number of characters in the length field
    std::string inLenStr = in.substr(0, pos); // extract the length field
    size_t inLen = atoi(inLenStr.c_str());    // convert to a number
    size_t used = pos + 2 * CRLF_LEN;         // length field plus both delimiters
    // Check in.size() < used first: size_t is unsigned, so the subtraction would wrap around
    if (in.size() < used || in.size() - used < inLen)
    {
        return ""; // the remaining data is shorter than the declared length
    }
    // 3. The message is complete; extract the serialized string
    std::string ret = in.substr(pos + CRLF_LEN, inLen);
    *len = inLen;
    // 4. in may contain further messages (the next one),
    // so remove the current message from in to prepare for the next decode and avoid reading it twice
    size_t rmLen = inLenStr.size() + ret.size() + 2 * CRLF_LEN;
    in.erase(0, rmLen);
    // 5. return
    return ret;
}

// Encoding does not modify the source string, so it is const. The parameter len is the length of in
std::string encode(const std::string &in, size_t len)
{
    std::string ret = std::to_string(len); // convert the length to a string and put it at the front
    ret += CRLF;
    ret += in;
    ret += CRLF;
    return ret;
}
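One detail worth keeping in mind when checking lengths the way decode does: size_t is unsigned, so subtracting a larger value from a smaller one wraps around to a huge number instead of going negative. The sketch below shows a check written so that no subtraction can wrap; hasWholeMessage is a hypothetical helper, not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when a buffer of `have` bytes holds at least `header` + `payload` bytes.
// The comparison before the subtraction guarantees that have - header cannot wrap.
bool hasWholeMessage(size_t have, size_t header, size_t payload)
{
    if (have < header)
    {
        return false; // not even the header has arrived
    }
    return have - header >= payload;
}
```

For example, with 3 bytes received and a 4-byte header, the naive expression 3 - 4 evaluated in size_t is a very large positive number, so a test such as "left < payload" would wrongly pass.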

3.4 Request

With encoding and decoding written, let's deal with the more troublesome request part first. It is called troublesome, but most of it is C++ string manipulation; doing it well requires fluent use of the member functions of string

3.4.1 Construction

The most important part is the constructor: we need to convert the user's input into the three internal members

The user may enter x+y, x + y, x +y, x+ y, and so on

That is, the user's input is not necessarily the standard x + y; spaces may appear in different positions. For uniform and convenient processing, it is best to remove the spaces from the user input before parsing!

For string, removing spaces is very simple and can be done in a single traversal

    // remove spaces from the input
    void rmSpace(std::string &in)
    {
        std::string tmp;
        for (auto e : in)
        {
            if (e != ' ')
            {
                tmp += e;
            }
        }
        in = tmp;
    }
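As an alternative sketch, the same space removal can be written with the standard erase-remove idiom, which avoids the temporary string. rmSpaceInPlace is a name made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Removes every ' ' from in, in place.
void rmSpaceInPlace(std::string &in)
{
    // std::remove moves the kept characters to the front and returns the new logical end;
    // erase then drops the leftover tail.
    in.erase(std::remove(in.begin(), in.end(), ' '), in.end());
}
```

Both versions only remove the space character; tabs and other whitespace are left alone, which matches the loop shown above.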

The complete constructor is as follows; it uses the C function strtok, which is worth reviewing

    // Convert user input to internal members
    // The user may enter x+y, x + y, x +y, x+ y, and so on
    // Clean up the user input first (mainly remove spaces), then extract the members
    Request(std::string in, bool *status)
        : _x(0), _y(0), _ops(' ')
    {
        rmSpace(in);
        // Use a C string here, because strtok works on char arrays
        char buf[1024];
        // Copy at most sizeof(buf) - 1 characters; anything longer is truncated
        snprintf(buf, sizeof(buf), "%s", in.c_str());
        char *left = strtok(buf, OPS);
        if (!left)
        { // not found
            *status = false;
            return;
        }
        char *right = strtok(nullptr, OPS);
        if (!right)
        { // not found
            *status = false;
            return;
        }
        // For x+y, strtok overwrites + with \0 in buf,
        // so the operator is read from the original string in, right after the left operand.
        // A leading sign is treated as a delimiter, so negative operands are not supported here
        char mid = in[(left - buf) + strlen(left)]; // extract the operator

        _x = atoi(left);
        _y = atoi(right);
        _ops = mid;
        *status = true;
    }
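To review how strtok behaves in this situation, here is a minimal standalone sketch. Parsed and parseWithStrtok are names made up for this example, and the input is assumed to contain no spaces. It shows that strtok overwrites the delimiter inside the buffer, which is why the operator has to be read from the untouched original string.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

struct Parsed
{
    bool ok;
    int x;
    char op;
    int y;
};

Parsed parseWithStrtok(const std::string &in)
{
    Parsed p{false, 0, '\0', 0};
    char buf[64];
    snprintf(buf, sizeof(buf), "%s", in.c_str()); // strtok needs a writable copy
    char *left = strtok(buf, "+-*/%");
    if (!left)
        return p;
    char *right = strtok(nullptr, "+-*/%");
    if (!right)
        return p;
    // The operator sits right after the left token; in buf it is now '\0',
    // so read it from the original string instead.
    size_t opIndex = static_cast<size_t>(left - buf) + strlen(left);
    p.op = in[opIndex];
    p.x = atoi(left);
    p.y = atoi(right);
    p.ok = true;
    return p;
}
```

strtok treats a run of delimiter characters as a single delimiter, so an input like 10+-20 is parsed as 10 + 20 and the sign is lost. That is consistent with the article's decision not to handle consecutive operators, but it is worth knowing. strtok also keeps internal state, so it is not safe to call from several threads at once; strtok_r is the reentrant POSIX variant.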

3.4.2 Serialization

After parsing out the members, we serialize them into a string in the agreed order. Here the serialized string is returned through an output parameter; this could also be changed to use a return value.

Note that the operator is a char and cannot be passed to to_string: it would be converted to its ASCII code as a number, which is not what we need

// serialization (out is cleared first)
void serialize(std::string &out)
{
    // x + y
    out.clear(); // start from an empty string
    out += std::to_string(_x);
    out += SPACE;
    out += _ops; // the operator cannot use to_string, it would be converted to its ASCII code
    out += SPACE;
    out += std::to_string(_y);
    // No need to add separators (this is what encode does)
}
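To see why to_string is unsuitable for the operator, compare the two small functions below. joinWrong and joinRight are hypothetical names for this illustration, and the numeric value shown assumes an ASCII-based character set.

```cpp
#include <cassert>
#include <string>

// std::to_string has no char overload: op is promoted to int,
// so '+' becomes the text "43" on an ASCII platform.
std::string joinWrong(int x, char op, int y)
{
    return std::to_string(x) + std::to_string(op) + std::to_string(y);
}

// Appending the char with += keeps it as a single character.
std::string joinRight(int x, char op, int y)
{
    std::string out = std::to_string(x);
    out += op;
    out += std::to_string(y);
    return out;
}
```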

3.4.3 Deserialization

Be careful not to get the idea wrong here. At first, I thought that the deserialization of Request should handle the server's return value, but this is not the case!

Both the client and the server use Request: the client serializes it, and the server uses Request to deserialize the message it receives. Request only deals with the request itself, not with the server's return value.

// Deserialization
bool deserialize(const std::string &in)
{
    // x + y: take out x, y and the operator
    size_t space1 = in.find(SPACE); // the first space
    if (space1 == std::string::npos)
    {
        return false;
    }
    size_t space2 = in.rfind(SPACE); // the last space
    if (space2 == std::string::npos || space2 == space1)
    {
        return false; // fewer than two spaces
    }
    // Both spaces exist, start extracting the data
    std::string dataX = in.substr(0, space1);
    std::string dataY = in.substr(space2 + SPACE_LEN); // up to the end by default
    std::string op = in.substr(space1 + SPACE_LEN, space2 - (space1 + SPACE_LEN));
    if (op.size() != 1)
    {
        return false; // the operator has the wrong length
    }

    // No problem, store into the internal members
    _x = atoi(dataX.c_str());
    _y = atoi(dataY.c_str());
    _ops = op[0];
    return true;
}
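For comparison, the same "x op y" text can also be parsed with std::istringstream. This is only an alternative sketch under the assumption that the fields are separated by whitespace; parseExpression is a made-up name and this is not the method used in the article.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parses "x op y" into the three outputs. Returns false if a field is missing,
// is not a number where one is expected, or if extra text follows y.
bool parseExpression(const std::string &in, int *x, char *op, int *y)
{
    assert(x && op && y);
    std::istringstream iss(in);
    if (!(iss >> *x >> *op >> *y))
    {
        return false; // a field is missing or malformed
    }
    std::string rest;
    if (iss >> rest)
    {
        return false; // trailing text after the second operand
    }
    return true;
}
```

Stream extraction skips whitespace, so this version is more lenient than the find and rfind approach: it also accepts input such as 10 +20. Whether that leniency is acceptable depends on how strictly the protocol should be enforced.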

3.5 Response

3.5.1 Construction

The constructor of the response is relatively simple, because the response is only filled in after the server has computed the result; the member variables are public, which makes them convenient to modify afterwards.

    Response(int code = 0, int result = 0)
        : _exitCode(code), _result(result)
    {}

3.5.2 Serialization

// serialization (out is cleared first)
void serialize(std::string &out)
{
    // exitCode result
    out.clear();
    out += std::to_string(_exitCode);
    out += SPACE;
    out += std::to_string(_result);
    // No separator here: encode adds it, and the payload itself must not contain \t
}

3.5.3 Deserialization

Deserializing the response only needs to handle one space, so it is relatively simple

// Deserialization
bool deserialize(const std::string &in)
{
    // only one space
    size_t space = in.find(SPACE);
    if (space == std::string::npos)
    {
        return false;
    }

    std::string dataCode = in.substr(0, space);
    std::string dataRes = in.substr(space + SPACE_LEN);
    _exitCode = atoi(dataCode.c_str());
    _result = atoi(dataRes.c_str());
    return true;
}
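One limitation of atoi is that it returns 0 when the text is not a number, so a corrupted response would silently deserialize as exit code 0. If stricter checking is wanted, a helper based on strtol can report the failure. parseInt below is a hypothetical helper written for this sketch, not part of the article's code.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Converts text to int. Returns false if text is empty, contains anything
// other than one decimal integer, or does not fit in an int.
bool parseInt(const std::string &text, int *out)
{
    assert(out);
    if (text.empty())
    {
        return false;
    }
    errno = 0;
    char *end = nullptr;
    long v = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0')
    {
        return false; // no digits at all, or trailing characters
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
    {
        return false; // out of range
    }
    *out = static_cast<int>(v);
    return true;
}
```

With such a helper, deserialize could return false for malformed fields instead of storing zeros.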

3.6 Client

The client written earlier did not serialize anything, so we need to add serialization for the request and deserialization for the server's return value, along with a series of checks;

To save space, only the client's main loop is shown below; see the comments for details.

// The message entered by the user
string message;
while (1)
{
    message.clear(); // clear the message at the start of every loop
    cout << "Please enter your message# ";
    getline(cin, message); // get input
    // If the user enters quit, exit
    if (strcasecmp(message.c_str(), "quit") == 0)
        break;

    // 1. Create a request (split the input into its members)
    bool reqStatus = true;
    Request req(message, &reqStatus);
    if (!reqStatus)
    {
        cout << "make req err!" << endl;
        continue;
    }
    // 2. Serialization and encoding
    string package;
    req.serialize(package);                    // serialization
    package = encode(package, package.size()); // encode
    // 3. Send to the server
    ssize_t s = write(sock, package.c_str(), package.size());
    if (s <= 0) // write failed
    {
        break;
    }
    // 4. Get the result from the server
    // read returns ssize_t; storing it in size_t would turn -1 into a huge positive value
    char buff[BUFFER_SIZE];
    ssize_t n = read(sock, buff, sizeof(buff) - 1);
    if (n <= 0) // read failed or the server closed the connection
    {
        break;
    }
    buff[n] = '\0';
    std::string echoPackage = buff;
    Response resp;
    size_t len = 0;
    // 5. Decoding and deserialization
    std::string tmp = decode(echoPackage, &len);
    if (len > 0) // decoding succeeded
    {
        echoPackage = tmp;
        if (resp.deserialize(echoPackage)) // deserialize and check
        {
            printf("ECHO [exitcode: %d] %d\n", resp._exitCode, resp._result);
        }
        else
        {
            cerr << "server echo deserialize err!" << endl;
        }
    }
    else
    {
        cerr << "server echo decode err!" << endl;
    }
}
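A single write call is allowed to accept fewer bytes than requested, so a sender that wants to be sure the whole encoded message goes out can loop until everything is written. The sketch below is one way to do that with POSIX write; writeAll is a hypothetical helper, not part of the article's code.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Writes all of data to fd, retrying after partial writes and EINTR.
// Returns false on any other error.
bool writeAll(int fd, const std::string &data)
{
    size_t sent = 0;
    while (sent < data.size())
    {
        ssize_t n = write(fd, data.c_str() + sent, data.size() - sent);
        if (n < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted by a signal, try again
            }
            return false;
        }
        if (n == 0)
        {
            return false; // nothing was accepted, give up instead of spinning
        }
        sent += static_cast<size_t>(n);
    }
    return true;
}
```

The receiving side has the mirror-image concern: one read may return only part of a message, which is exactly the case the length prefix lets decode detect.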

3.7 Server

The server code itself does not need to change; what changes is the task handled in the task queue. This is the benefit of the earlier encapsulation: to change the service the server provides, only the function pointer passed into the task needs to be replaced

// Provide service (through thread pool)
Task t(conet,senderIP,senderPort,CaculateService);
_tpool->push(t);

The following is the code of the calculator service

void CaculateService(int sockfd, const std::string &clientIP, uint16_t clientPort)
{
    assert(sockfd >= 0);
    assert(!clientIP.empty());
    assert(clientPort > 0);

    std::string inbuf;
    while (1)
    {
        Request req;
        char buf[BUFFER_SIZE];
        // 1. Read the information sent by the client
        ssize_t s = read(sockfd, buf, sizeof(buf) - 1);
        if (s == 0)
        { // s == 0 means the peer closed the connection, which is treated as the client quitting
            logging(DEBUG, "client quit: %s[%d]", clientIP.c_str(), clientPort);
            break;
        }
        else if (s < 0)
        {
            // A read error occurred, disconnect after printing the log
            logging(DEBUG, "read err: %s[%d] = %s", clientIP.c_str(), clientPort, strerror(errno));
            break;
        }
        // 2. Read successfully
        buf[s] = '\0'; // manually add the string terminator
        if (strcasecmp(buf, "quit") == 0)
        { // The client actively exits
            break;
        }
        // 3. Start the service
        // Note: assigning (rather than appending) assumes each read returns one whole message
        inbuf = buf;
        size_t packageLen = inbuf.size();
        // 3.1. Decode and deserialize the message from the client
        std::string package = decode(inbuf, &packageLen); // decode
        if (packageLen == 0)
        {
            logging(DEBUG, "decode err: %s[%d] status: %zu", clientIP.c_str(), clientPort, packageLen);
            continue; // The message is incomplete or wrong
        }
        logging(DEBUG, "package: %s[%d] = %s", clientIP.c_str(), clientPort, package.c_str());
        bool deStatus = req.deserialize(package); // deserialize
        if (deStatus) // the message was deserialized successfully
        {
            req.debug(); // print information
            // 3.2. Get the structured response
            Response resp = Caculater(req);
            // 3.3. Serialize and encode the response
            std::string echoStr;
            resp.serialize(echoStr);
            echoStr = encode(echoStr, echoStr.size());
            // 3.4. Write, send the return value to the client
            write(sockfd, echoStr.c_str(), echoStr.size());
        }
        else // Client message deserialization failed
        {
            logging(DEBUG, "deserialize err: %s[%d] status: %d", clientIP.c_str(), clientPort, deStatus);
            continue;
        }
    }
    close(sockfd);
    logging(DEBUG, "server quit: %s[%d] %d", clientIP.c_str(), clientPort, sockfd);
}

It calls a calculation function, which is relatively simple: a switch statement computes the result and checks whether the operands are valid.

Response Caculater(const Request &req)
{
    Response resp; // the constructor has already set the exit code to 0
    switch (req._ops)
    {
    case '+':
        resp._result = req._x + req._y;
        break;
    case '-':
        resp._result = req._x - req._y;
        break;
    case '*':
        resp._result = req._x * req._y;
        break;
    case '%':
    {
        if (req._y == 0)
        {
            resp._exitCode = -1; // modulo by zero
            break;
        }
        resp._result = req._x % req._y; // modulo also works with negative operands
        break;
    }
    case '/':
    {
        if (req._y == 0)
        {
            resp._exitCode = -2; // division by zero
            break;
        }
        resp._result = req._x / req._y; // integer division truncates toward zero
        break;
    }
    default:
        resp._exitCode = -3; // illegal operator
        break;
    }

    return resp;
}
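Since the service uses int arithmetic, it helps to recall how / and % behave with negative operands in C++11 and later: the quotient truncates toward zero and the remainder takes the sign of the dividend. DivResult and divide are names made up for this small sketch.

```cpp
#include <cassert>

struct DivResult
{
    int quotient;
    int remainder;
};

// Integer division and remainder; y must not be 0.
DivResult divide(int x, int y)
{
    assert(y != 0);
    return {x / y, x % y};
}
```

In every case quotient * y + remainder equals x, so for example -7 divided by 3 gives quotient -2 and remainder -1.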

With this, our serialization process works! Let's test it

4. Test

Run the server and the client; you can see that the server successfully processes the client's calculation and returns the result

image-20230212124940995

Enter quit in the client, and the server prints a log message and exits the service

image-20230212125238473