This article was first published in Mu Xue’s humble house
Take the calculator service of tcpServer as an example to implement a custom protocol
Before reading this article, please read tcpServer
For the complete code of this article, see Gitee
1. Revisiting TCP
Note that the description of TCP here is deliberately simplified for ease of understanding; the TCP protocol will be examined in more depth later
1.1 Connections
We know that TCP is connection-oriented: the client and server must establish a connection before they can start communicating
- To establish the connection, TCP uses a three-way handshake
- To close the connection, TCP uses a four-way wave (four-way teardown)
An everyday analogy can help in understanding the three-way handshake and the four-way wave
1.2 Sending information
What if we now need to send structured data?
We know that TCP is byte-stream-oriented, that is, it can carry arbitrary data, including the raw binary contents of a C struct;
- But does being able to send it mean we should?
- The answer is naturally no!
Different platforms use different struct alignment rules and different endianness, so the same byte stream may be interpreted differently at the other end. If we communicate by sending raw struct data directly, portability is extremely poor, and our client and server are tied to the current system environment;
Even on the same system, these settings can change (for example, alignment depends on the compiler and its options), and then our code may stop working!
For the same reason, when writing a C address book program, we should not write raw struct data directly to a file. Later code upgrades and environment changes may make the data in the stored file unreadable, which is definitely something we don't want to see.
Therefore, to solve this problem, we should serialize the data before sending it. After the receiver gets the data, it deserializes it and parses out the fields!
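The platform dependence described above can be sketched in a few lines. This is only an illustration: the names rawBytes, bigEndianBytes and Sample are hypothetical and do not come from the article's code. Copying an integer's in-memory bytes gives a host-dependent order, while writing the bytes in an agreed order gives the same result on every machine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the in-memory representation: the byte order depends on the host CPU.
std::array<unsigned char, 4> rawBytes(std::uint32_t v) {
    std::array<unsigned char, 4> out{};
    std::memcpy(out.data(), &v, sizeof(v));
    return out;
}

// Write the bytes in an agreed order (big-endian here): identical on every host.
std::array<unsigned char, 4> bigEndianBytes(std::uint32_t v) {
    return {static_cast<unsigned char>(v >> 24),
            static_cast<unsigned char>(v >> 16),
            static_cast<unsigned char>(v >> 8),
            static_cast<unsigned char>(v)};
}

// A struct with mixed member sizes usually contains padding bytes,
// and how many depends on the compiler's alignment rules.
struct Sample {
    char tag;
    std::uint32_t value;
};
```

On a little-endian machine rawBytes(0x01020304) yields 04 03 02 01, on a big-endian machine 01 02 03 04, whereas bigEndianBytes always yields 01 02 03 04. sizeof(Sample) is commonly 8 rather than 5, which is why sending the raw struct ties both sides to one layout.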
2. Serialization and deserialization
2.1 Introduction
Serialization means converting structured data (for now, think of it as a C struct) into a string and sending that string
struct date {
    int year;
    int month;
    int day;
};
For example, to serialize the date structure above, we can simply splice its members into a string (serialization)
year-month-day
After the receiver gets this string, it can find the separator -, take out the three values, convert them to int and store them back into the structure (deserialization)
By agreeing on a serialization and deserialization method like this, we have already defined a simple protocol!
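A minimal sketch of this year-month-day scheme might look as follows. The function names serializeDate and deserializeDate are made up for illustration, and the sketch assumes non-negative fields, since a negative number would collide with the - separator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

struct date {
    int year;
    int month;
    int day;
};

// Serialization: splice the three members into "year-month-day".
std::string serializeDate(const date& d) {
    return std::to_string(d.year) + "-" + std::to_string(d.month) + "-" +
           std::to_string(d.day);
}

// Deserialization: locate the two separators and convert each piece back to int.
// Returns false if the string does not contain two separators.
bool deserializeDate(const std::string& in, date& out) {
    std::size_t first = in.find('-');
    if (first == std::string::npos) return false;
    std::size_t second = in.find('-', first + 1);
    if (second == std::string::npos) return false;
    out.year = std::atoi(in.substr(0, first).c_str());
    out.month = std::atoi(in.substr(first + 1, second - first - 1).c_str());
    out.day = std::atoi(in.substr(second + 1).c_str());
    return true;
}
```

Because both sides only ever exchange the text form, neither alignment nor endianness of the struct matters any more.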
2.2 Codec
This raises another problem: how do I know that I have read one complete serialized message?
2000-12-10
10000-01-01
As shown above, suppose that one day the year becomes a five-digit number; how does the server then know whether it has read a complete serialized message?
This requires a rule: use the first n bytes to carry the length. After receiving data, first take out the first n bytes to get the length m of this message, then read the following m bytes, and the complete string has been extracted;
- This process can be called encoding and decoding
To separate the length field from the actual serialized content, we can add the separator \t; but this also requires that the transmitted data itself never contains \t, otherwise a series of problems will follow
10\t2000-12-10\t
11\t10000-01-01\t
All of the above is part of protocol customization! We have specified a serialization and deserialization method for the server and client, so that communication between the two is no longer restricted by the platform. After all, once the data is decoded into a string, it is the same on any platform!
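The length-prefix rule can be sketched with two small helpers. The names frame and unframe are hypothetical and chosen only for this illustration; the sketch assumes the payload never contains \t, as the protocol requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Encoding: put the payload length in front and wrap the payload in '\t' separators.
std::string frame(const std::string& payload) {
    return std::to_string(payload.size()) + "\t" + payload + "\t";
}

// Decoding: pop one complete message from the front of `stream`.
// Returns false and leaves `stream` untouched if no complete message has arrived yet.
bool unframe(std::string& stream, std::string& payload) {
    std::size_t sep = stream.find('\t');
    if (sep == std::string::npos) return false;  // length field not complete yet
    std::size_t len = std::strtoul(stream.substr(0, sep).c_str(), nullptr, 10);
    if (len > stream.size()) return false;       // payload cannot be complete yet
    std::size_t total = sep + 1 + len + 1;       // length field, '\t', payload, '\t'
    if (stream.size() < total) return false;     // still waiting for more bytes
    payload = stream.substr(sep + 1, len);
    stream.erase(0, total);                      // keep only what follows this message
    return true;
}
```

Calling unframe repeatedly on a buffer that holds both example messages returns 2000-12-10 first and 10000-01-01 second, regardless of how many digits the year has.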
Let’s use a calculator service to demonstrate it
3. Calculator service
Because the focus of this article is demonstrating protocol customization, the calculator here does not handle expressions with several consecutive operators.
3.1 Protocol customization
To implement a calculator, we must first work out how many members a calculation has
x + y
x / y
x * y
...
In general, a calculator needs only 3 members, two operands and one operator, to start calculating. So we need to combine these three fields into a string to achieve serialization;
For example, we stipulate that the serialized data looks as follows, with a space between each operand and the operator
a + b
Then add the data length at the beginning
Data length\tFormula\t
7\t10 + 20\t
8\t100 / 30\t
9\t300 - 200\t
For the server, we need to return two parameters: status code and result
exit status result
If the exit status is not 0, an error occurred and the result is invalid; the result is valid only when the exit status is 0.
Similarly, the data length must also be added to the server's serialized string
data length\texit status result\t
With that, a custom protocol for the calculator is complete.
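To make the wire format concrete, here is a small sketch that builds the request and response strings described above. The helper names withLength, requestWire and responseWire are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>

// Prefix the payload with its length and wrap it in '\t' separators.
std::string withLength(const std::string& payload) {
    return std::to_string(payload.size()) + "\t" + payload + "\t";
}

// Request payload: "x op y", with one space on each side of the operator.
std::string requestWire(int x, char op, int y) {
    std::string payload = std::to_string(x) + " " + op + " " + std::to_string(y);
    return withLength(payload);
}

// Response payload: "exitStatus result", separated by one space.
std::string responseWire(int exitStatus, int result) {
    std::string payload = std::to_string(exitStatus) + " " + std::to_string(result);
    return withLength(payload);
}
```

For instance, requestWire(10, '+', 20) produces 7\t10 + 20\t, and a successful answer of 30 would travel back as 4\t0 30\t.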
3.2 Members
According to the protocol above, first write the member variables of the request and the response
class Request {
public:
    int _x;
    int _y;
    char _ops;
};
class Response {
public:
    int _exitCode; // exit code of the calculation service
    int _result;   // result
};
These member variables are all public, which makes them convenient to use in the task (otherwise we would need to write getter functions, which is tedious)
It is also a good idea to define the separators of the protocol in one place, so that they can be used consistently and changed easily later
#define CRLF "\t"               // delimiter
#define CRLF_LEN strlen(CRLF)   // delimiter length
#define SPACE " "               // space
#define SPACE_LEN strlen(SPACE) // space length
#define OPS "+-*/%"             // operators
3.3 Codec
For both requests and responses, encoding is the same operation: add the length and separators around the serialized string
length\tserialized string\t
Decoding removes the length and separators and leaves only the serialized string to be parsed
serialized string
The whole encoding and decoding process is explained in the comments. So that both requests and responses can use them, these functions are placed outside the classes rather than encapsulated inside one
// The parameter len is an output parameter: it receives the payload length, and 0 means failure
std::string decode(std::string& in, size_t* len)
{
    assert(len); // the output pointer must not be null
    // 1. Confirm that in contains a length field (look for the separator)
    *len = 0;
    size_t pos = in.find(CRLF); // find the separator
    if (pos == std::string::npos) {
        return ""; // not found: return an empty string
    }
    // 2. There is a separator; check whether the whole message has arrived
    // pos is exactly the number of characters in the length field
    std::string inLenStr = in.substr(0, pos); // extract the length field
    int parsedLen = atoi(inLenStr.c_str());   // convert to int
    if (parsedLen <= 0) {
        return ""; // a malformed or empty length is treated as an error
    }
    size_t inLen = parsedLen;
    // Compare total sizes instead of subtracting, so the unsigned arithmetic cannot wrap around
    if (in.size() < inLenStr.size() + inLen + 2 * CRLF_LEN) {
        return ""; // the data received so far is shorter than the declared length
    }
    // 3. The message is complete; extract the serialized string
    std::string ret = in.substr(pos + CRLF_LEN, inLen);
    *len = inLen;
    // 4. in may contain further messages (the next one),
    // so delete the current message from in to avoid reading it twice
    size_t rmLen = inLenStr.size() + ret.size() + 2 * CRLF_LEN;
    in.erase(0, rmLen);
    // 5. Return the payload
    return ret;
}

// Encoding does not modify the source string, hence const. The parameter len is the length of in
std::string encode(const std::string& in, size_t len)
{
    std::string ret = std::to_string(len); // put the length at the front as the identifier
    ret += CRLF;
    ret += in;
    ret += CRLF;
    return ret;
}
3.4 request
Encoding and decoding are done, so let's deal with the more troublesome request part first. Troublesome as it sounds, most of it is just C++ string manipulation, and doing it well requires fluency with the member functions of string
3.4.1 Construction
The most important part is the constructor, which has to convert the user's input into the three internal members
The user may enter x+y, x + y, x +y, x+ y and so on
Note that the user's input is not necessarily the standard X + Y; there may be spaces in different positions. To handle every case uniformly, it is best to remove the spaces from the user input before parsing!
Removing spaces from a string is very simple and can be done in a single traversal
// Remove spaces from the input
void rmSpace(std::string& in)
{
    std::string tmp;
    for (auto e : in) {
        if (e != ' ') {
            tmp += e;
        }
    }
    in = tmp;
}
The complete constructor is shown below. It uses the C function strtok, which is worth reviewing
// Convert user input into the internal members
// The user may enter x+y, x + y, x +y, x+ y and so on
// Normalize the input first (mainly by removing spaces), then extract the members
Request(std::string in, bool* status)
    : _x(0), _y(0), _ops(' ')
{
    rmSpace(in);
    // Use a C string here because strtok needs a modifiable buffer
    char buf[1024];
    // Copies at most sizeof(buf)-1 characters; longer input is truncated
    snprintf(buf, sizeof(buf), "%s", in.c_str());
    char* left = strtok(buf, OPS);
    if (!left) { // not found
        *status = false;
        return;
    }
    char* right = strtok(nullptr, OPS);
    if (!right) { // not found
        *status = false;
        return;
    }
    // For x+y, strtok overwrites the + in buf with \0,
    // so the operator is taken from the original string instead
    // (this assumes the first operand has no leading sign)
    char mid = in[strlen(left)];
    _x = atoi(left);
    _y = atoi(right);
    _ops = mid;
    *status = true;
}
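As an alternative to strtok, the same parsing can be sketched with std::string::find_first_of. This is not the article's implementation: parseExpression is a hypothetical free function, and like the constructor it relies on atoi, so non-numeric operands silently become 0. Starting the search at index 1 lets a leading - be read as the sign of the first operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Parse "x op y" (spaces optional) into its three parts.
// Returns false when no operator is found or an operand is missing.
bool parseExpression(const std::string& input, int& x, char& op, int& y) {
    std::string in;
    for (char c : input) {  // drop spaces, like rmSpace does
        if (c != ' ') in += c;
    }
    if (in.size() < 3) return false;  // need at least "a+b"
    // Search from index 1 so that a leading '-' counts as the sign of x.
    std::size_t pos = in.find_first_of("+-*/%", 1);
    if (pos == std::string::npos || pos + 1 >= in.size()) return false;
    op = in[pos];
    x = std::atoi(in.substr(0, pos).c_str());
    y = std::atoi(in.substr(pos + 1).c_str());
    return true;
}
```

With this approach the operator is read directly from the string that was searched, so there is no need to keep a second copy of the input.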
3.4.2 Serialization
After parsing out the members, we serialize them into a string in the agreed layout. Here the string is returned through an output parameter; it could equally be changed to use a return value.
Note that the operator itself is a char and must not be passed to to_string, which would convert it into its ASCII code; that is not what we want
// Serialization (out is cleared first)
void serialize(std::string& out)
{
    // x + y
    out.clear();
    out += std::to_string(_x);
    out += SPACE;
    out += _ops; // do not use to_string on the operator: it would produce the ASCII code
    out += SPACE;
    out += std::to_string(_y);
    // No separator is added here (that is what encode does)
}
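The difference between the two ways of handling the operator can be seen in a tiny sketch. The function names viaToString and viaAppend are made up for this example, and the value 43 assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <string>

// std::to_string has no char overload: the char is promoted to int,
// so the character code is formatted as a decimal number.
std::string viaToString(char op) {
    return std::to_string(op);
}

// Appending the char keeps the character itself.
std::string viaAppend(char op) {
    std::string s;
    s += op;
    return s;
}
```

With '+' as input, viaToString returns the text 43 while viaAppend returns +, which is why serialize appends _ops directly.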
3.4.3 Deserialization
Be careful not to get the idea wrong here. At first, I thought that the deserialization of request should handle the server's return value, but that is not the case!
Both the client and the server use request: the client serializes it, and the server uses request to deserialize what it receives. request only deals with the request, not with the server's return value.
// Deserialization
bool deserialize(const std::string& in)
{
    // For x + y we need to take out x, y and the operator
    size_t space1 = in.find(SPACE); // the first space
    if (space1 == std::string::npos) {
        return false;
    }
    size_t space2 = in.rfind(SPACE); // the last space
    if (space2 == std::string::npos || space2 == space1) {
        return false; // fewer than two spaces
    }
    // Both spaces exist; start extracting the data
    std::string dataX = in.substr(0, space1);
    std::string dataY = in.substr(space2 + SPACE_LEN); // defaults to the end
    std::string op = in.substr(space1 + SPACE_LEN, space2 - (space1 + SPACE_LEN));
    if (op.size() != 1) {
        return false; // the operator must be exactly one character
    }
    // No problem; store into the internal members
    _x = atoi(dataX.c_str());
    _y = atoi(dataY.c_str());
    _ops = op[0];
    return true;
}
3.5 response
3.5.1 Construction
The constructor of the response is relatively simple, because the object is only filled in after the server has computed the result; the member variables are public so that they are easy to modify afterwards.
Response(int code = 0, int result = 0)
    : _exitCode(code), _result(result)
{}
3.5.2 Serialization
// out is cleared first
void serialize(std::string& out)
{
    // code result
    out.clear();
    out += std::to_string(_exitCode);
    out += SPACE;
    out += std::to_string(_result);
    // No separator is added here; encode adds it, as in Request::serialize
}
3.5.3 Deserialization
Deserializing the response only needs to handle a single space, which is relatively simple
// Deserialization
bool deserialize(const std::string& in)
{
    // only one space
    size_t space = in.find(SPACE);
    if (space == std::string::npos) {
        return false;
    }
    std::string dataCode = in.substr(0, space);
    std::string dataRes = in.substr(space + SPACE_LEN);
    _exitCode = atoi(dataCode.c_str());
    _result = atoi(dataRes.c_str());
    return true;
}
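Both deserialize functions rely on atoi, which returns 0 for text that is not a number, so a corrupted field cannot be told apart from a genuine 0. A stricter conversion could be sketched with std::strtol. The helper parseInt below is hypothetical and not part of the article's code.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Convert text to int, rejecting empty input, trailing garbage and out-of-range values.
// Note: like atoi, strtol still skips leading whitespace.
bool parseInt(const std::string& text, int& value) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    long parsed = std::strtol(text.c_str(), &end, 10);
    if (errno == ERANGE) return false;                      // does not fit in long
    if (end == text.c_str() || *end != '\0') return false;  // no digits, or leftovers
    if (parsed < INT_MIN || parsed > INT_MAX) return false; // does not fit in int
    value = static_cast<int>(parsed);
    return true;
}
```

A deserialize built on such a helper could return false for a payload like 0 3x instead of silently accepting it.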
3.6 Client
The client written previously did not serialize anything, so we need to add serialization and also deserialize the server's return value. A series of checks has to be added along the way;
To save space, only the client's main loop is shown below; see the comments for details.
// The message entered by the user
string message;
while (1) {
    message.clear(); // clear the message at the start of every loop
    cout << "Please enter your message# ";
    getline(cin, message); // get input
    // If the user enters quit, exit
    if (strcasecmp(message.c_str(), "quit") == 0)
        break;
    // Send a message to the server
    // 1. Create a request (split out the parameters)
    bool reqStatus = true;
    Request req(message, &reqStatus);
    if (!reqStatus) {
        cout << "make req err!" << endl;
        continue;
    }
    // 2. Serialize and encode
    string package;
    req.serialize(package);                    // serialize
    package = encode(package, package.size()); // encode
    // 3. Send to the server
    ssize_t s = write(sock, package.c_str(), package.size());
    if (s <= 0) // write failed
        break;
    // 4. Get the result from the server
    // (a signed type is required here, otherwise a failed read cannot be detected)
    char buff[BUFFER_SIZE];
    ssize_t n = read(sock, buff, sizeof(buff) - 1);
    if (n <= 0) { // read failed or the server closed the connection
        cerr << "read from server err!" << endl;
        break;
    }
    buff[n] = '\0';
    std::string echoPackage = buff;
    Response resp;
    size_t len = 0;
    // 5. Decode and deserialize
    std::string tmp = decode(echoPackage, &len);
    if (len > 0) { // decoding succeeded
        echoPackage = tmp;
        if (resp.deserialize(echoPackage)) { // deserialize and check
            printf("ECHO [exitcode: %d] %d\n", resp._exitCode, resp._result);
        } else {
            cerr << "server echo deserialize err!" << endl;
        }
    } else {
        cerr << "server echo decode err!" << endl;
    }
}
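The loop above calls write once per message. On a socket, write may accept fewer bytes than requested, so a more defensive client could keep writing until the whole package is sent. The helper writeAll below is a hypothetical sketch of that idea and is not part of the article's code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Keep calling write until every byte of `data` has been handed to the kernel.
// Returns false on an error other than an interrupted call.
bool writeAll(int fd, const std::string& data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = write(fd, data.data() + sent, data.size() - sent);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        if (n == 0) return false;          // no progress: give up rather than spin
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

In the client loop, step 3 could then become a call such as writeAll(sock, package), with a false return treated like the existing write failure.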
3.7 Server
The server code itself does not need to change; what changes is the task handled in the task queue. This is the benefit of the earlier encapsulation: changing the service the server provides only requires changing the function pointer passed into the task
// Provide service (through the thread pool)
Task t(conet, senderIP, senderPort, CaculateService);
_tpool->push(t);
The following is the code of the calculator service
void CaculateService(int sockfd, const std::string& clientIP, uint16_t clientPort)
{
    assert(sockfd >= 0);
    assert(!clientIP.empty());
    assert(clientPort > 0);
    std::string inbuf;
    while (1) {
        Request req;
        char buf[BUFFER_SIZE];
        // 1. Read the information sent by the client
        ssize_t s = read(sockfd, buf, sizeof(buf) - 1);
        if (s == 0) {
            // read returning 0 means the peer closed the connection, i.e. the client has exited
            logging(DEBUG, "client quit: %s[%d]", clientIP.c_str(), clientPort);
            break;
        } else if (s < 0) {
            // A read error occurred; disconnect after printing the log
            logging(DEBUG, "read err: %s[%d] = %s", clientIP.c_str(), clientPort, strerror(errno));
            break;
        }
        // 2. Read successfully
        buf[s] = '\0'; // manually add the string terminator
        if (strcasecmp(buf, "quit") == 0) {
            // The client actively exits
            break;
        }
        // 3. Start the service
        inbuf = buf;
        size_t packageLen = inbuf.size();
        // 3.1. Decode and deserialize the message from the client
        std::string package = decode(inbuf, &packageLen); // decode
        if (packageLen == 0) {
            logging(DEBUG, "decode err: %s[%d] status: %zu", clientIP.c_str(), clientPort, packageLen);
            continue; // the message is incomplete or malformed
        }
        logging(DEBUG, "package: %s[%d] = %s", clientIP.c_str(), clientPort, package.c_str());
        bool deStatus = req.deserialize(package); // deserialize
        if (deStatus) { // the message was deserialized successfully
            req.debug(); // print information
            // 3.2. Get the structured response
            Response resp = Caculater(req);
            // 3.3. Serialize and encode the response
            std::string echoStr;
            resp.serialize(echoStr);
            echoStr = encode(echoStr, echoStr.size());
            // 3.4. Write: send the return value to the client
            write(sockfd, echoStr.c_str(), echoStr.size());
        } else { // the client message could not be deserialized
            logging(DEBUG, "deserialize err: %s[%d] status: %d", clientIP.c_str(), clientPort, deStatus);
            continue;
        }
    }
    close(sockfd);
    logging(DEBUG, "server quit: %s[%d] %d", clientIP.c_str(), clientPort, sockfd);
}
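Note that the service assigns each read to inbuf, so a message that arrives split across two reads would be dropped, and several messages arriving in one read would be only partly handled. One possible way to cope with this, sketched below, is to append every chunk to a pending buffer and pull out all complete messages. The function feed is hypothetical and uses its own minimal framing logic rather than the article's decode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Append a freshly read chunk to `pending` and return every complete payload.
// Incomplete data stays in `pending` until a later call delivers the rest.
std::vector<std::string> feed(std::string& pending, const char* chunk, std::size_t n) {
    pending.append(chunk, n);
    std::vector<std::string> messages;
    while (true) {
        std::size_t sep = pending.find('\t');
        if (sep == std::string::npos) break;     // length field incomplete
        std::size_t len = std::strtoul(pending.substr(0, sep).c_str(), nullptr, 10);
        if (len > pending.size()) break;         // payload cannot be complete yet
        std::size_t total = sep + 1 + len + 1;   // length, '\t', payload, '\t'
        if (pending.size() < total) break;       // wait for more bytes
        messages.push_back(pending.substr(sep + 1, len));
        pending.erase(0, total);
    }
    return messages;
}
```

In a service loop, the bytes returned by each read would be passed to feed, and every returned payload would then be deserialized and answered in turn.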
It calls a calculation function, which is relatively simple: a switch statement computes the result and checks whether the operands are valid.
Response Caculater(const Request& req)
{
    Response resp; // the constructor has already set the exit code to 0
    switch (req._ops) {
    case '+':
        resp._result = req._x + req._y;
        break;
    case '-':
        resp._result = req._x - req._y;
        break;
    case '*':
        resp._result = req._x * req._y;
        break;
    case '%': {
        if (req._y == 0) {
            resp._exitCode = -1; // modulo by zero
            break;
        }
        resp._result = req._x % req._y; // modulo also works with negative operands
        break;
    }
    case '/': {
        if (req._y == 0) {
            resp._exitCode = -2; // division by zero
            break;
        }
        resp._result = req._x / req._y; // integer division truncates toward zero
        break;
    }
    default:
        resp._exitCode = -3; // illegal operator
        break;
    }
    return resp;
}
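Besides a zero divisor, int division has one more problematic input: INT_MIN / -1 overflows, which is undefined behavior in C++. A hedged sketch of an extra check is shown below. Outcome and checkedDivide are hypothetical names, and the exit code -4 is an assumption invented for this example; the article only defines -1, -2 and -3.

```cpp
#include <cassert>
#include <climits>

// Minimal stand-in for the article's Response, so the sketch is self-contained.
struct Outcome {
    int exitCode;
    int result;
};

// Divide x by y, reporting errors through the exit code instead of crashing.
Outcome checkedDivide(int x, int y) {
    if (y == 0) return {-2, 0};                   // division by zero (the article's code)
    if (x == INT_MIN && y == -1) return {-4, 0};  // overflow (hypothetical extra code)
    return {0, x / y};                            // truncates toward zero
}
```

The same pair of checks would apply to the % case, since INT_MIN % -1 is undefined for the same reason.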
With that, our serialization and deserialization pipeline is complete! Let's test it
4. Test
Run the server, you can see that the server can successfully process the calculation of the client and return the result
Enter quit, the server will print the information and exit the service