How to work around the statelessness of the HTTP protocol?

1. What it means for HTTP to be stateless

1.1 Stateful Protocol

Many common application-layer (layer 7) protocols are actually stateful. SMTP is one example. Its first message must be HELO (or EHLO), which serves as a handshake, and no other command can be sent before it. An AUTH stage generally follows, in which the user name and password are verified. Only then can the email data be sent, and finally the client exits with the QUIT command. Throughout the connection, both communicating parties must remember the current connection state, because the commands accepted in different states are different. In addition, some of the data transmitted by earlier commands must also be remembered, because it may affect subsequent commands. This is what is called a stateful protocol.
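
As a rough illustration of "the commands accepted in different states are different", here is a minimal sketch of a state machine for an SMTP-like exchange. It is heavily simplified and is not real SMTP: the state names, the transition table, and the reply strings are all assumptions made for this example.

```python
# Hypothetical, heavily simplified SMTP-like session (not real SMTP).
# The table maps each state to the commands it accepts and the next state.
TRANSITIONS = {
    "START": {"HELO": "GREETED"},
    "GREETED": {"AUTH": "AUTHENTICATED", "QUIT": "CLOSED"},
    "AUTHENTICATED": {"DATA": "AUTHENTICATED", "QUIT": "CLOSED"},
    "CLOSED": {},
}


class SmtpLikeSession:
    def __init__(self):
        # The server has to remember this value for the whole connection.
        self.state = "START"

    def handle(self, command: str) -> str:
        allowed = TRANSITIONS[self.state]
        if command not in allowed:
            return f"REJECTED: {command} is not allowed in state {self.state}"
        self.state = allowed[command]
        return f"OK: now in state {self.state}"
```

Sending DATA on a fresh session is rejected, while the same command succeeds after HELO and AUTH. The meaning of a command depends on what came before it, which is exactly what makes the protocol stateful.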

1.2 Why is HTTP a stateless protocol

By contrast, why is HTTP a stateless protocol? Because each request is independent: each request carries the complete data needed to process it, and sending a request does not change any protocol state. Even in HTTP/1.1, where the same connection can carry multiple HTTP requests, if the first request fails, subsequent requests can generally still be processed (errors such as protocol parsing failures or message framing errors are naturally excluded). The structure of such a protocol is simpler than that of a stateful protocol, and it is generally simpler to implement: there is no need for a state machine, just a loop.
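
The "just a loop" idea can be sketched as follows. This is not a real HTTP server: requests are modelled as plain dictionaries, and the function names and the /hello path are assumptions made for this example.

```python
def handle_request(request: dict) -> dict:
    """Process one request using only the data carried by that request."""
    if request.get("path") == "/hello":
        name = request.get("query", {}).get("name", "world")
        return {"status": 200, "body": f"hello, {name}"}
    return {"status": 404, "body": "not found"}


def serve(requests):
    # No state machine: each iteration is independent of the previous ones,
    # so a failed request does not prevent later requests from succeeding.
    return [handle_request(request) for request in requests]
```

Because handle_request never reads anything left behind by an earlier call, the requests could be processed in any order, or by different servers, with the same results.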

1.3 Why not change the HTTP protocol to make it stateful

The original HTTP protocol was only used to browse static files. A stateless protocol was enough for that, and it was light to implement (by comparison, a stateful protocol is costly to implement, because you must maintain state and act on it). As the web developed, it needed to become stateful. Does that mean the HTTP protocol itself has to be modified? No. First, users often stay on one web page for a long time before moving to another, so maintaining state at the protocol level between those two page loads would be expensive. Second, HTTP was stateless for historical reasons, and the new requirements came later. Following the usual practice in software, we keep what already works and add another layer on top of HTTP to reach the goal (“add another layer and you can do anything”). So other mechanisms were introduced to provide this stateful behavior.

1.4 Advantages and disadvantages of stateless protocols

Contrary to what many people imagine, session support is not a weakness of stateless protocols but a strength. In a stateful protocol, session state is bound to the connection, so if the connection is accidentally dropped, the entire session is lost and you generally have to start from scratch after reconnecting (this can be improved by borrowing some characteristics of stateless protocols). A stateless protocol such as HTTP uses metadata (such as the Cookie header) to maintain the session, so the session is independent of the connection itself. Even if the connection is dropped, the session state is not seriously damaged, and maintaining the session does not require keeping the connection open. Another advantage of statelessness is that it is friendly to intermediaries such as proxies. An intermediary does not need to understand the whole interaction between the communicating parties. It only needs to delimit messages correctly, and it can then forward messages over different connections without affecting correctness. This makes components such as load balancers easier to design.

The main disadvantage of a stateless protocol is that all the information required for a single request must be included in that request and sent to the server at once. This makes the structure of a single message more complex, since it must be able to carry a large amount of metadata, and so parsing HTTP messages is much more complex than parsing many other protocols. It also means that the same data often has to be transmitted repeatedly across requests. For example, every request on the same connection has to resend headers such as Host, Authorization, Cookie, and User-Agent, and every response repeats headers such as Server. This largely identical metadata reduces the efficiency of the protocol to some extent.

1.5 Is it true that HTTP is a stateless protocol?

Actually, not quite. HTTP/1.1 has an Expect: 100-continue feature, which works like this:

  1. When sending a large amount of data, the client anticipates that the server may reject it outright. It therefore sends only the request headers, including Expect: 100-continue, holds back the request body, and waits for the server to respond.
  2. The server receives the Expect: 100-continue request. If the upload is allowed, it sends a 100 Continue response (a single request can receive any number of 1xx responses; none of them is the final response, they only serve as notifications). If the upload is not allowed, for example because uploading is forbidden or the data size exceeds the limit, it returns a 4xx/5xx error directly.
  3. After the client receives the 100 Continue response, it goes on to upload the data.
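
The decision logic in these steps can be sketched roughly as below. This is a simulation on plain dictionaries rather than a real HTTP implementation. The function names and the max_body_bytes parameter are assumptions made for this example, and 413 is used as one possible 4xx rejection.

```python
from typing import Optional


def interim_status(headers: dict, max_body_bytes: int) -> Optional[int]:
    """Status a server might send after reading only the request headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {key.lower(): value for key, value in headers.items()}
    declared = int(h.get("content-length", "0"))
    if declared > max_body_bytes:
        return 413  # reject before the body is ever transmitted
    if h.get("expect", "").lower() == "100-continue":
        return 100  # tell the client to go ahead and send the body
    return None  # no Expect header: nothing to send at this point


def client_should_send_body(status: Optional[int], timed_out: bool = False) -> bool:
    """Client side: upload after 100 Continue, or after waiting in vain."""
    if status == 100:
        return True
    if status is None:
        # The server stayed silent; fall back to sending after a short wait.
        return timed_out
    return False  # a final 4xx/5xx response means the upload was refused
```

The timed_out branch is the part that keeps the scheme compatible with servers that ignore the Expect header.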

This is really the routine of a stateful protocol: a handshake is required before the data is actually sent. However, the HTTP specification also recommends that if the server does not respond with 100 Continue, the client should still upload the data after waiting a short time, for compatibility with servers that do not support Expect: 100-continue. This can be summed up as “be stateful where possible, otherwise fall back to the stateless path”. So it is still fair to say that HTTP/1.x is a stateless protocol.

HTTP/2, on the other hand, should be regarded as a stateful protocol (it has a handshake, GOAWAY frames, and flow control similar to TCP). So in the future it will no longer be accurate to say “HTTP is a stateless protocol”; it is better to say “HTTP/1.x is a stateless protocol”.

2. How to solve the stateless problem

The HTTP protocol is stateless, which means that the server cannot by itself tell which client a request came from, so it cannot respond differently to different clients. Some interactive services cannot be supported this way, which is why cookies came into being.

2.1 Cookie

The delivery of a cookie goes through the following 4 steps:

  1. The client sends an HTTP request to the server.
  2. The server responds with a Set-Cookie header.
  3. The client saves the cookie and includes it in the Cookie header of subsequent requests to the server.
  4. The server identifies the client from the cookie and returns the corresponding response.
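
Steps 2 to 4 can be sketched with Python's standard http.cookies module. No network traffic is involved here; the header values are simply passed between variables, and the cookie name session_id and its value are assumptions made for this example.

```python
from http.cookies import SimpleCookie

# Step 2: the server builds the value of a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"  # hypothetical identifier
server_cookie["session_id"]["path"] = "/"
server_cookie["session_id"]["httponly"] = True
set_cookie_value = server_cookie["session_id"].OutputString()

# Step 3: the client stores the cookie and sends the name=value pair
# back in the Cookie header of later requests (attributes are not sent).
stored = SimpleCookie()
stored.load(set_cookie_value)
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in stored.items())

# Step 4: the server parses the Cookie header to recognise the client.
received = SimpleCookie()
received.load(cookie_header)
client_session_id = received["session_id"].value
```

Only the name and value travel back to the server. Attributes such as Path and HttpOnly are instructions to the client about how to store and send the cookie.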

The word cookie literally means a small sweet biscuit. Cookies make it possible to fill in user names automatically, remember passwords, and so on, which is a little sweetness for users.

Once the server has the cookie, what information in it identifies the client? The SessionID issued by the server.

2.2 Session

If sensitive data such as the user name and password is stored in the client’s cookies, there is a risk of leakage. For better security, the confidential information is kept on the server instead. This is the Session. A Session is a per-client record maintained on the server. It can be thought of as a user table in the server-side database that stores the client’s user information, with the SessionID as the table’s primary key.

Session information is stored on the server, so it inevitably occupies memory, and the overhead grows with the number of users. To cope with the load, the service has to be distributed across servers behind a load balancer. But because the authentication information lives in one server’s memory, a user who was served by a given server must reach the same server next time to find the authorization information, which limits load balancing. Moreover, the SessionID is stored in a cookie, so it is still exposed to attacks such as CSRF (cross-site request forgery).
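
The "table keyed by SessionID" idea can be sketched as a dictionary held in server memory. This is a minimal sketch, not a production session store: the names SESSIONS, create_session, and lookup_session are assumptions made for this example, and expiry is left out entirely.

```python
import secrets

# Hypothetical in-memory session table: SessionID -> data kept on the server.
SESSIONS = {}


def create_session(username: str) -> str:
    # An unguessable random identifier plays the role of the primary key.
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"username": username}
    # Only this identifier is handed to the client, e.g. in a cookie.
    return session_id


def lookup_session(session_id: str):
    # Returns the stored record, or None if the SessionID is unknown.
    return SESSIONS.get(session_id)
```

Because SESSIONS lives in the memory of a single process, a second server would not find the same record, which is where the load-balancing limitation of server-side sessions comes from.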

How to solve these problems? Token-based authentication.

2.3 Token

First, with a Token the server no longer needs to keep user information in memory, which saves memory. Second, because nothing is stored on any particular server, a client can be authenticated no matter which server it reaches, which improves scalability. Third, tokens can be signed with different algorithms, which improves security.

A Token is a string. It is passed around in much the same way as a cookie, except that the object being passed is the Token. After the user sends the user name and password to the server, the server generates a Token and returns it to the client in the response. The client attaches the Token to each subsequent request, and the server uses it for authentication.

Although Token solves the Session problems well, it is still not perfect. When the server validates a Token, it still has to query the database for the authentication information. JWT appeared so that a token can be validated directly, without a database lookup.

2.4 JWT

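JWT stands for JSON Web Token. A JWT carries the information needed for authentication in the token itself, such as the user’s identity and other claims, stored as a JSON object. Note that the payload is only encoded, not encrypted, so secrets such as passwords should not be placed in it.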

A JWT looks like xxxxx.yyyyy.zzzzz and consists of three parts:

  • Header: the token type and the signing algorithm (for example HMAC SHA256 or RSA)

{ "alg": "HS256", "typ": "JWT" }

  • Payload: the content being carried (the claims)

{ "sub": "1234567890", "name": "John Doe", "admin": true }

  • Signature

To sign, encode the header and the payload with base64url, join them with “.”, and sign the result with a secret (a key held by the server).

HMACSHA256(base64UrlEncode(header) + "." + base64UrlEncode(payload), secret);

The final token is a string like this (shown on three lines for readability):

eyJhbGciOiJIUzI1NiJ9
  .eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ
  .yKOB4jkGWu7twu8Ts9zju01E10_CPedLJkoJFCan5J4
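
To make the signing formula concrete, here is a minimal sketch of HS256 signing and verification using only the Python standard library. It is an illustration under assumptions, not a complete JWT implementation: the function names are made up for this example, and claims such as expiry are not checked. A maintained JWT library is the safer choice in practice.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # base64url without the trailing "=" padding, as JWT uses it
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    # base64UrlEncode(header) + "." + base64UrlEncode(payload)
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no "." at all, so this cannot be a JWT
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(b64url(expected).encode("ascii"), signature.encode("utf-8"))
```

A server that holds the secret can check a token by recomputing the signature, with no database lookup. Anyone without the secret can still read the payload but cannot produce a valid signature for a modified one.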

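Wrapping the Token

When the client sends the Token to the server, it is typically wrapped in a request header, commonly in the form Authorization: Bearer <token>. This is the content format we see in the request header.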