The Difference Between Token, Cookie, and Session, Explained in One Article

Last week, our team used a JWT (JSON Web Token), a session-free mechanism, for user account verification for the first time. We found that many articles on the Internet describe tokens incorrectly, so this article compares cookies, sessions, and tokens (“token” here always means a JWT). I hope you will gain something from reading it!

Cookie

HTTP 0.9 was born in 1991. At that time, it only had to let people browse web documents, so it supported only GET requests. Once a page had been fetched, there was no relationship between one connection and the next. This is why HTTP is stateless: when it was born, there was simply no need for state.

But with the rise of the interactive Web (interactive meaning that users can not only browse but also log in, post comments, shop, and so on), simply browsing pages no longer met people’s needs. Take online shopping: to record a user’s shopping cart, there must be a mechanism that links requests together, so that the server knows whose cart each product belongs to. So the cookie was born.

A cookie (often used in the plural, cookies) is a small piece of text data (sometimes encrypted) that a website stores on the user’s local device in order to identify the user and track the session. The client keeps this information temporarily or permanently.

The working mechanism is as follows:

Take adding to the shopping cart as an example. On each request, the server stores the product ID in a cookie and returns it to the client. The client saves the cookie locally and sends it back to the server with the next request. Because the cookie accumulates the user’s product IDs, the purchase record is not lost.
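
The flow above can be sketched in Python with the standard library’s http.cookies module. This is only an illustration under assumptions: the cookie name cart, the "-" separator, and the helper add_to_cart are made up for the example and do not come from any real site.

```python
from http.cookies import SimpleCookie


def add_to_cart(cookie_header: str, product_id: str) -> str:
    """Simulate the server: read the cart from the request's Cookie header,
    append the new product ID, and return the cookie to send back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    # Hypothetical format: product IDs joined by "-" in a cookie named "cart".
    items = cookie["cart"].value.split("-") if "cart" in cookie else []
    items.append(product_id)
    response = SimpleCookie()
    response["cart"] = "-".join(items)
    return response["cart"].OutputString()


# The browser echoes back whatever it stored last time.
header = ""
for pid in ["1001", "1002", "1003"]:
    header = add_to_cart(header, pid)
    print(header)  # the cookie grows with every product added
```

Every request carries the whole cart, which is exactly the growth problem discussed next.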

If you look carefully at the picture above, you will easily see that as more and more items go into the shopping cart, the cookie sent with each request gets larger and larger, which is a big burden on every request. I only want to add one product to the cart, so why should the whole history of products be sent back to the server as well? The shopping cart has already been recorded on the server, so isn’t this extra work by the browser unnecessary? How can it be improved?

Session

If you think about it carefully, since the user’s shopping cart is stored on the server, the cookie only needs to hold information that identifies the user, so that the server knows who initiated the add-to-cart operation. Each request then carries only the user’s identity in the cookie, and the request body carries only the ID of the product being added this time, which greatly reduces the size of the cookie. The mechanism that identifies which request comes from which user is called a session, and the generated string that identifies the user is called the sessionId. It works as follows:

First, when the user logs in, the server generates a session for the user and assigns it a unique sessionId. The sessionId is bound to that user, which means that given the sessionId (say abc), the server can look up which user it belongs to. The server then passes the sessionId to the browser in a cookie.
After that, every add-to-cart request from the browser only needs to carry the key-value pair sessionId=abc in the cookie. The server finds the corresponding user from the sessionId and saves the submitted product ID into that user’s shopping cart on the server.
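
The two steps above can be sketched as follows. This is a minimal illustration, not a production design: the in-memory dictionaries stand in for real server storage, and the function names login and add_to_cart are assumptions made for the example.

```python
import secrets

# Server-side state (in-memory dicts stand in for real storage).
sessions: dict[str, str] = {}        # sessionId -> user ID
carts: dict[str, list[str]] = {}     # user ID -> product IDs


def login(user_id: str) -> str:
    """Create a session and return the sessionId that the server
    would hand to the browser in a cookie."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = user_id
    return session_id


def add_to_cart(session_id: str, product_id: str) -> list[str]:
    """Find the user from the sessionId, then store the product server-side."""
    user_id = sessions.get(session_id)
    if user_id is None:
        raise PermissionError("unknown session, please log in again")
    carts.setdefault(user_id, []).append(product_id)
    return carts[user_id]


sid = login("alice")
add_to_cart(sid, "1001")
print(add_to_cart(sid, "1002"))
```

The request now carries only a short sessionId plus one product ID, however large the cart becomes.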

You can see that in this way, you no longer need to pass all the shopping cart product IDs in cookies, which greatly reduces the request burden!

In addition, it is not difficult to observe from the above that cookies are stored in the client, and sessions are stored in the server. The sessionId needs to be passed through the cookie to be meaningful.

Pain points of session

It seems that cookie + session solves the problem, but we have overlooked something. The scheme above works because we assumed the server runs on a single machine. In real production, to ensure high availability, a service generally needs at least two machines, with a load balancer deciding which machine each request is sent to.

As shown in the picture: after the client makes a request, the load balancer (such as Nginx) decides which machine receives it.

Assume the login request hits machine A. Machine A generates a session, puts the sessionId in a cookie, and returns it to the browser. Now the problem: if the next add-to-cart request hits machine B or C, the session was generated on machine A, so B and C cannot find it. The add-to-cart request fails and the user has to log in again. What can be done? There are three main approaches:
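
To make the failure concrete, here is a small Python simulation. It assumes round-robin balancing and a hypothetical Machine class that keeps sessions only in its own memory; neither detail comes from the article.

```python
import itertools
import secrets


class Machine:
    """A server that keeps sessions in its own memory only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, str] = {}

    def login(self, user_id: str) -> str:
        session_id = secrets.token_hex(8)
        self.sessions[session_id] = user_id
        return session_id

    def has_session(self, session_id: str) -> bool:
        return session_id in self.sessions


machines = [Machine("A"), Machine("B"), Machine("C")]
balancer = itertools.cycle(machines)   # assumed strategy: round-robin

first = next(balancer)                 # the login request lands on A
sid = first.login("alice")
second = next(balancer)                # the next request lands on B
print(first.name, first.has_session(sid))
print(second.name, second.has_session(sid))
```

Machine B has never seen the sessionId, so from its point of view the user is not logged in.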

1. Session replication

A generates a session and copies it to B and C, so that each machine has a copy of the session. No matter which machine the request to add the shopping cart goes to, since the session can be found, there will be no problem.

Although this method is feasible, its shortcomings are also obvious:

The same session is saved in multiple copies, resulting in data redundancy.
It’s fine with a few nodes, but services like Alibaba and WeChat, with hundreds of millions of DAUs, may need tens of thousands of machines. At that scale, the performance cost of replicating sessions to every node is huge.

2. Sticky sessions

This method makes each client’s requests always go to the same machine. For example, once the browser’s login request reaches machine A, all subsequent add-to-cart requests also go to machine A. Nginx can support this, for instance through a sticky module, with stickiness by IP, by cookie, and so on. Sticking by IP looks like this:

upstream tomcats {
    # Pick the backend by hashing the client IP address
    ip_hash;
    server 10.1.1.107:88;
    server 10.1.1.132:80;
}

With this configuration, whenever a client request reaches Nginx, as long as the client’s IP stays the same, the hash computed from the IP always selects the same machine, so the session is always found. The shortcoming of this approach is also easy to see: what happens if that machine goes down?
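
The idea behind hashing by IP can be sketched in a few lines of Python. This is a simplification under assumptions: it handles IPv4 only, and the SHA-256 hash and the function name pick_server are illustrative choices, not Nginx’s actual algorithm. Like ip_hash, it uses the first three octets of the address as the key.

```python
import hashlib

SERVERS = ["10.1.1.107:88", "10.1.1.132:80"]


def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IPv4 address to one server, deterministically."""
    # Key on the first three octets, so one /24 network shares a backend.
    key = ".".join(client_ip.split(".")[:3])
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


print(pick_server("203.0.113.7", SERVERS))
print(pick_server("203.0.113.7", SERVERS))  # same IP, same server
```

If the selected server is removed from the list, the same client maps to a different machine that has no record of its session, which is the weakness noted above.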

3. Session sharing

This is also the solution commonly adopted by major companies. Sessions are saved in middleware such as Redis or Memcached, and when a request arrives, any machine can fetch the session from that middleware.

The disadvantage is easy to see: every request has to go to Redis to get the session, which costs an extra internal connection and a little performance. In addition, to keep Redis highly available, you have to run a cluster. Large companies generally have Redis clusters deployed already, so this solution can be said to be their first choice.
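
Session sharing can be sketched as below. To keep the example self-contained, a plain Python class stands in for Redis or Memcached; SharedSessionStore, Machine, and whoami are hypothetical names, and a real deployment would talk to the middleware over the network.

```python
import secrets
from typing import Optional


class SharedSessionStore:
    """Stand-in for Redis or Memcached: one store every machine can reach."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, session_id: str, user_id: str) -> None:
        self._data[session_id] = user_id

    def get(self, session_id: str) -> Optional[str]:
        return self._data.get(session_id)


class Machine:
    """A server that keeps no session state of its own."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, user_id: str) -> str:
        session_id = secrets.token_hex(8)
        self.store.set(session_id, user_id)
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        # Every request pays for one extra lookup in the shared store.
        return self.store.get(session_id)


store = SharedSessionStore()
machine_a, machine_b = Machine("A", store), Machine("B", store)
sid = machine_a.login("alice")   # login handled by A
print(machine_b.whoami(sid))     # B can still identify the user
```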

Token: no session!

From the analysis above, we know that user identification can be achieved by sharing sessions on the server side. But there is a small catch: do I really need to set up a Redis cluster just to implement a verification mechanism? Redis is common at large companies, but a small company’s traffic may not justify it. So is there an authentication mechanism that does not require the server to store sessions? That brings us to today’s protagonist: the token.

First, the user submits a user name and password, and the server generates a token in response. The client saves the token locally and then includes it in the request header on every subsequent request to the server.

I believe everyone will find two problems after looking at the picture above.

1. The token is stored only in the browser, not on the server. In that case, couldn’t anyone make up a token and send it to the server?

Answer: The server has a verification mechanism that checks whether the token is valid.

2. With a session we could find the userid from the sessionId. Without one, how do we know which user it is?

Answer: The token itself carries the uid.

Take the first question: how is the token verified? We can borrow the signature mechanism used by HTTPS. Let’s first look at the components of a JWT.

You can see that a token consists of three parts:

header: Specifies the signature algorithm.
payload: Holds non-sensitive data such as the user ID and expiration time.
signature: The server reads the signature algorithm from the header, then uses that algorithm and its secret key to sign the header + payload. Joining the three parts produces the token.

When the server receives a token from the browser, it takes the header + payload out of the token, generates a signature from them with its key, and compares the result with the signature in the token. If the two match, the signature is valid, and so is the token. Notice also that the userId is stored in the payload, so once the token is verified, the server can read the userid directly from the payload, avoiding the overhead of fetching it from Redis as with a session.

Note: The header and payload are actually stored in Base64-encoded form (Base64URL, to be precise). That step is omitted in this article for ease of description.
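
The signing and verification steps described above can be sketched with the Python standard library. This is a minimal illustration assuming the HS256 algorithm (HMAC with SHA-256); the key value and the function names sign and verify are made up for the example, and a real project would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"demo-secret"  # hypothetical key; a real one must stay private


def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as JWT does."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, key: bytes) -> str:
    mac = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256)
    return b64url(mac.digest())


def sign(payload: dict, key: bytes = SECRET_KEY) -> str:
    """Build header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_signature(signing_input, key)}"


def verify(token: str, key: bytes = SECRET_KEY) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, signature = parts
    # This sketch always checks with HS256 rather than trusting the header.
    expected = _signature(f"{header_b64}.{payload_b64}", key)
    if not hmac.compare_digest(expected, signature):
        return None
    return json.loads(b64url_decode(payload_b64))


token = sign({"uid": 42})
print(token)
print(verify(token))
```

No lookup in a session store is needed: the uid comes straight out of the verified payload.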

This method is quite elegant. As long as the server keeps its key secret, the tokens it generates can be trusted: a forged token will fail signature verification and be judged invalid.

As you can see, this approach avoids having to store session data on the server: the state travels with each client, so any machine can verify a request. Note, however, that once the server issues a token, the token stays valid until it expires, and the server cannot invalidate it. The exception is to keep a blacklist of tokens on the server and check it before verifying each token; a blacklisted token is rejected. But then the blacklist has to be stored on the server, which brings us back to the session model, so why not just use sessions directly? The usual practice is therefore that when a user logs out, the client simply deletes its local copy of the token, and a new token is generated at the next login.
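
A short sketch of the expiry and blacklist checks follows. It assumes the signature has already been verified and the payload decoded, that the payload stores its expiration time as Unix seconds under the standard JWT exp claim, and that the blacklist is a plain in-memory set; the names revoked_tokens and is_token_usable are hypothetical.

```python
import time
from typing import Optional

# Server-side blacklist of revoked tokens. Note that this is exactly the
# kind of server state the token approach was meant to avoid.
revoked_tokens: set[str] = set()


def is_token_usable(token: str, payload: dict, now: Optional[float] = None) -> bool:
    """Check the blacklist first, then the expiration time."""
    now = time.time() if now is None else now
    if token in revoked_tokens:
        return False
    return now < payload["exp"]


payload = {"uid": 42, "exp": time.time() + 600}   # valid for ten minutes
print(is_token_usable("some.jwt.token", payload))
revoked_tokens.add("some.jwt.token")              # server-side revocation
print(is_token_usable("some.jwt.token", payload))
```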

In addition, note that the token is generally placed in the Authorization request header rather than in a cookie. This is mainly to get around the fact that cookies cannot be shared across domains (detailed below).

A brief summary of Cookies and Tokens

What are the limitations of cookies?

1. Cookies cannot be shared across sites. If you want to implement single sign-on (SSO) across multiple applications (multiple systems), doing it with cookies is very difficult (it takes a fairly complex trick; if you are interested, see the reference link at the end of the article).

Note: Single sign-on means that across multiple application systems, users only need to log in once to access all mutually trusted application systems.

But if you use tokens to implement SSO, it is very simple:

Just put the token in the Authorization header (or another custom header), and authentication works across all the cross-domain sites.
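
As an illustration, the snippet below builds requests to two different domains that carry the same token. The URLs and the token value are placeholders, and the "Bearer" prefix is a common convention assumed here rather than something the article specifies. The requests are only constructed, not sent.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach the token to the Authorization header of a request."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


token = "header.payload.signature"  # placeholder for a real JWT
for url in ("https://shop.example.com/api/cart",
            "https://pay.example.net/api/orders"):
    request = build_request(url, token)
    print(request.full_url, request.get_header("Authorization"))
```

Unlike a cookie, the header is set explicitly by the client, so the same token can be sent to any domain that trusts the issuer.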

2. Native requests from mobile apps do not carry cookies the way a browser does, and the sessionId depends on cookies, so the sessionId cannot be passed that way. A token does not have this problem, because it travels in the Authorization header. In other words, tokens naturally support mobile platforms and scale well.

To sum up, token has the characteristics of simple storage implementation and good scalability.

What are the disadvantages of tokens?

Then you may ask: since tokens are so good, why do almost all major companies use shared sessions? Many people may even be hearing about tokens for the first time. Aren’t tokens good enough? Tokens have the following two disadvantages:

1. The token is too long

A token is the encoded form of the header and payload plus a signature, so it is generally much longer than a sessionId and may well exceed the cookie size limit (cookies are typically limited to about 4 KB). The more information you store in the token, the longer it gets. And since the token accompanies every request, it adds a burden to each one.
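
A rough size comparison in Python shows the effect. It assumes an HS256-signed JWT and a 32-character hexadecimal sessionId; the helper names and the sample payloads are invented for the illustration.

```python
import base64
import json
import secrets


def encoded_len(obj: dict) -> int:
    """Length of a JSON object after Base64URL encoding without padding."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(base64.urlsafe_b64encode(raw).rstrip(b"="))


def jwt_length(payload: dict) -> int:
    header = {"alg": "HS256", "typ": "JWT"}
    signature_len = 43  # 32-byte HMAC-SHA256 digest, Base64URL, no padding
    # header + "." + payload + "." + signature
    return encoded_len(header) + 1 + encoded_len(payload) + 1 + signature_len


session_id = secrets.token_hex(16)
small = jwt_length({"uid": 42})
large = jwt_length({"uid": 42, "roles": ["admin"] * 50})
print(len(session_id), small, large)
```

Even the smallest payload yields a token several times longer than the sessionId, and the length keeps growing with the payload.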

2. Not very safe

Many articles on the Internet say that tokens are more secure, but this is not the case. We said that tokens are stored in the browser, but where exactly? Because a token is long, putting it in a cookie may exceed the cookie size limit, so it has to go into local storage. This creates a security risk, because local storage can be read directly by JS. In addition, as mentioned above, once a token is generated it cannot be invalidated before it expires, so if the server detects a security threat, it cannot revoke the affected tokens.

So tokens are better suited to one-time command authentication, with a short validity period.

Misconception: cookies are less secure than tokens because of attacks such as CSRF

First we need to explain what a CSRF attack is.

The attacker uses some technical means to deceive the user’s browser to visit a website that he has authenticated and perform some operations (such as sending emails, sending messages, and even property operations such as transferring money and purchasing goods). Since the browser has been authenticated (the cookie contains identity authentication information such as sessionId), the visited website will consider it to be a real user operation and run it.

For example, a user logs in to a bank website (say http://www.examplebank.com/, with the transfer address http://www.examplebank.com/withdraw?amount=1000&transferTo=PayeeName). After login, the cookie contains the sessionid of the logged-in user. The attacker can place the following code on another website:

<img src="http://www.examplebank.com/withdraw?account=Alice&amp;amount=1000&amp;for=Badman">

So if a logged-in user opens a page containing the image above (the browser requests the image URL automatically, no click is needed), the request to the bank’s domain automatically carries the cookie, which contains the user’s sessionid. The server then performs the transfer as if the user had asked for it, which is a serious security risk.

The root cause of CSRF attacks is that every request to a domain automatically carries that domain’s cookies. This is determined by the browser mechanism, so many people believe that cookies are not safe.

Using tokens does avoid CSRF, but as mentioned above, a token kept in local storage can be read by JS, so it is also unsafe from a storage perspective. (The correct way to defend against CSRF attacks is actually to use a CSRF token.)
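
The CSRF token idea can be sketched as follows. This is a simplified illustration assuming the server stores a random value in the user’s session and embeds the same value in its own pages or forms; the session dictionary and the name is_request_allowed are hypothetical.

```python
import hmac
import secrets

# Server-side session for a logged-in user, including a random CSRF token
# that the server also embeds in the forms it serves to this user.
session = {"user": "alice", "csrf_token": secrets.token_urlsafe(32)}


def is_request_allowed(session: dict, submitted_token: str) -> bool:
    """Accept a state-changing request only if it echoes the CSRF token."""
    expected = session.get("csrf_token", "")
    return bool(expected) and hmac.compare_digest(expected, submitted_token)


# A genuine form submission includes the token the server handed out.
print(is_request_allowed(session, session["csrf_token"]))
# A forged cross-site request carries the cookie but cannot know the token.
print(is_request_allowed(session, ""))
```

The cookie is still sent automatically, but the attacker’s page cannot read the token, so the forged request is rejected.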

So whether you use a cookie or a token, neither is really safe from a storage perspective; both risk exposure. When we talk about security, we mostly mean security in transit, which HTTPS provides: the request headers are encrypted, so the data is protected during transmission.

In fact, it is unreasonable for us to compare cookies and tokens. One is the storage method and the other is the verification method. The correct comparison should be session vs token.

Summary

There is essentially no difference in purpose between session and token: both are mechanisms for authenticating a user’s identity. What differs is how verification is done: a session is stored on the server and verified by fetching it from middleware such as Redis, while a token is stored on the client and verified by checking its signature. Sessions are the more reasonable choice in most scenarios, but for single sign-on and one-time command authentication, tokens are more appropriate. Choose according to the business scenario to get twice the result with half the effort.

From: Just call me a random name – Zhihu