18 | Accessible in all directions: HTTP redirects and jumps

In the first lecture of this column, I said that Tim Berners-Lee invented the World Wide Web to realize the idea of building a hyperlinked document system on the Internet, using the HTTP protocol to transmit “hypertext” so that people all over the world could share information freely.

“Hypertext” contains “hyperlinks” that can jump from one “hypertext” to another “hypertext”, which is a fundamental change to the traditional linear structure of documents.

The ability to jump anywhere on the web using “hyperlinks” is also a key feature of the World Wide Web. It connects documents scattered around the world to form a complex network structure. Users can click on links and switch pages at will while viewing. In addition, the browser provides auxiliary functions such as “forward”, “backward” and “bookmarks”, making it more convenient for users to jump between documents and giving them more initiative and interactivity.

So what exactly happens during the jump when you click a “link” on a page? To be more specific, what happens when you click the “download” link on the Nginx homepage?

Based on the previous lessons, you can work out the answer with a little thought: the browser must first parse the URI behind the link text:

http://nginx.org/en/download.html

Then it uses this URI to initiate a new HTTP request. After receiving the response message, it switches the displayed content and renders the page that the new URI points to.
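As a rough sketch of that first step, the snippet below uses Python's standard `urllib.parse` to split the URI into the pieces a client needs before it can send a request. The helper name `build_request_line` is made up for illustration, and a real browser does much more (DNS lookup, connection setup, additional headers).

```python
from urllib.parse import urlsplit


def build_request_line(uri: str) -> tuple:
    """Split a URI and return (scheme, host, request line) for a GET request."""
    parts = urlsplit(uri)
    # An empty path means the root resource.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.netloc, f"GET {target} HTTP/1.1"


scheme, host, request_line = build_request_line("http://nginx.org/en/download.html")
print(scheme, host, request_line)
```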

Such a jump is initiated by the browser user and can be called an “active jump”. However, there is another kind of jump that is initiated by the server and cannot be controlled by the browser user. It can be called a “passive jump”, and the HTTP protocol has a special term for it: “redirection”.

The process of redirection

In fact, we have seen redirection before. When discussing the 3×× status codes in Lecture 12, I mentioned that 301 is “permanent redirection” and 302 is “temporary redirection”. When the browser receives either of these status codes, it jumps to the new URI.

So how do they do it? Are these two codes alone enough to make the page jump?

Let’s first take a look at the redirection process in the experimental environment. Use Chrome to access the URI “/18-1”, and it will use 302 to immediately jump to “/index.html”.

As this experiment shows, the “redirect” actually involved two HTTP requests. The first request returned 302, and the second request then went to “/index.html”. But without developer tools you would not see this jump at all. In other words, redirection is invisible to the user.

Let’s take a look at the response message returned by the first request:

A new header field “Location: /index.html” appears here, which is the secret of 301/302 redirect jumps.

The “Location” field is a response header field and may only appear in response messages. For redirection, it only makes sense together with a 3×× status code such as 301/302, where it gives the URI that the server wants the client to be redirected to. Here it asks the browser to jump to “/index.html”.

When the browser receives the 301/302 message, it will check whether there is “Location” in the response header. If so, extract the URI from the field value and issue a new HTTP request, which is equivalent to automatically clicking the link for us.
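The check described above can be sketched as a small function. This is only an illustration of the decision, not how any particular browser is implemented; the name `next_location` and the plain mapping used for headers are assumptions made for the example.

```python
from typing import Mapping, Optional


def next_location(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the Location value if the response asks for a redirect, else None."""
    if status not in (301, 302):
        return None
    # Header field names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value.strip()
    # A 301/302 without Location gives the client nothing to follow.
    return None


print(next_location(302, {"Location": "/index.html"}))
```

When the function returns a URI, the client issues a new request to it; when it returns `None`, there is nothing to jump to.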

The URI in “Location” can be either an absolute URI or a relative URI. An “absolute URI” is the complete form of a URI, including the scheme, host:port, path, and so on. A “relative URI” omits the scheme and host:port and keeps only the path and query parts. It is incomplete on its own, but the full URI can be worked out from the request context.

For example, the “Location: /index.html” in the experiment just now uses a relative URI. It does not specify the scheme and host, but because it arrives in the response to the request for “http://www.chrono.com/18-1”, the browser can piece together the complete URI:

http://www.chrono.com/index.html
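This resolution step can be reproduced with `urljoin` from Python's standard library, which resolves a relative reference against a base URI. The variable names are only for illustration.

```python
from urllib.parse import urljoin

request_uri = "http://www.chrono.com/18-1"   # the URI that was requested
location = "/index.html"                     # the relative URI from Location

# Scheme and host are taken from the request URI, the path from Location.
resolved = urljoin(request_uri, location)
print(resolved)  # http://www.chrono.com/index.html
```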

The URI “/18-1” of the experimental environment also supports the use of the query parameter “dst=xxx” to indicate the redirected URI. You can use this form to try redirecting a few more times to see how the browser works.

http://www.chrono.com/18-1?dst=/15-1?name=a.json
http://www.chrono.com/18-1?dst=/17-1
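To experiment without the course environment, a similar endpoint can be sketched with Python's built-in `http.server`. This is a hypothetical stand-in, not the implementation used by the experimental environment: the handler name, the default target “/index.html”, and port 8080 are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def redirect_target(request_target: str) -> str:
    """Pick the Location value: the 'dst' query parameter, else /index.html."""
    query = urlsplit(request_target).query
    dst = parse_qs(query).get("dst")
    return dst[0] if dst else "/index.html"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if urlsplit(self.path).path == "/18-1":
            # 302 plus Location is all the browser needs to jump.
            self.send_response(302)
            self.send_header("Location", redirect_target(self.path))
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"target page\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the demo server until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and then opening “http://127.0.0.1:8080/18-1?dst=/17-1” with developer tools open should show the 302 response followed by a second request.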

Note that you can safely use relative URIs when redirecting within the site. But if you want to jump outside the site, you must use an absolute URI.

For example, if you want to jump to the Nginx official website, you must write “http://” before “nginx.org”. Otherwise the browser will interpret it as a relative URI and end up with a URI that does not exist, “http://www.chrono.com/nginx.org”:

http://www.chrono.com/18-1?dst=nginx.org #Error
http://www.chrono.com/18-1?dst=http://nginx.org #Correct
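The difference between the two forms is easy to check with `urljoin` from Python's standard library, which applies the usual relative-reference resolution rules. The variable names are illustrative.

```python
from urllib.parse import urljoin

base = "http://www.chrono.com/18-1"

# Without a scheme, "nginx.org" is treated as a relative path.
wrong = urljoin(base, "nginx.org")
# With "http://", the value is an absolute URI and replaces the base entirely.
right = urljoin(base, "http://nginx.org")

print(wrong)  # http://www.chrono.com/nginx.org
print(right)  # http://nginx.org
```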

So what happens if a 301/302 response has no Location field?

You can also try this yourself, using the URI “/12-1” in Lecture 12 and the query parameter “code=302”:

http://www.chrono.com/12-1?code=302

Redirect status code

That covers the redirection process. Now let’s talk about the status codes used in redirection.

The most common redirect status codes are 301 and 302, and there are several less common ones, such as 303, 307, 308, etc. Their final effect is similar, allowing the browser to jump to the new URI, but there are some subtle differences in semantics, so pay special attention when using them.

301, commonly known as “Moved Permanently”, means that the original URI is “permanently” gone and all future requests must use the new URI.

When the browser sees 301, it knows that the original URI is “outdated” and can optimize accordingly. For example, it may update its history and bookmarks so that the next visit goes straight to the new URI, saving the cost of another jump. When a search engine crawler sees 301, it also updates its index and stops using the old URI.

302, commonly known as “Moved Temporarily” (its official reason phrase today is “Found”), means that the original URI is in a “temporary maintenance” state, and the new URI is a temporary stand-in that fills in for it.

When a browser or crawler sees 302, it assumes that the original URI is still valid but temporarily unavailable. It simply jumps to the new page without recording the new URI or taking any other action, and the next visit still uses the original URI.

301/302 are the most commonly used redirect status codes, and the remaining ones in 3×× are:

303 See Other: similar to 302, but requires the follow-up request to use the GET method to fetch a result page, avoiding repeated POST/PUT operations;

307 Temporary Redirect: similar to 302, but the method and body of the request must not change after redirection, so its meaning is clearer than 302;

308 Permanent Redirect: similar to 307 in that the request must not change after redirection, but with the “permanent redirect” meaning of 301.
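The difference between these codes shows up in which method the follow-up request uses. The sketch below encodes the rules listed above. The branch for 301/302 is an assumption based on the long-standing client habit of turning a POST into a GET, which the HTTP specification permits but does not require, and the function name is invented for the example.

```python
def redirected_method(status: int, method: str) -> str:
    """Return the method a client would use for the request after a redirect."""
    method = method.upper()
    if status == 303:
        # 303 sends the client to a result page: always GET (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (307, 308):
        # 307/308 forbid changing the method (and the body).
        return method
    if status in (301, 302):
        # Assumption: mimic the common historical behavior, POST becomes GET.
        return "GET" if method == "POST" else method
    raise ValueError(f"{status} is not a redirect status handled here")


for code in (302, 303, 307, 308):
    print(code, redirected_method(code, "POST"))
```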

However, these three status codes are less widely adopted, and some browsers and servers may not support them. Be cautious during development and test the actual browser behavior before relying on them.

Application scenarios of redirection

Now that we understand how redirection works and what the status codes mean, we hold the initiative on the server side and can control the browser’s behavior. But how should redirection be used?

The core of using redirects is to understand the two keywords “redirect” and “permanent/temporary”.

Let’s first look at when redirection is needed.

One of the most common reasons is that “the resource is unavailable” and a new URI needs to take its place.

There are many causes of unavailability. Domain name changes, server changes, website redesigns, and system maintenance can all make the resource behind the original URI inaccessible. To avoid a 404, you use redirection to jump to the new URI and keep serving users.

Another reason is to “avoid duplication”: multiple URIs jump to a single URI, which adds entry points without adding extra work.

For example, some websites register several similar domain names and redirect them all to the main site. You can visit “qq.com”, “github.com”, or “bing.com” (remember to clear the cache beforehand) and see how they redirect.

After deciding to implement redirection, the next thing to consider is the issue of “permanent” and “temporary”, that is, whether to choose 301 or 302.

301 means “permanent”.

If the domain name, server, or website structure has changed substantially, for example a new domain name has been put into use, the servers have moved to a new data center, or the site’s directory hierarchy has been restructured, the change counts as “permanent”. The original URI is no longer available, and a 301 “permanent redirect” must be used to tell browsers and search engines to update to the new address. This is also one of the factors to consider in search engine optimization (SEO).

302 means “temporary”.

The original URI will return to normal at some point in the future. A common scenario is system maintenance, where the website is redirected to a notice page asking users to come back later. Another use is “service degradation”. For example, during the Double Eleven promotion, less important entry points such as order inquiry and points collection are temporarily closed and redirected so that core services keep running normally.
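The choice between the two codes can be captured in a small rule table. The sketch below is purely illustrative: the paths “/old-blog”, “/blog”, “/orders”, and “/maintenance.html” are hypothetical, as is the helper name.

```python
# Hypothetical rules: old path -> (status code, new URI).
REDIRECT_RULES = {
    "/old-blog": (301, "/blog"),            # site restructured: permanent
    "/orders": (302, "/maintenance.html"),  # under maintenance: temporary
}


def plan_response(path: str):
    """Return (status, headers) for a request path according to the rules."""
    rule = REDIRECT_RULES.get(path)
    if rule is None:
        # No rule: serve the resource normally.
        return 200, {}
    status, new_uri = rule
    return status, {"Location": new_uri}


print(plan_response("/old-blog"))
print(plan_response("/orders"))
```

When maintenance ends, removing the 302 rule restores the original URI, whereas a 301 rule is meant to stay in place.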

Redirect-related issues

Redirects have many uses. If you master redirects, you can gain more flexibility when setting up a website. However, you need to pay attention to two issues when using them.

The first issue is “performance loss”. By design, every redirect involves two request-response exchanges, one more than a normal access.

Although 301/302 messages are small, the impact of a large number of jumps on the server cannot be ignored. Fortunately, redirects within the same site can reuse the persistent connection, but redirects to another site require opening two connections. If the network quality is poor, the cost is much higher and can seriously hurt the user experience.

So redirection should be used in moderation and never abused.

The second issue is the “redirect loop”. If the redirection rules are not thought through carefully, an infinite loop such as “A=>B=>C=>A” may occur, with requests going round and round these links, and the consequences are easy to imagine.

Therefore, the HTTP protocol specifically states that browsers should be able to detect “redirect loops”. When one is found, the browser should stop sending requests and show an error message.

The URI “/18-2” in the experimental environment simulates such a loop. It jumps to “/18-1”, which then jumps back to it through the parameter “dst=/18-2”, creating an infinite loop between the two URIs.

Accessing this address with Chrome results in the error “This page isn’t working”:
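One simple way a client could guard against such loops is to remember the URIs it has already visited and to cap the number of hops. The sketch below simulates the “/18-2” loop with an in-memory table instead of real network requests. The function names, the route table, and the limit of 20 hops are all assumptions for illustration, not a description of how Chrome works.

```python
from urllib.parse import urljoin


class RedirectLoopError(Exception):
    """Raised when following redirects would never terminate."""


def follow(start: str, fetch, max_hops: int = 20) -> str:
    """Follow 301/302 redirects from start and return the final URI."""
    seen = set()
    uri = start
    for _ in range(max_hops):
        if uri in seen:
            raise RedirectLoopError(f"loop detected at {uri}")
        seen.add(uri)
        status, location = fetch(uri)
        if status not in (301, 302) or location is None:
            return uri
        # Location may be relative, so resolve it against the current URI.
        uri = urljoin(uri, location)
    raise RedirectLoopError("too many redirects")


# Simulated responses: URI -> (status, Location or None).
ROUTES = {
    "http://www.chrono.com/18-2": (302, "/18-1?dst=/18-2"),
    "http://www.chrono.com/18-1?dst=/18-2": (302, "/18-2"),
    "http://www.chrono.com/18-1": (302, "/index.html"),
    "http://www.chrono.com/index.html": (200, None),
}


def fake_fetch(uri: str):
    return ROUTES[uri]


print(follow("http://www.chrono.com/18-1", fake_fetch))
try:
    follow("http://www.chrono.com/18-2", fake_fetch)
except RedirectLoopError as err:
    print("stopped:", err)
```

The visited set catches exact cycles early, while the hop limit acts as a backstop for chains that never repeat a URI.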

Summary

Today we learned about redirects and jumps in HTTP. Let’s briefly summarize:

Redirection is a jump initiated by the server that requires the client to resend the request using a new URI. This usually happens automatically, without the user noticing;

301/302 are the most commonly used redirect status codes, which are “permanent redirect” and “temporary redirect” respectively;

The response header field Location indicates the URI to be redirected, which can be in absolute or relative form;

Redirection can point one URI to another URI, or multiple URIs to the same URI. It has many uses;

When using redirection, you need to be careful about performance losses and avoid loop jumps.
