A brief analysis of network protocols: the HTTP protocol

1. Introduction to HTTP

HTTP is the abbreviation of Hypertext Transfer Protocol, a protocol used to transfer hypertext from World Wide Web (WWW) servers to the local browser.

HTTP is a communication protocol based on TCP/IP to transfer data (HTML files, image files, query results, etc.).

HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been improved and extended continuously since then. HTTP/1.0 and HTTP/1.1 were published in the 1990s, and HTTP/2 and HTTP/3 have since followed; the text message format described in this article is that of HTTP/1.x.

HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends requests to the HTTP server (that is, the web server) identified by a URL. The web server returns a response to the client based on the request it received.

[Figure: HTTP request-response model]

2. Main features

1. Simple and fast: When a client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different kind of interaction between the client and the server. Because the HTTP protocol is simple, HTTP server programs are small and communication is fast.

2. Flexible: HTTP allows the transmission of any type of data object. The type being transferred is marked by Content-Type.

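3. Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, it closes the connection. This saves transmission time. (This describes HTTP/1.0; HTTP/1.1 keeps connections open by default so that several requests can reuse one connection.)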

4. Stateless: HTTP is a stateless protocol. Stateless means that the protocol has no memory of earlier transactions. Because of this lack of state, any earlier information that later processing needs must be retransmitted, which may increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need earlier information.

5. Supports B/S (browser/server) and C/S (client/server) modes.
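
As a small illustration of the "flexible" feature above, the following sketch shows how a server might choose a Content-Type value from a file name. It uses Python's standard mimetypes module; the helper name content_type_for and the sample file names are made up for this example, and real servers may use different rules.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-Type value from a file name (hypothetical helper).

    Falls back to a generic binary type when the extension is unknown.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


# Sample file names are assumptions chosen for illustration.
for name in ("index.html", "photo.jpg", "data.unknownext"):
    print(name, "->", content_type_for(name))
```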

3. HTTP URL

HTTP uses Uniform Resource Identifiers (URIs) to transmit data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

URL stands for Uniform Resource Locator. It is an address that identifies a resource on the Internet. The following URL is used as an example to introduce the components of a typical URL:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name

As can be seen from the above URL, a complete URL includes the following parts:

1. Protocol part: The protocol part of this URL is “http:”, which means that the web page is accessed over HTTP. Various protocols can be used on the Internet, such as HTTP and FTP. The “//” after “http:” is the delimiter.

2. Domain name part: The domain name part of this URL is “www.aspxfans.com”. An IP address can also be used in place of the domain name.

3. Port part: The port follows the domain name, with “:” as the separator between them. The port is not a required part of a URL. If it is omitted, the default port is used (80 for HTTP).

4. Virtual directory part: The part from the first “/” after the domain name to the last “/” is the virtual directory. It is not a required part of a URL either. The virtual directory in this example is “/news/”.

5. File name part: The part from the last “/” after the domain name up to “?” is the file name. If there is no “?”, the file name runs from the last “/” up to “#”. If there is neither “?” nor “#”, it runs from the last “/” to the end of the URL. The file name in this example is “index.asp”. The file name is not a required part of a URL. If it is omitted, the server's default file name is used.

6. Anchor part: The part from “#” to the end of the URL is the anchor. The anchor in this example is “name”. The anchor is not a required part of a URL either.

7. Parameter part: The part from “?” to “#” is the parameter part, also known as the search part or the query part. The parameter part in this example is “boardID=5&ID=24618&page=1”. Multiple parameters are allowed, with “&” as the separator between them.
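
The breakdown above can be checked with a short sketch using Python's standard urllib.parse module. The URL is the article's example with its “&” separators written plainly; the dictionary keys are labels chosen here to mirror the seven parts and are not part of any API.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

# Split the path at its last "/" into the virtual directory and the file name.
directory, _, filename = parts.path.rpartition("/")

components = {
    "protocol": parts.scheme,
    "domain": parts.hostname,
    "port": parts.port,
    "directory": directory + "/",
    "file": filename,
    "parameters": parse_qs(parts.query),  # each value is a list of strings
    "anchor": parts.fragment,
}

for key, value in components.items():
    print(f"{key}: {value}")
```

Note that the anchor is handled by the browser and is normally not sent to the server as part of the request.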

(Original text: Detailed explanation of the composition of a URL_The composition of a link-CSDN Blog)

4. The difference between URI and URL

URI is a uniform resource identifier, which is used to uniquely identify a resource.

Every resource available on the Web, such as HTML documents, images, video clips, and programs, is identified by a URI.
A URI generally consists of three parts:
①The naming mechanism for accessing the resource
②The host name where the resource is stored
③The name of the resource itself, represented by the path (the emphasis is on the resource)

URL is a uniform resource locator. It is a specific kind of URI: a URL identifies a resource and also indicates how to locate it.

A URL is a string used to describe information resources on the Internet. It is mainly used in various WWW client and server programs, especially the famous Mosaic.
URLs can describe various information resources in a unified format, including files, server addresses, and directories. A URL generally consists of three parts:
①Protocol (or service scheme)
②The host name or IP address of the host where the resource is stored (sometimes including the port number)
③The specific address of the resource on the host, such as the directory and file name

URN (Uniform Resource Name) identifies a resource by name, such as mailto:[email protected].

URI is the abstract, high-level concept of a uniform resource identifier, while URL and URN are specific ways of identifying resources. Both URL and URN are kinds of URI. Loosely speaking, every URL is a URI, but not every URI is a URL, because URIs also include a subclass, the Uniform Resource Name (URN), which names a resource but does not specify how to locate it. In this loose classification, mailto, news, and isbn URIs are all examples of URNs; in stricter usage, the term URN is often reserved for URIs under the “urn” scheme.

In Java, a URI instance can be absolute or relative, as long as it conforms to the URI syntax rules. The URL class not only conforms to the syntax but also contains the information needed to locate the resource, so it cannot be relative.
In the Java class library, the URI class does not contain any methods for accessing resources; its only function is parsing.
In contrast, the URL class can open a stream to a resource.

5. HTTP request message (Request)

The client sends an HTTP request to the server. The request message consists of four parts: request line, request headers, blank line, and request data.

[Figure: HTTP request message structure]

The request line begins with the method token, followed by the request URI and the protocol version, separated by spaces.

GET request example, using a request captured with Charles:

GET /562f25980001b1b106000338.jpg HTTP/1.1
Host: img.mukewang.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Accept: image/webp,image/*,*/*;q=0.8
Referer: http://www.imooc.com/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

Part 1: Request line, which describes the request type, the resource to be accessed, and the HTTP version used.

GET indicates that the request type is GET, [/562f25980001b1b106000338.jpg] is the resource to be accessed, and the last part of the line indicates that HTTP version 1.1 is used.

Part 2: Request headers, the part immediately after the request line (i.e. the first line), which carries additional information for the server to use.

From the second line onward come the request headers. Host indicates the destination of the request. User-Agent can be read by both server-side and client-side scripts and is an important basis for browser-detection logic; it is defined by the browser and sent automatically with every request. The remaining headers work similarly.

Part 3: Blank line. The blank line after the request headers is required.

Even if the request data in Part 4 is empty, the blank line must be present.

Part 4: Request data, also called the body, which can carry any additional data.

The request data for this example is empty.

POST request example, using the request captured by Charles:

POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

Part 1: Request line. The first line shows that this is a POST request using HTTP version 1.1.
Part 2: Request headers, lines 2 to 6.
Part 3: Blank line, line 7.
Part 4: Request data, line 8.
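
The four-part structure can be sketched in Python as follows. The helpers build_request and parse_request are hypothetical names written for this illustration, not part of any library, and the body assumes the space in the POST example is encoded as %20 so that its length matches the Content-Length of 40. Lines in an HTTP/1.x message end with CRLF ("\r\n").

```python
def build_request(method, target, headers, body=b""):
    """Assemble request line, headers, blank line, and body into raw bytes."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The trailing CRLF pair ends the last header and adds the blank line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_request(raw):
    """Split raw request bytes into (method, target, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, target, version, headers, body


body = b"name=Professional%20Ajax&publisher=Wiley"
raw = build_request(
    "POST",
    "/",
    {
        "Host": "www.wrox.com",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    },
    body,
)
print(raw.decode("ascii"))
```

This is only a sketch: a real parser also has to handle header names case-insensitively, repeated headers, and bodies whose length is given by Content-Length or chunked encoding.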

6. HTTP response message (Response)

Under normal circumstances, the server will return an HTTP response message after receiving and processing the request from the client.

An HTTP response also consists of four parts: status line, message headers, blank line, and response body.

[Figure: HTTP response message format]

Example:

HTTP/1.1 200 OK
Date: Fri, 22 May 2009 06:07:21 GMT
Content-Type: text/html; charset=UTF-8

<html>
      <head></head>
      <body>
            <!--body goes here-->
      </body>
</html>

Part 1: Status line, consisting of three parts: the HTTP protocol version, the status code, and the status message.

The first line is the status line: HTTP/1.1 indicates that HTTP version 1.1 is used, the status code is 200, and the status message is OK.

Part 2: Message headers, which carry additional information for the client to use.

The second and third lines are message headers.
Date: the date and time when the response was generated. Content-Type: specifies the MIME type, here HTML (text/html), with the character encoding UTF-8.

Part 3: Blank line. The blank line after the message headers is required.

Part 4: Response body, the content returned by the server to the client.

The HTML part after the blank line is the response body.
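
As a rough sketch of how a client might take such a response apart, the following Python snippet splits the example into its four parts. The function parse_response is a hypothetical helper written for this illustration, and the HTML body is condensed onto one line.

```python
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 22 May 2009 06:07:21 GMT\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html><head></head><body><!--body goes here--></body></html>"
)


def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers["content-type"])
print(body.decode("utf-8"))
```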

7. HTTP status code

The status code consists of three digits. The first digit defines the category of the response. There are five categories in total:

1xx: Informational (the request has been received and processing continues)

2xx: Success (the request has been successfully received, understood, and accepted)

3xx: Redirection (further action must be taken to complete the request)

4xx: Client error (the request has a syntax error or cannot be fulfilled)

5xx: Server error (the server failed to fulfill a valid request)

Common status codes:

200 OK //Client request successful
400 Bad Request //The client request has a syntax error and cannot be understood by the server.
401 Unauthorized //The request is unauthorized, this status code must be used together with the WWW-Authenticate header field
403 Forbidden //The server received the request but refused to provide the service.
404 Not Found //The requested resource does not exist, e.g. the wrong URL was entered
500 Internal Server Error //An unexpected error occurred in the server
503 Service Unavailable //The server is currently unable to process the client's request and may return to normal after a period of time
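
The category rule (first digit) and the standard reason phrases can be illustrated with Python's built-in http.HTTPStatus enumeration. The describe function and the CATEGORIES table are helpers invented for this sketch.

```python
from http import HTTPStatus

# Category names keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code phrase: category' for a status code (hypothetical helper)."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one that Python's http module knows about.
        phrase = "(unregistered code)"
    return f"{code} {phrase}: {category}"


for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```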


8. HTTP request method

According to the HTTP standard, HTTP requests can use multiple request methods.
HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 adds five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.

GET requests the specified page information and returns the entity body.
HEAD is similar to a GET request, except that the response contains no body; it is used to obtain the headers.
POST submits data to the specified resource for processing (such as submitting a form or uploading a file). The data is included in the request body. POST requests may result in the creation of new resources and/or the modification of existing resources.
PUT transfers data from the client to the server to replace the contents of the specified document.
DELETE requests the server to delete the specified page.
CONNECT is reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel.
OPTIONS allows clients to query the capabilities of the server, such as the methods it supports.
TRACE echoes the request received by the server; it is mainly used for testing or diagnostics.
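
The difference between GET and HEAD can be observed with a small self-contained sketch. It starts a throwaway server on the local machine with Python's http.server and queries it with http.client; DemoHandler, PAGE, and fetch are names made up for this example, and the port is chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Toy handler: GET returns headers and body, HEAD returns headers only."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def fetch(method):
    """Send one request to a temporary local server; return (status, length, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(method, "/")
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Length"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("GET :", fetch("GET"))
    print("HEAD:", fetch("HEAD"))
```

Both responses carry the same headers, including Content-Length, but only the GET response has a body.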

9. How HTTP works

The HTTP protocol defines how a web client requests a web page from a web server, and how the server delivers the page to the client. HTTP uses a request/response model. The client sends a request message to the server containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line containing the protocol version and a success or error code, followed by server information, response headers, and response data.

The steps of an HTTP request/response are as follows:

1. The client connects to the Web server

An HTTP client, usually a browser, establishes a TCP socket connection with the Web server’s HTTP port (default is 80). For example, http://www.oakcms.cn.

2. Send HTTP request

Through the TCP socket, the client sends a text request message to the Web server. A request message consists of four parts: request line, request header, blank line and request data.

3. The server accepts the request and returns an HTTP response

The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, which is read by the client. A response consists of four parts: status line, response header, blank line and response data.

4. Release the TCP connection

If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, releasing the TCP connection; if the connection mode is keep-alive, the connection is kept open for a period of time, during which it can continue to receive requests.

5. Client browser parses HTML content

The client browser first parses the status line for the status code indicating whether the request was successful. It then parses each response header; the response headers tell the browser, for example, that an HTML document of a certain number of bytes follows and which character set the document uses. The browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
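
The five steps above can be traced with a minimal sketch that talks to a throwaway local server over a raw TCP socket. HelloHandler, http_get, and demo are names invented for this example; a real browser does much more (DNS lookup, redirects, caching, rendering), so this is only an illustration of the message exchange.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def http_get(host, port, path="/"):
    # Step 1: establish a TCP connection with the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, headers, and the blank line.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 4: leaving the "with" block releases the client side of the connection.
    raw = b"".join(chunks)
    # Step 5: parse the status line, the headers, and the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_code, headers, body


def demo():
    """Run http_get against a temporary server on 127.0.0.1."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return http_get("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```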

For example, when you type a URL in the browser address bar and press Enter, the following process takes place:

1. The browser asks the DNS server to resolve the IP address corresponding to the domain name in the URL;

2. Once the IP address has been resolved, the browser establishes a TCP connection with the server using that IP address and the default port 80;

3. The browser issues an HTTP request to read the file (the file corresponding to the part after the domain name in the URL); the request message can be sent to the server as the data of the third segment of the TCP three-way handshake;

4. The server responds to the browser's request and sends the corresponding HTML text to the browser;

5. The TCP connection is released;

6. The browser interprets the HTML text and displays the content.

10. The difference between GET and POST requests

GET request

GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive

Note that the last line is a blank line.

POST request

POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

1. GET submission: the request data is appended to the URL (that is, it travels in the request line rather than in the body), with ? separating the URL from the data and & connecting multiple parameters, for example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as they are, and a space is converted to +. Chinese and other characters are percent-encoded (this is not Base64 and not encryption): for example, 你好 (“hello”) becomes %E4%BD%A0%E5%A5%BD, where the XX in each %XX is the hexadecimal value of one byte of the character's encoding.

POST submission: the submitted data is placed in the body of the HTTP message. In the POST example above, the line after the blank line is the data actually transmitted.

Therefore, data submitted by GET is displayed in the address bar, whereas with POST the address bar does not change.

2. The size of the transmitted data: To be clear, the HTTP protocol itself does not limit the size of the transmitted data, and the HTTP specification does not limit the length of the URL.

The limits encountered in actual development are mainly these:

GET: Certain browsers and servers limit the URL length. For example, IE limits the URL length to 2083 bytes (2K + 35). For other browsers, such as Netscape and Firefox, there is theoretically no length limit, and the limit depends on the support of the operating system.

Therefore, when submitting with GET, the transmitted data is limited by the length of the URL.

POST: Since the values are not passed through the URL, the data size is theoretically unlimited. In practice, however, each web server sets its own limit on the size of POST data; Apache and IIS6 each have their own configuration for this.

3. Security

POST is more secure than GET. For example, when data is submitted through GET, the username and password appear in clear text in the URL, so (1) the login page may be cached by the browser, and (2) anyone who views the browser history can obtain your account and password. In addition, submitting data with GET may make cross-site request forgery attacks easier.

4. HTTP GET, POST, and SOAP all run on HTTP

(1) GET: The request parameters are appended to the URL as a sequence of key/value pairs (the query string).
The length of the query string is limited by web browsers and web servers (e.g. IE supports only about 2K characters), so GET is not suitable for transmitting large data sets. It is also very insecure.

(2) POST: The request parameters are transmitted in a separate part of the HTTP message (the entity body). When this part is used to transmit form information, Content-Type must be set to application/x-www-form-urlencoded. POST was designed to support user fields on web forms, and its parameters are also transmitted as key/value pairs.
However, it does not support complex data types, because POST does not define the semantics and rules for transferring data structures.

(3) SOAP: It is a specialized use of HTTP POST that follows a special XML message format.
Content-Type is set to text/xml. Any data can be serialized as XML.
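
The encoding rules for key/value parameters described in item 1 can be tried out with Python's standard urllib.parse module. The parameter values below are taken from the login.action example, with 你好 as an assumed sample of Chinese text; the output shows that the result is percent-encoding of the UTF-8 bytes rather than Base64.

```python
from urllib.parse import parse_qs, quote, urlencode

# Form fields modelled on the login.action example; "你好" is sample text.
params = {"name": "hyddd", "password": "idontknow", "verify": "你好"}

# urlencode joins key=value pairs with "&" and percent-encodes each value
# (UTF-8 by default). This is neither Base64 nor encryption.
query = urlencode(params)
print("login.action?" + query)

# Spaces: form encoding uses "+", while quote() produces "%20".
form_style = urlencode({"name": "Professional Ajax"})
path_style = quote("Professional Ajax")
print(form_style)
print(path_style)

# Decoding on the receiving side reverses the process.
decoded = parse_qs(query)
print(decoded["verify"])
```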

The HTTP protocol defines many methods of interacting with the server, the most basic of which are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE roughly correspond to the four operations of querying, modifying, adding, and deleting this resource. The most common ones are GET and POST. GET is generally used to obtain or query resource information, while POST is generally used to update resource information.

Let's look at the differences between GET and POST:

1. The data submitted by GET is placed after the URL, with ? separating the URL from the data and & connecting the parameters, such as EditPosts.aspx?name=test1&id=123456. The POST method puts the submitted data in the body of the HTTP message.
2. There is a limit on the size of the data submitted by GET (because browsers limit the length of the URL), while the POST method has no such limit in the protocol itself.
3. On the server side (in ASP/ASP.NET), data submitted by GET is read with Request.QueryString, while data submitted by POST is read with Request.Form.
4. Submitting data through GET brings security issues. Take a login page as an example: when data is submitted through GET, the username and password appear in the URL. If the page can be cached or other people can access the machine, the user's account and password can be retrieved from the history.
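
The first difference can be made concrete with a sketch that builds, but does not send, one GET and one POST request using Python's urllib.request.Request. The host www.example.com is a placeholder assumed for this example; the path and fields come from the EditPosts.aspx example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "test1", "id": "123456"}
encoded = urlencode(fields)

# GET: the parameters travel in the URL, after "?".
get_request = Request("http://www.example.com/EditPosts.aspx?" + encoded)

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://www.example.com/EditPosts.aspx",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

for request in (get_request, post_request):
    print(request.get_method(), request.full_url, request.data)
```

Request picks the method from the presence of a body: with data it defaults to POST, without data to GET. Nothing is sent over the network until the request is passed to urlopen.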