03丨Sockets and addresses: Understand them like phones and phone numbers

In network programming, we often mention the word socket. In Chinese it is usually translated as 套接字, and sometimes as 套接口.

The original meaning of the English word socket is “outlet” or “slot”. In network programming, the idea is that by plugging into a socket you can quickly set up a network connection and send and receive data. You can think of it as a power outlet in the real world, or as the network jack needed for early Internet access, so a socket can be seen as a direct mapping of something in the physical world.

In fact, computer programming is a discipline closely tied to English. Many technical terms are easier to accept in their original English form than in translation. For convenience, this column generally uses the English word directly. Where a translation is needed, we will consistently use 套接字.

What exactly is a socket?

In network programming, how should we understand socket? Here is a picture for you to take a look at first.

This picture actually expresses the core logic of client and server work in network programming.

Let’s start with the server on the right side of the picture, because the server must be initialized before any client initiates a connection request. First, the server creates a socket. Then it calls the bind function to bind its service to a well-known address and port. Next it calls listen, which turns the original socket into a listening (server-side) socket. Finally, the server blocks in accept and waits for a client request to arrive.
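The server-side sequence described above can be sketched in C roughly as follows. This is a minimal sketch rather than the column's own code: the helper names make_listener and wait_for_client, the loopback address, and the backlog of 16 are assumptions chosen for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to 127.0.0.1:port and
   switch it to the listening state. Returns the listening fd, or -1. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* 1. create the socket */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                   /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* 2. bind */
        listen(fd, 16) < 0) {                                    /* 3. listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* 4. Block until a client connects; returns the connected fd, or -1. */
int wait_for_client(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

Passing port 0 asks the kernel to pick a free port, which is handy for experiments; a real server would pass its well-known port instead.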

At this point, the server is ready. The client also needs to create a socket first, and then calls connect to initiate a connection request to the server’s address and port, both of which the client must know in advance. This connection setup is the famous TCP three-way handshake. In the next article, I will explain how the three-way handshake works in detail.
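The client side can be sketched in the same spirit. The helper name connect_to is hypothetical, and the sketch assumes the server address is given as a dotted-quad IPv4 string; the three-way handshake itself is carried out by the kernel while connect is in progress.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a TCP socket to ip:port.
   Returns the connected fd, or -1 on failure. */
int connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address string */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* For a blocking socket, connect returns after the handshake finishes. */
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```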

Once the three-way handshake is completed, the client and server establish a connection and enter the data transmission process.

Specifically, the client process issues a write call that hands a byte stream to the operating system kernel. The kernel protocol stack transmits the byte stream to the server through the network device. On the server side, the process reads the byte stream out of its kernel and then runs its business logic. When it is done, the server writes the result back to the client in the same way. Once the connection is established, data transmission is therefore no longer one-way but bidirectional, which is a significant feature of TCP.
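The write, read, process, write-back cycle can be tried out in a single process. The sketch below is an illustration under stated assumptions: it uses socketpair to obtain two already connected stream sockets as a stand-in for an established TCP connection, the function name demo_round_trip is hypothetical, and upper-casing the message stands in for the business logic.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: sv[0] plays the client, sv[1] plays the server.
   Returns the number of reply bytes read by the client, or -1. */
int demo_round_trip(char *reply, size_t cap)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    const char msg[] = "hello";
    const size_t len = sizeof(msg) - 1;
    char buf[16];
    ssize_t got = -1;

    /* Client hands the request bytes to the kernel. */
    if (write(sv[0], msg, len) == (ssize_t)len) {
        /* Server reads the bytes out of the kernel. */
        ssize_t n = read(sv[1], buf, sizeof(buf));
        if (n > 0) {
            /* Stand-in business logic: upper-case the request. */
            for (ssize_t i = 0; i < n; i++)
                buf[i] = (char)toupper((unsigned char)buf[i]);
            /* Server writes the result back over the same connection. */
            if (write(sv[1], buf, (size_t)n) == n)
                got = read(sv[0], reply, cap);
        }
    }
    close(sv[0]);
    close(sv[1]);
    return (int)got;
}
```

Both directions travel over the same pair of descriptors, which is the bidirectional property the paragraph above describes.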

When the client has finished interacting with the server, for example after a Telnet session or an HTTP request, and needs to disconnect, it calls the close function. The operating system kernel then sends a FIN packet to the server over the existing connection, and on receiving it the server performs a passive close. At this point the connection is in a half-closed state. After that, the server also calls close, and the connection is truly closed. In the half-closed state, the side that initiated the close still regards the connection as alive until it receives the other side’s FIN packet; in the fully closed state, both sides know that the connection has been closed.
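The half-closed state can be observed with a small experiment. Note the substitution: this sketch calls shutdown(fd, SHUT_WR) instead of close, because close also releases the descriptor and would leave nothing to read from afterwards. It again assumes a socketpair as a stand-in for a TCP connection, and demo_half_close is a hypothetical name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: sv[0] is the client, sv[1] is the server.
   Returns 0 if every step behaved as described, -1 otherwise. */
int demo_half_close(void)
{
    int sv[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int ok = 1;
    /* Client announces it has nothing more to send. */
    ok = ok && shutdown(sv[0], SHUT_WR) == 0;
    /* Server sees end-of-stream: read returns 0. */
    ok = ok && read(sv[1], buf, sizeof(buf)) == 0;
    /* Half-closed: the server-to-client direction still works. */
    ok = ok && write(sv[1], "bye", 3) == 3;
    ok = ok && read(sv[0], buf, sizeof(buf)) == 3;
    /* Server closes too; now the client also sees end-of-stream. */
    close(sv[1]);
    ok = ok && read(sv[0], buf, sizeof(buf)) == 0;
    close(sv[0]);
    return ok ? 0 : -1;
}
```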

Please remember the picture at the beginning of the article; it is one of the core pictures of the entire column.

The real purpose of this picture is to introduce the concept of the socket. Please note that all of the operations above are carried out through a socket. Whether it is the client’s connect, the server’s accept, or the read/write operations, the socket is the only means we use to establish a connection and transmit data.

Better understanding of sockets: a more intuitive explanation

You can picture the whole TCP interaction and data transfer as making a phone call. Following this idea, a socket is like the telephone in our hands, and connect is like dialing a number. On the server side, bind is like going to a telecommunications company to open an account and bind a phone number to our home phone, so that others can reach us at that number. listen is like people at home hearing the phone ring, and accept is like the called party picking up the phone and answering. At this point, the three-way handshake is complete and the connection is established.

Next, the caller starts to speak: “Hello.” This is the write step. The person receiving the call hears it, which can be pictured as a read (hearing and taking in the data), and then starts to respond. Both parties have now entered the read/write data transfer phase.

Finally, the caller finishes the conversation and hangs up, which corresponds to close. The person who answered notices that the other party has hung up and hangs up as well, which is also a close.

In the entire telephone communication process, the phone is a device through which we can communicate with the outside world. Corresponding to the world of network programming, sockets are also a way for us to communicate with the outside world through the network.

The development history of socket

Through the above explanation and this analogy of making a phone call, you now know what a socket is, right? So how was socket first proposed? Next, it is necessary to briefly trace its history together.

Sockets were proposed by researchers at the University of California, Berkeley, in the early 1980s, which is why they are also called Berkeley sockets. The Berkeley researchers envisioned using the socket concept to hide the differences between underlying protocol stacks. The first protocol suite supported through sockets was TCP/IP, and the first implementation appeared in the 4.2BSD Unix kernel. People soon found that the concept made network programming much more convenient, so more and more developers came into contact with sockets. Linux, an open source Unix-like system, implemented its own TCP/IP stack from scratch early on. Following the success of sockets, Windows introduced the concept as well. Today, sockets have become the de facto standard interface for network programming.

Socket address format

When using sockets, we must first solve the addressing problem of both communicating parties. We need the socket address to establish a connection, just like when making a phone call, you first need to search the phone book and find the person you want to contact, then you can establish a connection and start communication. Next, we focus on the socket address format.

Universal socket address format

Let’s first look at the general address structure of the socket:

/* The POSIX.1g specification specifies that the address family is a 2-byte value. */
typedef unsigned short int sa_family_t;
/* Describe the universal socket address */
struct sockaddr {
    sa_family_t sa_family; /* Address family. 16-bit */
    char sa_data[14]; /* Specific address value. 112-bit */
};

In this structure, the first field is the address family, which tells us how to interpret and store the address that follows. It is much like a phone book entry that may be in mobile format or in landline format, two formats whose length and meaning both differ. glibc defines many address families; the commonly used ones are:

AF_LOCAL: represents the local address, corresponding to the Unix socket. This situation is generally used for local socket communication. In many cases, it can also be written as AF_UNIX or AF_FILE;

AF_INET: IPv4 address used by the Internet;

AF_INET6: IPv6 address used by the Internet.
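Because every socket address starts with the family field, code that receives a generic struct sockaddr pointer can inspect sa_family to decide how to interpret the rest. A minimal sketch of that idea follows; the function name family_name and the returned labels are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: name the address family of a generic socket address. */
const char *family_name(const struct sockaddr *sa)
{
    assert(sa != NULL);
    switch (sa->sa_family) {
    case AF_UNIX:  return "local";   /* same family as AF_LOCAL */
    case AF_INET:  return "IPv4";
    case AF_INET6: return "IPv6";
    default:       return "other";
    }
}
```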

AF_ here stands for Address Family, but we will also often see macros that start with PF_, such as PF_INET and PF_INET6. PF_ stands for Protocol Family. We initialize a socket address with an AF_xxx value and create the socket with a PF_xxx value. The header file shows clearly that the two sets of values correspond one to one.

/* Macro definitions for various address families */
#define AF_UNSPEC PF_UNSPEC
#define AF_LOCAL PF_LOCAL
#define AF_UNIX PF_UNIX
#define AF_FILE PF_FILE
#define AF_INET PF_INET
#define AF_AX25 PF_AX25
#define AF_IPX PF_IPX
#define AF_APPLETALK PF_APPLETALK
#define AF_NETROM PF_NETROM
#define AF_BRIDGE PF_BRIDGE
#define AF_ATMPVC PF_ATMPVC
#define AF_X25 PF_X25
#define AF_INET6 PF_INET6
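The one-to-one correspondence is easy to check in code. The sketch below compares a few of the constants and shows the convention described above, with a PF_ value passed to socket and an AF_ value stored in the address. The function names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the AF_ and PF_ constants compared here have equal values. */
int families_match(void)
{
    return AF_UNIX == PF_UNIX && AF_INET == PF_INET && AF_INET6 == PF_INET6;
}

/* Hypothetical helper: mark addr as IPv4 and open a matching TCP socket. */
int open_tcp_socket(struct sockaddr_in *addr)
{
    addr->sin_family = AF_INET;               /* AF_ value in the address */
    return socket(PF_INET, SOCK_STREAM, 0);   /* PF_ value for the socket */
}
```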

sockaddr is a universal address structure, which means that it is applicable to multiple address families. Why such a universal address structure is defined will be discussed later.

IPv4 socket address format

Next, let’s take a look at the address structure of the commonly used IPv4 address family:

/* IPv4 address, a 32-bit value. */
typedef uint32_t in_addr_t;
struct in_addr
  {
    in_addr_t s_addr;
  };

/* Describe the socket address format of IPv4 */
struct sockaddr_in
  {
    sa_family_t sin_family; /* 16-bit */
    in_port_t sin_port; /* port number 16-bit */
    struct in_addr sin_addr; /* Internet address. 32-bit */

    /* This is only used as a placeholder and has no actual use */
    unsigned char sin_zero[8];
  };

Let’s walk through this structure. Like sockaddr, it begins with a 16-bit address family field, here named sin_family. For IPv4, this value is AF_INET.

Next is the port number. It is a 16-bit field, so it can take 2 to the 16th power, that is 65536, distinct values, and the largest port number that can be addressed is 65535. I mentioned ports in the previous chapter; here I will focus on reserved ports. Reserved ports are ports that everyone has agreed on and that are widely used by the corresponding services, such as port 21 for FTP, port 22 for SSH, and port 80 for HTTP. Generally speaking, ports greater than 5000 can be used for our own applications.
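The 16-bit limit shows up directly when filling in sin_port. The sketch below uses a hypothetical helper, set_port, that rejects values outside 0 to 65535. It also applies htons, since sin_port is stored in network byte order; that detail is not covered in the text above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: validate a port number and store it in addr.
   Returns 0 on success, -1 if the value does not fit in 16 bits. */
int set_port(struct sockaddr_in *addr, long port)
{
    if (port < 0 || port > 65535)
        return -1;
    addr->sin_port = htons((uint16_t)port);   /* host to network byte order */
    return 0;
}
```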

The following are reserved ports defined by glibc.

/* Standard well-known ports. */
enum
  {
    IPPORT_ECHO = 7, /* Echo service. */
    IPPORT_DISCARD = 9, /* Discard transmissions service. */
    IPPORT_SYSTAT = 11, /* System status service. */
    IPPORT_DAYTIME = 13, /* Time of day service. */
    IPPORT_NETSTAT = 15, /* Network status service. */
    IPPORT_FTP = 21, /* File Transfer Protocol. */
    IPPORT_TELNET = 23, /* Telnet protocol. */
    IPPORT_SMTP = 25, /* Simple Mail Transfer Protocol. */
    IPPORT_TIMESERVER = 37, /* Timeserver service. */
    IPPORT_NAMESERVER = 42, /* Domain Name Service. */
    IPPORT_WHOIS = 43, /* Internet Whois service. */
    IPPORT_MTP = 57,

    IPPORT_TFTP = 69, /* Trivial File Transfer Protocol. */
    IPPORT_RJE = 77,
    IPPORT_FINGER = 79, /* Finger service. */
    IPPORT_TTYLINK = 87,
    IPPORT_SUPDUP = 95, /* SUPDUP protocol. */

    IPPORT_EXECSERVER = 512, /* execd service. */
    IPPORT_LOGINSERVER = 513, /* rlogind service. */
    IPPORT_CMDSERVER = 514,
    IPPORT_EFSSERVER = 520,

    /* UDP ports. */
    IPPORT_BIFFUDP = 512,
    IPPORT_WHOSERVER = 513,
    IPPORT_ROUTESERVER = 520,

    /* Ports less than this value are reserved for privileged processes. */
    IPPORT_RESERVED = 1024,

    /* Ports greater than this value are reserved for (non-privileged) servers. */
    IPPORT_USERRESERVED = 5000
  };
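The two boundary constants at the end of the enum can be used to classify a port. This is a sketch that assumes glibc's <netinet/in.h>, which provides IPPORT_RESERVED and IPPORT_USERRESERVED as listed above. The port_class enum and the classify_port function are hypothetical names.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical classification based on the two glibc boundaries. */
enum port_class {
    PORT_PRIVILEGED,   /* below IPPORT_RESERVED (1024) */
    PORT_MIDDLE,       /* 1024 through IPPORT_USERRESERVED (5000) */
    PORT_APPLICATION   /* above 5000 */
};

enum port_class classify_port(uint16_t port)
{
    if (port < IPPORT_RESERVED)
        return PORT_PRIVILEGED;
    if (port <= IPPORT_USERRESERVED)
        return PORT_MIDDLE;
    return PORT_APPLICATION;
}
```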

The IPv4 address itself is a 32-bit field, so at most 2 to the 32nd power addresses, roughly 4.29 billion, can be represented. That number looked enormous when the protocol was designed. Unfortunately, as the Internet boomed and more and more devices around the world came online, it gradually became insufficient, and so the well-known IPv6 made its debut.
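Putting the three fields together, an IPv4 socket address can be filled in as sketched below. The helper name make_ipv4_addr is hypothetical, and the use of inet_pton to turn a dotted-quad string into the 32-bit value is one common choice, not something prescribed by the text.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out from a dotted-quad string and a port.
   Returns 0 on success, -1 if ip is not a valid IPv4 address string. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));             /* also zeroes sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

For example, "192.168.1.10" becomes the 32-bit value 0xC0A8010A, stored in network byte order in sin_addr.s_addr.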

IPv6 socket address format

Let’s take a look at the address structure of IPv6:

struct sockaddr_in6
  {
    sa_family_t sin6_family; /* 16-bit */
    in_port_t sin6_port; /* Transport-layer port number, 16-bit */
    uint32_t sin6_flowinfo; /* IPv6 flow information, 32-bit */
    struct in6_addr sin6_addr; /* IPv6 address, 128-bit */
    uint32_t sin6_scope_id; /* IPv6 scope ID, 32-bit */
  };

The whole structure is 28 bytes long. We can ignore the flow information and the scope ID for now: the official glibc documentation does not mention one of these two fields at all and describes the other as currently unused. The address family here is of course AF_INET6, and the port field is the same as in the IPv4 structure. The key change is that the address has grown from 32 bits to 128 bits. That is an astronomically large number, which removes the address shortage for the foreseeable future.
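An IPv6 socket address is filled in the same way as an IPv4 one. In this sketch the helper name make_ipv6_addr is hypothetical, and sin6_flowinfo and sin6_scope_id are simply left at zero, matching the advice to ignore them for now.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out from an IPv6 address string and a port.
   Returns 0 on success, -1 if ip is not a valid IPv6 address string. */
int make_ipv6_addr(const char *ip, uint16_t port, struct sockaddr_in6 *out)
{
    memset(out, 0, sizeof(*out));   /* flowinfo and scope_id stay 0 */
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);
    return inet_pton(AF_INET6, ip, &out->sin6_addr) == 1 ? 0 : -1;
}
```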

Please note that the above address formats, regardless of IPv4 or IPv6, are Internet socket formats. There is also a local socket format used for communication between local processes, which is the previously mentioned AF_LOCAL.

struct sockaddr_un {
    unsigned short sun_family; /* fixed to AF_LOCAL */
    char sun_path[108]; /* path name */
};
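A local socket address holds a file system path instead of an IP address and port. The sketch below fills one in; make_local_addr is a hypothetical helper, and the length check relies on sizeof(sun_path) rather than hard-coding 108, since that size can differ between systems.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill *out with a local (Unix domain) address.
   Returns 0 on success, -1 if path is empty or too long for sun_path. */
int make_local_addr(const char *path, struct sockaddr_un *out)
{
    size_t n = strlen(path);
    if (n == 0 || n >= sizeof(out->sun_path))
        return -1;                        /* must fit with its trailing NUL */
    memset(out, 0, sizeof(*out));
    out->sun_family = AF_LOCAL;
    memcpy(out->sun_path, path, n + 1);   /* copy including the NUL */
    return 0;
}
```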

Comparison of several socket address formats

A comparison of these types of addresses is shown in the figure below. The length of the IPv4 and IPv6 socket address structures is fixed, while the length of the local address structure is variable.
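The fixed versus variable distinction can be made concrete. For IPv4 and IPv6, the length passed to calls such as bind is simply the sizeof of the structure. For a local address, a common convention is to pass the header size plus the length of the path actually stored. The helper name local_addr_len below is hypothetical, and the sizes quoted in the comments assume Linux with glibc.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fixed lengths (Linux/glibc): sizeof(struct sockaddr_in) is 16 and
   sizeof(struct sockaddr_in6) is 28, regardless of the address stored. */

/* Hypothetical helper: length of a local address that covers the family
   field, the path, and the path's terminating NUL. */
socklen_t local_addr_len(const struct sockaddr_un *addr)
{
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path)
                       + strlen(addr->sun_path) + 1);
}
```

A longer path therefore yields a longer address, while the IPv4 and IPv6 lengths never change.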

Summary

In this lecture, we focused on what a socket is and on the corresponding socket address formats. As the foundation of network programming, the socket concept is extremely important, and its design opened the door to network programming for us. It is precisely because BSD sockets were so successful that the major Unix vendors (and the open source Linux) as well as the Windows platform soon adopted them. In the next lecture, we will start creating and using sockets and establishing connections, continuing our network programming journey.