Before reading this chapter, you should already be familiar with the following:
- Basic Docker knowledge. Ideally, have Docker installed on a Linux machine.
- The basics of iptables. See the previous article [iptables in practice].
1. Docker network model
The Docker network model is shown in the figure below.
Notes:
- The figure shows two containers, container1 and container2, each with its own network card.
- The two containers communicate through the docker0 bridge. They are on the same LAN, with IP addresses 172.17.0.2 and 172.17.0.3 respectively.
- What is the docker0 bridge? It acts as a virtual switch: packets travel between the containers over the layer 2 network.
In Linux, the network device that plays the role of a virtual switch is the bridge. It works at the data link layer and forwards frames to its ports based on the MAC addresses it has learned.
2. Container network interoperability experiment
We install the kafka message middleware with Docker. Kafka depends on zookeeper, so we run two containers, zookeeper and kafka, on one virtual machine, with zookeeper serving kafka.
For the installation process, see the article "Install kafka in three minutes".
2.1 Local network viewing
After the installation, do not start the containers yet (if they are running, stop them with docker stop). First look at the network information on the Linux host.
[root@localhost ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:6ff:fe21:5ecb prefixlen 64 scopeid 0x20<link>
        ether 02:42:06:21:5e:cb txqueuelen 0 (Ethernet)
        RX packets 68 bytes 3888 (3.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 112 bytes 8883 (8.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe1d:60a9 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:1d:60:a9 txqueuelen 1000 (Ethernet)
        RX packets 114 bytes 16795 (16.4 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 172 bytes 16485 (16.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.201 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::db6e:9a5d:7349:6075 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:c3:0a:37 txqueuelen 1000 (Ethernet)
        RX packets 401 bytes 32801 (32.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 294 bytes 34565 (33.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The output above lists the network devices on the host:
- docker0: the bridge that containers attach to
- enp0s3 and enp0s8: the host's two network cards
- lo: the loopback interface, that is, the local machine itself
2.2 Start two container applications zookeeper and kafka
[root@localhost ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0d5cb60e3a06 bitnami/rabbitmq "/opt/bitnami/script…" 13 days ago Exited (0) 4 minutes ago rabbitmq
43a5066a11f5 bitnami/zookeeper "/opt/bitnami/script…" 13 days ago Exited (143) 11 days ago zookeeper
922e61e655f6 bitnami/kafka:latest "/opt/bitnami/script…" 2 weeks ago Exited (137) 23 minutes ago kafka
2290b7d3a4ff nginx:latest "/docker-entrypoint.…" 2 months ago Exited (0) 2 months ago mynginx
The list above shows the containers that have been run on this machine. Of these, we start zookeeper and kafka:
[root@localhost ~]# docker start zookeeper
zookeeper
[root@localhost ~]# docker start kafka
kafka
Both container applications are now running.
2.3 Take another look at the local network
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:6ff:fe21:5ecb prefixlen 64 scopeid 0x20<link>
        ether 02:42:06:21:5e:cb txqueuelen 0 (Ethernet)
        RX packets 336 bytes 43788 (42.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 323 bytes 48881 (47.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe1d:60a9 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:1d:60:a9 txqueuelen 1000 (Ethernet)
        RX packets 134 bytes 18385 (17.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 196 bytes 18435 (18.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.201 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::db6e:9a5d:7349:6075 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:c3:0a:37 txqueuelen 1000 (Ethernet)
        RX packets 565 bytes 45134 (44.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 394 bytes 45995 (44.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth164e95d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::1441:abff:feb2:fc36 prefixlen 64 scopeid 0x20<link>
        ether 16:41:ab:b2:fc:36 txqueuelen 0 (Ethernet)
        RX packets 99 bytes 21233 (20.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 124 bytes 16191 (15.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vethda42807: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::183c:e8ff:feae:1af2 prefixlen 64 scopeid 0x20<link>
        ether 1a:3c:e8:ae:1a:f2 txqueuelen 0 (Ethernet)
        RX packets 169 bytes 22419 (21.8 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 122 bytes 28133 (27.4 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:ae:75:56 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Two more network devices have appeared: veth164e95d and vethda42807.
My virtual machine runs CentOS 8, where you can inspect the bridge ports with bridge link (on CentOS 7, use brctl show instead). The output shows that veth164e95d and vethda42807 are both attached to the docker0 bridge.
[root@localhost ~]# bridge link
18: veth164e95d@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
20: vethda42807@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
By default, Docker creates a bridge named docker0 on the host. Any container connected to the docker0 bridge can communicate through it.
But how are containers "connected" to the docker0 bridge?
This is done with a virtual device called a Veth Pair.
A Veth Pair always comes as two virtual network cards (Veth Peers). A packet sent from one of the "network cards" appears directly on its peer, even when the two "network cards" are in different Network Namespaces.
The host-side devices veth164e95d and vethda42807 are each one end of such a pair; the other end is the network card inside a container. Whenever the network card in a container sends a packet, it appears on the corresponding host device, veth164e95d or vethda42807.
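The pairing is visible in the bridge link output itself: in a name such as veth164e95d@if17, the @if17 suffix is the interface index of the peer at the other end of the pair. The sketch below parses the two sample lines from this article; veth_peers is a hypothetical helper name, not part of any tool, and the sample text is embedded so the snippet runs without Docker.

```shell
# Sample `bridge link` output copied from the article.
bridge_link_output='18: veth164e95d@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
20: vethda42807@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2'

# veth_peers (hypothetical helper): read `bridge link` lines on stdin and print
# "name host_ifindex peer_ifindex bridge" for each port.
veth_peers() {
    awk '{
        idx = $1; sub(/:$/, "", idx)          # "18:" -> "18"
        split($2, parts, "@if")               # "veth164e95d@if17:" -> name, "17:"
        name = parts[1]
        peer = parts[2]; sub(/:$/, "", peer)
        master = ""
        for (i = 3; i < NF; i++) if ($i == "master") master = $(i + 1)
        print name, idx, peer, master
    }'
}

printf '%s\n' "$bridge_link_output" | veth_peers
```

On a live host you could pipe the real bridge link output into the same function. Inside a container, reading /sys/class/net/eth0/iflink (assuming the container interface is named eth0) should give the host-side index, 18 or 20 here.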
2.4 Container interconnection network analysis
First, look at the running containers.
Port 9092 of the kafka container is mapped to port 9092 on the host, so a kafka client can connect to kafka through host port 9092.
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
43a5066a11f5 bitnami/zookeeper "/opt/bitnami/script…" 2 weeks ago Up 6 minutes 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp zookeeper
922e61e655f6 bitnami/kafka:latest "/opt/bitnami/script…" 2 weeks ago Up 5 minutes 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp kafka
Next, look at the network settings of kafka and zookeeper.
[root@localhost ~]# docker inspect kafka
....omitted....
"Networks": {
    "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "6b81b63148c199d79c62758e548a80732b9401231ccd741783c220077a1d7a93",
        "EndpointID": "9824ca7180c438118e70be86d055b02c74f7ea82225db7c9be264e43ee5e6d32",
        "Gateway": "172.17.0.1",
        "IPAddress": "172.17.0.3",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:ac:11:00:03",
        "DriverOpts": null
    }
}
You can see that kafka's IP is 172.17.0.3 and its gateway is 172.17.0.1, the address of docker0.
Now look at zookeeper:
[root@localhost ~]# docker inspect zookeeper
....omitted....
"Networks": {
    "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "6b81b63148c199d79c62758e548a80732b9401231ccd741783c220077a1d7a93",
        "EndpointID": "0b057f5d03cfd775de26a2de03d707e6b5b84fd0321b2d298a5399516cb75acc",
        "Gateway": "172.17.0.1",
        "IPAddress": "172.17.0.2",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:ac:11:00:02",
        "DriverOpts": null
    }
}
The IP of zookeeper is 172.17.0.2 and the gateway is 172.17.0.1
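Reading these fields by eye from the full docker inspect output gets tedious. As a minimal sketch, the snippet below pulls a quoted string field out of an embedded fragment of the kafka output with sed; json_field is a hypothetical helper name, and the approach assumes one field per line, so it is no substitute for a real JSON parser.

```shell
# Fragment of the `docker inspect kafka` output from the article.
inspect_fragment='"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.3",
"IPPrefixLen": 16,
"MacAddress": "02:42:ac:11:00:03",'

# json_field (hypothetical helper): print the value of a quoted string field
# named $1, reading one "key": "value" pair per line from stdin.
json_field() {
    sed -n 's/.*"'"$1"'": "\([^"]*\)".*/\1/p'
}

printf '%s\n' "$inspect_fragment" | json_field IPAddress
printf '%s\n' "$inspect_fragment" | json_field Gateway
```

On a host with Docker running, docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' kafka prints the address directly, without any text processing.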
Now, look at this picture again. Is it clearer?
Conclusion 1: Containers on the same host can communicate with each other through the docker0 bridge.
3. How the host accesses the container
The analysis above shows that containers communicate with each other through the docker0 bridge. So how does the host access a container?
[root@localhost ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
0.0.0.0 192.168.56.100 0.0.0.0 UG 101 0 0 enp0s8
10.0.2.0 0.0.0.0 255.255.255.0 U 100 0 0 enp0s3
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.56.0 0.0.0.0 255.255.255.0 U 101 0 0 enp0s8
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
The route -n command shows the host's routing rules. One of them says that packets destined for the 172.17.0.0/16 network segment are sent out through docker0.
Let's ping 172.17.0.2 and, in a second terminal window, capture the packets with tcpdump.
[root@localhost ~]# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.134 ms
The ping succeeds: packets from the host reach the container directly through the docker0 bridge. The tcpdump capture on docker0 shows the ICMP requests and replies:
[root@localhost ~]# tcpdump -i docker0 -nn icmp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:54:22.019423 IP 172.17.0.1 > 172.17.0.2: ICMP echo request, id 9341, seq 1, length 64
00:54:22.019492 IP 172.17.0.2 > 172.17.0.1: ICMP echo reply, id 9341, seq 1, length 64
00:54:23.033807 IP 172.17.0.1 > 172.17.0.2: ICMP echo request, id 9341, seq 2, length 64
Conclusion 2: The host can access a container because it has a routing rule for the 172.17.0.0/16 network segment: packets destined for that segment are sent to the docker0 bridge, which delivers them to the container.
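To make the routing decision concrete, here is a small sketch of the lookup over the route -n table shown earlier: the entry with the longest matching netmask wins, and ties go to the lower metric. The helper names ip_to_int and lookup_iface are made up for illustration, and this is a simplified model of the rule, not the kernel's implementation.

```shell
# Routing table from `route -n` in the article: destination, genmask, metric, interface.
routes='0.0.0.0 0.0.0.0 100 enp0s3
0.0.0.0 0.0.0.0 101 enp0s8
10.0.2.0 255.255.255.0 100 enp0s3
172.17.0.0 255.255.0.0 0 docker0
192.168.56.0 255.255.255.0 101 enp0s8
192.168.122.0 255.255.255.0 0 virbr0'

# ip_to_int (hypothetical helper): convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into four positional parameters
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# lookup_iface (hypothetical helper): print the interface chosen for address $1.
lookup_iface() {
    target=$(ip_to_int "$1")
    best_mask=-1
    best_metric=0
    best_iface=none
    while read -r dest mask metric iface; do
        d=$(ip_to_int "$dest")
        m=$(ip_to_int "$mask")
        # The route matches when the masked target equals the destination network.
        [ $(( target & m )) -eq "$d" ] || continue
        if [ "$m" -gt "$best_mask" ] ||
           { [ "$m" -eq "$best_mask" ] && [ "$metric" -lt "$best_metric" ]; }; then
            best_mask=$m
            best_metric=$metric
            best_iface=$iface
        fi
    done <<EOF
$routes
EOF
    echo "$best_iface"
}

lookup_iface 172.17.0.2    # container address, matches the docker0 route
lookup_iface 8.8.8.8       # external address, falls through to a default route
```

On the real host, ip route get 172.17.0.2 asks the kernel for the same decision and should name docker0 as the outgoing device.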
4. How the inside of the container communicates with the external network
To make the demonstration easier, this time we start an nginx container.
[root@localhost ~]# docker run -d -p 8080:80 --name mynginx nginx:latest
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2290b7d3a4ff nginx:latest "/docker-entrypoint.…" 2 months ago Up 6 seconds 0.0.0.0:8080->80/tcp, :::8080->80/tcp mynginx
Port 80 inside the container is mapped to port 8080 on the host. nginx can be reached through the host's IP, as shown in the figure below.
How do network packets reach the container from the outside? Let's make a bold guess: when a packet addressed to the host arrives, its destination address is rewritten (destination address translation), and the packet then passes through the docker0 bridge into the container.
Network address translation means NAT, so let's check the iptables nat table.
[root@localhost ~]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 211 packets, 19122 bytes)
pkts bytes target prot opt in out source destination
84 5992 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 74 packets, 4424 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 691 packets, 54705 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
669 52735 LIBVIRT_PRT all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
Chain OUTPUT (policy ACCEPT 688 packets, 54549 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain LIBVIRT_PRT (1 references)
pkts bytes target prot opt in out source destination
10 695 RETURN all -- * * 192.168.122.0/24 224.0.0.0/24
0 0 RETURN all -- * * 192.168.122.0/24 255.255.255.255
0 0 MASQUERADE tcp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
0 0 MASQUERADE udp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
0 0 MASQUERADE all -- * * 192.168.122.0/24 !192.168.122.0/24
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
72 4320 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
3 156 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
iptables rule analysis
Incoming traffic analysis
- The PREROUTING chain jumps to a custom chain, DOCKER, for packets whose destination is a local address of the host.
- The DOCKER chain contains a DNAT rule, that is, destination address translation. For TCP packets that arrive on any interface other than docker0 with destination port 8080, the destination is rewritten to 172.17.0.2:80.
- Combined with Conclusion 2 (the host routes packets destined for the 172.17.0.0/16 segment to the docker0 bridge, which delivers them to the container), it follows that external traffic can now enter the container.
Conclusion 3: Traffic from outside can reach the inside of the container. When external traffic accesses the host's IP and mapped port, the PREROUTING chain performs destination address translation (DNAT), and the rewritten packet is then routed into the container.
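The two rules in the DOCKER chain can be modelled as a small decision function. The sketch below is only an illustration of the matching logic: docker_chain is a hypothetical name, and it assumes the packet has already matched the PREROUTING condition (destination is a local address of the host).

```shell
# docker_chain (hypothetical helper) mirrors the two rules in the DOCKER chain:
#   RETURN  all  in=docker0
#   DNAT    tcp  in=!docker0  dpt:8080  to:172.17.0.2:80
# Arguments: incoming interface, protocol, destination IP, destination port.
# Prints the destination the packet carries after the chain.
docker_chain() {
    in_if=$1 proto=$2 dst_ip=$3 dst_port=$4
    if [ "$in_if" = docker0 ]; then
        echo "$dst_ip:$dst_port"      # RETURN: traffic from the bridge is left untouched
    elif [ "$proto" = tcp ] && [ "$dst_port" -eq 8080 ]; then
        echo "172.17.0.2:80"          # DNAT: rewritten to the nginx container
    else
        echo "$dst_ip:$dst_port"      # no rule matched
    fi
}

docker_chain enp0s8 tcp 192.168.56.201 8080   # external client hitting the mapped port
docker_chain enp0s8 tcp 192.168.56.201 22     # a different port is not translated
```

The DNAT line in the listing corresponds to a rule of the form iptables -t nat -A DOCKER ! -i docker0 -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80, which Docker adds itself when the container is started with -p 8080:80.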
Outgoing traffic analysis
- Outgoing traffic must go through SNAT (source address translation), which rewrites the source address to the host's address.
- The dynamic form of SNAT, MASQUERADE, appears in the POSTROUTING chain below.
Chain POSTROUTING (policy ACCEPT 691 packets, 54705 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
669 52735 LIBVIRT_PRT all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
Look at the first rule. Packets whose source address is in 172.17.0.0/16 and that leave through an interface other than docker0 undergo source address translation. After that, the source of the packet is the host's IP and a host-side port, not an address from the container's 172.17.0.0/16 segment.
Conclusion 4: When traffic from inside the container goes out, the POSTROUTING chain performs SNAT on the source address, so the remote side sees the host's address. Replies on a DNAT connection, such as nginx answering a client, are translated back by connection tracking, so the client likewise believes the reply was sent by the host.
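The first POSTROUTING rule can be sketched the same way. MASQUERADE replaces the source with the address of the interface the packet leaves through; the addresses below are the ones from the ifconfig output in this article, and iface_addr and postrouting_source are hypothetical helper names. Port rewriting is left out of this simplified model.

```shell
# iface_addr (hypothetical helper): address of each host interface,
# taken from the ifconfig output in the article.
iface_addr() {
    case $1 in
        enp0s3)  echo 10.0.2.15 ;;
        enp0s8)  echo 192.168.56.201 ;;
        docker0) echo 172.17.0.1 ;;
        *)       return 1 ;;
    esac
}

# postrouting_source (hypothetical helper): source address after the rule
#   MASQUERADE  all  out=!docker0  source=172.17.0.0/16
# Arguments: source IP, outgoing interface.
postrouting_source() {
    src=$1 out_if=$2
    case $src in
        172.17.*) in_docker_net=yes ;;   # a /16 ends on an octet boundary, so a prefix match is enough
        *)        in_docker_net=no ;;
    esac
    if [ "$in_docker_net" = yes ] && [ "$out_if" != docker0 ]; then
        iface_addr "$out_if"             # rewritten to the outgoing interface's address
    else
        echo "$src"                      # rule does not match, source unchanged
    fi
}

postrouting_source 172.17.0.2 enp0s3     # container talking to the outside world
postrouting_source 172.17.0.2 docker0    # container to container, stays on the bridge
```

The second call shows why the rule excludes docker0: traffic between containers on the bridge keeps its original 172.17.0.x source address.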