[iptables in practice] 9: Analysis of Docker network principles

Before reading this chapter, you should have the following background:

  • Some basic knowledge of Docker; it is best to have a working Docker environment installed on Linux.
  • Basic knowledge of iptables; refer to the previous articles in the [iptables in practice] series.

1. Docker network model

The Docker network model is shown in the figure below.

Explanation:

  • In the figure above there are two containers, container1 and container2, each with its own network card.
  • The two containers communicate through the docker0 bridge. They are on the same LAN, with IP addresses 172.17.0.2 and 172.17.0.3 respectively.
  • What is the docker0 bridge? It is essentially a switch: network packets travel between the containers over a layer 2 network.

In Linux, the network device that plays the role of a virtual switch is the bridge. It works at the data link layer, and its main job is to forward packets to the appropriate bridge port based on learned MAC addresses.
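If you want to get a feel for what a Linux bridge is, you can create a throwaway one by hand with the iproute2 tools. This is only a sketch, independent of Docker, and the device name br-test is made up:

ip link add name br-test type bridge   # create a new bridge device
ip link set br-test up                 # bring it up
bridge link show                       # list interfaces attached to bridges (br-test has none yet)
ip link del br-test                    # remove the test bridge again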

2. Container network interoperability experiment

We install the Kafka message middleware through Docker. Kafka requires ZooKeeper, so we run two containers, zookeeper and kafka, on one virtual machine; ZooKeeper provides coordination services for Kafka.
Install kafka in three minutes
See the link above for the installation process.

2.1 Viewing the host network

After installing as above, do not start the containers yet (you can stop them first with the docker stop command), and look directly at the network information on the Linux host.

[root@localhost ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:6ff:fe21:5ecb prefixlen 64 scopeid 0x20<link>
        ether 02:42:06:21:5e:cb txqueuelen 0 (Ethernet)
        RX packets 68 bytes 3888 (3.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 112 bytes 8883 (8.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe1d:60a9 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:1d:60:a9 txqueuelen 1000 (Ethernet)
        RX packets 114 bytes 16795 (16.4 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 172 bytes 16485 (16.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.201 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::db6e:9a5d:7349:6075 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:c3:0a:37 txqueuelen 1000 (Ethernet)
        RX packets 401 bytes 32801 (32.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 294 bytes 34565 (33.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The output above shows the network devices on the host:

  • docker0: the Docker bridge that containers attach to (a quick cross-check follows below)
  • enp0s3 and enp0s8: the two network cards of the host machine itself
  • lo: the loopback interface (localhost)
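As the cross-check mentioned above, the docker0 device corresponds to Docker's default network named bridge, and you can read its subnet and gateway straight from Docker. The exact formatting of the output depends on your Docker version:

[root@localhost ~]# docker network inspect bridge | grep -iE 'subnet|gateway'
        "Subnet": "172.17.0.0/16",
        "Gateway": "172.17.0.1"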

2.2 Start the two container applications, zookeeper and kafka

[root@localhost ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0d5cb60e3a06 bitnami/rabbitmq "/opt/bitnami/script…" 13 days ago Exited (0) 4 minutes ago rabbitmq
43a5066a11f5 bitnami/zookeeper "/opt/bitnami/script…" 13 days ago Exited (143) 11 days ago zookeeper
922e61e655f6 bitnami/kafka:latest "/opt/bitnami/script…" 2 weeks ago Exited (137) 23 minutes ago kafka
2290b7d3a4ff nginx:latest "/docker-entrypoint.…" 2 months ago Exited (0) 2 months ago mynginx

The output above lists the containers I have created. Now we start zookeeper and kafka:

[root@localhost ~]# docker start zookeeper
zookeeper
[root@localhost ~]# docker start kafka
kafka

The two container applications are now started.

2.3 Look at the host network again

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:6ff:fe21:5ecb prefixlen 64 scopeid 0x20<link>
        ether 02:42:06:21:5e:cb txqueuelen 0 (Ethernet)
        RX packets 336 bytes 43788 (42.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 323 bytes 48881 (47.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe1d:60a9 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:1d:60:a9 txqueuelen 1000 (Ethernet)
        RX packets 134 bytes 18385 (17.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 196 bytes 18435 (18.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.201 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::db6e:9a5d:7349:6075 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:c3:0a:37 txqueuelen 1000 (Ethernet)
        RX packets 565 bytes 45134 (44.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 394 bytes 45995 (44.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

veth164e95d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::1441:abff:feb2:fc36 prefixlen 64 scopeid 0x20<link>
        ether 16:41:ab:b2:fc:36 txqueuelen 0 (Ethernet)
        RX packets 99 bytes 21233 (20.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 124 bytes 16191 (15.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethda42807: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::183c:e8ff:feae:1af2 prefixlen 64 scopeid 0x20<link>
        ether 1a:3c:e8:ae:1a:f2 txqueuelen 0 (Ethernet)
        RX packets 169 bytes 22419 (21.8 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 122 bytes 28133 (27.4 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:ae:75:56 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Two new network devices, veth164e95d and vethda42807, have appeared.
My virtual machine runs CentOS 8, where you can check which devices are attached to a bridge with the bridge link command (on CentOS 7 you can use brctl show). The output shows that veth164e95d and vethda42807 are both connected to the docker0 bridge.

[root@localhost ~]# bridge link
18: veth164e95d@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
20: vethda42807@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
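A small aside: the @if17 and @if19 suffixes above are the interface indexes of the peers sitting inside the containers, while 18 and 20 are the indexes of the host-side devices themselves. To work out which veth belongs to which container, you can read the peer index back from inside the container. The sketch below assumes the container images ship cat and expose /sys/class/net, which the bitnami images do:

[root@localhost ~]# docker exec kafka cat /sys/class/net/eth0/iflink
[root@localhost ~]# docker exec zookeeper cat /sys/class/net/eth0/iflink

Each command prints the ifindex of that container's host-side peer (18 or 20 in the bridge link output above), which tells you which vethXXXX device belongs to which container.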

By default, Docker creates a bridge named docker0 on the host machine, and any container connected to the docker0 bridge can communicate through it.
But how are these containers "connected" to the docker0 bridge?
This is where a virtual device called a veth pair comes in.
The defining characteristic of a veth pair is that it always comes into existence as two virtual network interfaces (veth peers), and a packet sent out of one interface immediately appears on the other, even when the two interfaces sit in different network namespaces.
The other ends of the host devices veth164e95d and vethda42807 are the network cards inside the two containers. Whenever a container's network card sends a packet, that packet appears on the corresponding host-side device, veth164e95d or vethda42807.
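To see the veth pair mechanism in isolation, you can build one by hand between the host and a scratch network namespace. This is only a sketch with made-up names and addresses (demo, veth-host, veth-ns, 10.10.0.0/24) and has nothing to do with Docker's own setup:

ip netns add demo                                   # a scratch network namespace
ip link add veth-host type veth peer name veth-ns   # create the veth pair
ip link set veth-ns netns demo                      # move one end into the namespace
ip addr add 10.10.0.1/24 dev veth-host
ip link set veth-host up
ip netns exec demo ip addr add 10.10.0.2/24 dev veth-ns
ip netns exec demo ip link set veth-ns up
ip netns exec demo ping -c 1 10.10.0.1              # packets sent inside the namespace appear on veth-host
ip netns del demo                                   # clean up; this also destroys the veth pair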

2.4 Network analysis of container interconnection

First, take a look at the running containers.
We mapped the kafka container's port 9092 to port 9092 on the host, so a Kafka client can connect to the broker through port 9092.

[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
43a5066a11f5 bitnami/zookeeper "/opt/bitnami/script…" 2 weeks ago Up 6 minutes 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp zookeeper
922e61e655f6 bitnami/kafka:latest "/opt/bitnami/script…" 2 weeks ago Up 5 minutes 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp kafka

Now take a look at the network settings of kafka and zookeeper.

[root@localhost ~]# docker inspect kafka
....omitted....
"Networks":
{
    "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "6b81b63148c199d79c62758e548a80732b9401231ccd741783c220077a1d7a93",
        "EndpointID": "9824ca7180c438118e70be86d055b02c74f7ea82225db7c9be264e43ee5e6d32",
        "Gateway": "172.17.0.1",
        "IPAddress": "172.17.0.3",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:ac:11:00:03",
        "DriverOpts": null
    }
}

You can see that kafka's IP address is 172.17.0.3 and its gateway is 172.17.0.1.
Now take a look at zookeeper:

[root@localhost ~]# docker inspect zookeeper
....omitted....
 "Networks":
{
    "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "6b81b63148c199d79c62758e548a80732b9401231ccd741783c220077a1d7a93",
        "EndpointID": "0b057f5d03cfd775de26a2de03d707e6b5b84fd0321b2d298a5399516cb75acc",
        "Gateway": "172.17.0.1",
        "IPAddress": "172.17.0.2",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:ac:11:00:02",
        "DriverOpts": null
    }
}

The IP address of zookeeper is 172.17.0.2 and its gateway is 172.17.0.1.
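Incidentally, if you only want the IP addresses, a Go template saves you from scrolling through the whole inspect output. The template below assumes the containers sit on the default bridge network, as they do here:

[root@localhost ~]# docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' kafka
172.17.0.3
[root@localhost ~]# docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' zookeeper
172.17.0.2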
Now, with these addresses in mind, look at the figure from section 1 again; it should be much clearer.

Conclusion 1: Different containers on the same host can communicate with each other through the docker0 bridge.
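A quick way to check Conclusion 1 is to reach zookeeper from inside the kafka container. This is only a sketch: whether ping is installed depends on the image, so the second command falls back on bash's built-in /dev/tcp (assuming the image ships bash, as the bitnami images do) to test TCP reachability on ZooKeeper's port 2181:

[root@localhost ~]# docker exec kafka ping -c 2 172.17.0.2
[root@localhost ~]# docker exec kafka bash -c 'echo > /dev/tcp/172.17.0.2/2181 && echo reachable'

If the docker0 bridge is doing its job, the second command prints reachable.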

3. How the host accesses the container

From the analysis above, containers can communicate with each other through the docker0 bridge. But how does the host access a container?

[root@localhost ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
0.0.0.0 192.168.56.100 0.0.0.0 UG 101 0 0 enp0s8
10.0.2.0 0.0.0.0 255.255.255.0 U 100 0 0 enp0s3
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.56.0 0.0.0.0 255.255.255.0 U 101 0 0 enp0s8
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0

The route -n command shows the host's routing table. One of its rules sends packets destined for the 172.17.0.0/16 segment out through docker0.
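You can also ask the kernel directly which route it would choose for a container address. The exact output format varies slightly between iproute2 versions, but it should name docker0 as the outgoing device and 172.17.0.1 as the source address:

[root@localhost ~]# ip route get 172.17.0.2
172.17.0.2 dev docker0 src 172.17.0.1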
Let's try pinging 172.17.0.2, and in a second window capture the packets with tcpdump.

[root@localhost ~]# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.134 ms

As the tcpdump capture below shows, the packets travel through the docker0 bridge on the host directly into the container.

[root@localhost ~]# tcpdump -i docker0 -nn icmp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:54:22.019423 IP 172.17.0.1 > 172.17.0.2: ICMP echo request, id 9341, seq 1, length 64
00:54:22.019492 IP 172.17.0.2 > 172.17.0.1: ICMP echo reply, id 9341, seq 1, length 64
00:54:23.033807 IP 172.17.0.1 > 172.17.0.2: ICMP echo request, id 9341, seq 2, length 64

Conclusion 2: The host can reach containers via the 172.17.0.0/16 segment; a routing rule sends packets destined for this segment to the docker0 bridge, through which they enter the container.
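If you want to follow the packet one hop further, you can also capture on the host side of the container's veth pair; the same ICMP traffic shows up there, confirming that the packet crosses the bridge and then the veth pair into the container. Substitute whichever veth device belongs to the container you are pinging:

[root@localhost ~]# tcpdump -i veth164e95d -nn icmp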

4. How containers communicate with the external network

For ease of demonstration, this time we use an nginx container. Mine was created earlier with the docker run command below; if the container already exists, docker start mynginx is enough.

[root@localhost ~]# docker run -d -p 8080:80 --name mynginx nginx:latest
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2290b7d3a4ff nginx:latest "/docker-entrypoint.…" 2 months ago Up 6 seconds 0.0.0.0:8080->80/tcp, :::8080->80/tcp mynginx

Port 80 inside the container is mapped to port 8080 on the host, so nginx can be reached through the host's IP address, as shown in the figure below.
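A quick curl from the host (or from another machine on the 192.168.56.0/24 network) confirms the mapping; -I asks nginx only for the response headers, and a healthy setup should answer with HTTP/1.1 200 OK:

[root@localhost ~]# curl -I http://192.168.56.201:8080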

How do packets from the outside reach the container? Let's make an educated guess: when a packet arrives at the host, it should undergo destination address translation, which rewrites the packet's destination from the host's address to the container's, after which it crosses the docker0 bridge into the container.
Since this is network address translation, it happens in the nat table. Let's check the iptables nat rules.

[root@localhost ~]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 211 packets, 19122 bytes)
 pkts bytes target prot opt in out source destination
   84 5992 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 74 packets, 4424 bytes)
 pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 691 packets, 54705 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
  669 52735 LIBVIRT_PRT all -- * * 0.0.0.0/0 0.0.0.0/0
    0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80

Chain OUTPUT (policy ACCEPT 688 packets, 54549 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain LIBVIRT_PRT (1 references)
 pkts bytes target prot opt in out source destination
   10 695 RETURN all -- * * 192.168.122.0/24 224.0.0.0/24
    0 0 RETURN all -- * * 192.168.122.0/24 255.255.255.255
    0 0 MASQUERADE tcp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
    0 0 MASQUERADE udp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
    0 0 MASQUERADE all -- * * 192.168.122.0/24 !192.168.122.0/24

Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
   72 4320 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
    3 156 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80

iptables rule analysis

Incoming traffic analysis

  • The PREROUTING chain jumps to a custom chain named DOCKER.
  • Looking at the DOCKER chain, there is a DNAT (destination address translation) rule: for TCP packets that arrive on an interface other than docker0 with destination port 8080, the destination address is rewritten to 172.17.0.2:80.
  • Combined with Conclusion 2 (the host can reach containers through the 172.17.0.0/16 segment, whose routing rule sends packets to the docker0 bridge and into the container), we can conclude that external traffic can now enter the container.

Conclusion 3: For traffic entering the container from outside, when external traffic accesses the host's IP and port, the PREROUTING chain performs destination address translation (DNAT), so the traffic can reach the inside of the container.
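To relate this back to the earlier iptables articles: the effect of Docker's DOCKER chain rule is roughly what you would get by writing the DNAT rule by hand, as in the sketch below. This is for illustration only; Docker manages these rules itself, and you should not add them manually on a Docker host:

iptables -t nat -A PREROUTING -p tcp ! -i docker0 --dport 8080 -j DNAT --to-destination 172.17.0.2:80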

Outgoing traffic analysis

  • Outgoing traffic must undergo SNAT (source address translation) and be rewritten to carry the host's address.
  • You can see the dynamic SNAT rule, namely MASQUERADE, in the POSTROUTING chain excerpt below:
Chain POSTROUTING (policy ACCEPT 691 packets, 54705 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
  669 52735 LIBVIRT_PRT all -- * * 0.0.0.0/0 0.0.0.0/0
    0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80

Look at the first rule: for packets whose source address is in 172.17.0.0/16 and which leave through an interface other than docker0, source address translation is performed. Packets leaving the host this way carry the host's IP address (and a host-side port), not the container's 172.17.0.0/16 address.
Conclusion 4: When traffic from inside the container goes out, source NAT (SNAT) is performed in the POSTROUTING chain; from the outside, the traffic appears to be sent by the host. Likewise, a client accessing nginx sees replies coming from the host's address rather than the container's.
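Again for illustration only, Docker's masquerade rule is roughly equivalent to the hand-written rule below. If the conntrack tool happens to be installed, the second command lets you watch the translated connections while a container talks to the outside world:

iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE   # illustration only; Docker adds this itself
conntrack -L 2>/dev/null | grep 172.17.0.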
