[Docker] Exploring Docker’s bridge network from the perspective of namespaces and routing

Bridge networking is Docker’s default networking mode. In a bridge network, Docker creates a virtual network interface for each container and assigns the container an IP address. Containers can communicate with the host and with other containers over the bridge, and can also expose ports for external access.
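
For a quick orientation, you can list Docker’s networks and inspect the default bridge network before diving in; the exact output varies between hosts, so only the commands are shown here:

$ docker network ls                # the network named "bridge" (driver: bridge) is the default
$ docker network inspect bridge    # shows the subnet (172.17.0.0/16 here) and the gateway 172.17.0.1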

How containers communicate with each other

First we create two containers:

$ docker container run -d --rm --name box1 busybox /bin/sh -c "while true; do sleep 3600; done"
e6e89f95de12eeda726fed5f4f909d32be2ea13c3cecb350acd86bc13394b769

$ docker container run -d --rm --name box2 busybox /bin/sh -c "while true; do sleep 3600; done"
c0c1a152155bcf66bed71fdc51e558f4c3b1c3632866c61a69303a4da10c2f54

$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c0c1a152155b busybox "/bin/sh -c 'while t…" 31 seconds ago Up 30 seconds box2
e6e89f95de12 busybox "/bin/sh -c 'while t…" 41 seconds ago Up 40 seconds box1

Then we look up box2’s IP address and ping it from box1:

$ docker container exec -it box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

$ docker container exec -it box1 ping 172.17.0.3 -c 3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.886 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.049 ms
64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.106 ms

--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.049/0.347/0.886 ms

Why can box2 be pinged from box1? How do containers communicate with each other?

Docker uses namespaces to isolate networking, compute, and other resources, so why does the ip netns command show no network namespaces on the host?

This is because Docker does not link the network namespaces it creates into /var/run/netns, the directory that ip netns reads, so the command cannot see them. That makes analyzing the network setup and troubleshooting problems more awkward.
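
The namespaces are not really gone, they simply are not linked where ip netns looks. If the lsns tool from util-linux happens to be available on your host, it can list them straight from /proc without any extra setup:

$ lsns -t net    # lists all network namespaces together with the PID and command that own them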

Here is how to make the container network namespaces visible to ip netns.

Execute the following command to obtain the container process ID:

$ docker inspect box1 | grep Pid
            "Pid": 43568,
            "PidMode": "",
            "PidsLimit": null,

$ docker inspect box2 | grep Pid
            "Pid": 43640,
            "PidMode": "",
            "PidsLimit": null,

Then link each container’s network namespace into the host’s /var/run/netns directory:

$ ln -s /proc/43568/ns/net /var/run/netns/box1

$ ln -s /proc/43640/ns/net /var/run/netns/box2

If the /var/run/netns directory does not exist, you can create the directory manually as the root user.
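
The steps above can be combined into a small sketch; docker inspect -f '{{.State.Pid}}' is the standard way to read the container PID, while the link names under /var/run/netns are simply our own choice:

$ mkdir -p /var/run/netns
$ for c in box1 box2; do pid=$(docker inspect -f '{{.State.Pid}}' "$c"); ln -sfn /proc/$pid/ns/net /var/run/netns/$c; done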

Then execute the ip netns command to see the network namespace of the container:

$ ip netns list
box2 (id: 3)
box1 (id: 2)

View the IP addresses of network namespaces box1 and box2:

$ ip netns exec box1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

$ ip netns exec box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

We can see that network namespace box1 has IP 172.17.0.2 and network namespace box2 has IP 172.17.0.3. For two network namespaces on the same subnet to communicate, a bridge is required.

Docker will create a bridge named docker0 by default:

$ ip link show type bridge
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:53:1d:f7:5f brd ff:ff:ff:ff:ff:ff

Then check which veth interfaces are attached to docker0:

$ brctl show docker0
bridge name bridge id STP enabled interfaces
docker0 8000.0242531df75f no vetha7d1dd5
                                                        vethadaa66f

docker0 has two veth interfaces attached: vetha7d1dd5 and vethadaa66f.

Back on the host, look at the veth interfaces:

$ ip link show type veth
20: vethadaa66f@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 52:4c:41:8c:91:01 brd ff:ff:ff:ff:ff:ff link-netns box1
22: vetha7d1dd5@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 8a:e9:19:ce:72:cb brd ff:ff:ff:ff:ff:ff link-netns box2

We can see that network namespace box1 is connected to docker0 through the veth pair eth0(if19) <-> vethadaa66f(if20), and network namespace box2 through the veth pair eth0(if21) <-> vetha7d1dd5(if22). Because both pairs terminate on the same bridge, box1 and box2 can communicate.

Here is the resulting topology: box1’s eth0 and box2’s eth0 each connect through a veth pair to the docker0 bridge on the host.
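
To convince yourself that this is plain Linux networking rather than anything Docker-specific, here is a minimal sketch that wires up the same topology by hand. The names ns1, ns2, testbr and the 172.18.0.0/24 subnet are made up for the example and are not used by Docker:

$ ip netns add ns1
$ ip netns add ns2
$ ip link add testbr type bridge && ip link set testbr up
$ ip link add veth1 type veth peer name veth1-br        # veth pair for ns1
$ ip link add veth2 type veth peer name veth2-br        # veth pair for ns2
$ ip link set veth1 netns ns1 && ip link set veth1-br master testbr && ip link set veth1-br up
$ ip link set veth2 netns ns2 && ip link set veth2-br master testbr && ip link set veth2-br up
$ ip netns exec ns1 ip addr add 172.18.0.2/24 dev veth1 && ip netns exec ns1 ip link set veth1 up
$ ip netns exec ns2 ip addr add 172.18.0.3/24 dev veth2 && ip netns exec ns2 ip link set veth2 up
$ ip netns exec ns1 ping -c 1 172.18.0.3                # works: both ends hang off the same bridge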

How containers access the external network

A network namespace plus a bridge only enables communication between the namespaces themselves. For a container to reach an external network, iptables is needed to perform SNAT.
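
Two other ingredients make this work: the container’s default route points at docker0’s address (172.17.0.1), and the host has IP forwarding enabled. Both can be verified without changing anything:

$ docker exec box1 ip route    # default via 172.17.0.1 dev eth0
$ sysctl net.ipv4.ip_forward   # must be 1 for the host to forward the container's packets
$ ip addr show docker0         # the bridge itself holds 172.17.0.1/16 on the host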

Ping Baidu from box1:

$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=9.908 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=14.939 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=11.023 ms

--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.908/11.956/14.939 ms

View the rules of iptables:

$ iptables -nvxL -t nat
Chain PREROUTING (policy ACCEPT 20 packets, 3083 bytes)
    pkts bytes target prot opt in out source destination
       0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 1 packets, 229 bytes)
    pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
    pkts bytes target prot opt in out source destination
       0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
    pkts bytes target prot opt in out source destination
       6 300 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0

Chain DOCKER (2 references)
    pkts bytes target prot opt in out source destination
       0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0

There is a MASQUERADE rule in the POSTROUTING chain of the nat table that performs SNAT on packets whose source address falls in 172.17.0.0/16, which is what allows containers to communicate with the external network.
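
If you want to watch the translation happen, one option is to capture the ICMP traffic on both sides of the NAT while box1 pings an external address; eth0 below is an assumed name for the host’s outbound interface, adjust it to your environment (run each capture in its own terminal):

$ tcpdump -ni docker0 icmp    # before NAT: source address is still the container's 172.17.0.2
$ tcpdump -ni eth0 icmp       # after MASQUERADE: source address is rewritten to the host's own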

We clear all iptables rules:

$ iptables -t filter -F
$ iptables -t filter -X
$ iptables -t filter -Z
$ iptables -t nat -F
$ iptables -t nat -X
$ iptables -t nat -Z

Checking again confirms that the rules and custom chains have been cleared:

$ iptables -t filter -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

$ iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination

Try to reach Baidu again; it now fails (even the DNS lookup can no longer leave the container):

$ docker exec -it box1 ping www.baidu.com -c 3
ping: bad address 'www.baidu.com'

We manually add a nat rule using iptables:

$ iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE

Ping Baidu again; the connection works now:

$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=16.015 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=9.960 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=9.247 ms

--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.247/11.740/16.015 ms

Sometimes the default policy of the FORWARD chain in the filter table is DROP; in that case it must be changed to ACCEPT for container traffic to be forwarded:

$ iptables -P FORWARD ACCEPT

Because we bluntly ran iptables -F, all of Docker’s rules were wiped out. How do we get Docker’s default rules back? The simplest way is to restart Docker:

$ service docker restart

Of course, if it is not too much trouble, you can also re-add the rules by hand, as sketched below.
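
For reference, the following is an approximation of the basic rules Docker installs for the default bridge, reconstructed from the nat-table output above plus the usual filter-table forwarding rules; the exact set varies between Docker versions, so treat it as a sketch rather than a definitive list:

$ iptables -t nat -N DOCKER
$ iptables -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
$ iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
$ iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
$ iptables -t nat -A DOCKER -i docker0 -j RETURN
$ iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
$ iptables -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
$ iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT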

How port forwarding works

When creating a container, the -p parameter maps a host port to a container port, so that requests hitting the host port are forwarded into the container.

First create an nginx container named web, mapping the host’s port 8080 to the container’s port 80:

$ docker container run -d --rm --name web -p 8080:80 nginx
441c77091abfeb9498d4fd21d62594d75363fb42338c4ec51a42b6f01d80e418

Access port 8080 on the host; the request reaches the container successfully:

$ curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

How is this port forwarding implemented? Through our old friend iptables again, this time with DNAT.

Query the rules of iptables:

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
    0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80

Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
    0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80

We can see that the following rule has been added to the POSTROUTING chain of the nat table; it masquerades traffic the web container sends to its own published port (hairpin NAT), so the container can reach itself through the host’s mapped port:

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80

The following rule is also added to the DOCKER chain (referenced from PREROUTING and OUTPUT) to forward requests for host port 8080 to 172.17.0.2:80:

Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80

Next, let’s start a container without the -p parameter and implement the port forwarding ourselves with iptables rules.

Start an nginx container named web, this time without any port mapping:

$ docker container run -d --rm --name web nginx

Checking iptables at this point shows only Docker’s basic rules; no new forwarding rules have been added:

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0

Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0

Accordingly, accessing port 8080 on the host fails:

$ curl 172.19.85.122:8080
curl: (7) Failed to connect to 172.19.85.122 port 8080: Connection refused

Add a DNAT rule:

$ iptables -t nat -I DOCKER ! -i docker0 -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    2 120 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0

Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
    0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0

At this point, the web container can be accessed through the host’s port 8080:

$ curl 172.19.85.122:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
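
Note that this manual DNAT rule only covers traffic arriving from outside the bridge. To fully mimic what Docker’s -p option sets up, you would also add the hairpin MASQUERADE rule seen earlier, so the container can reach its own published port through the host address; the rule below mirrors the one Docker generated for the -p 8080:80 case:

$ iptables -t nat -A POSTROUTING -p tcp -s 172.17.0.2 -d 172.17.0.2 --dport 80 -j MASQUERADE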