How DNS resolution works in Docker containers

Background

While using Docker recently, I ran into some problems with DNS resolution inside containers. I took some time to work out how it operates and wrote this article to share what I found.

1. Containers started with docker run

Take starting a busybox container as an example:

root@ubuntu20:~# docker run -itd --name u1 busybox
63b59ca8aeac18a09b63aaf4a14dc80895d6de293017d01786cac98cccda62ae
root@ubuntu20:~# docker exec -it u1 sh
/ #
/ # ping www.baidu.com
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=127 time=34.976 ms
64 bytes from 14.119.104.189: seq=1 ttl=127 time=35.369 ms
^C
--- www.baidu.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 34.976/35.172/35.369 ms

External domain names can be resolved and pinged from inside the container.

Process

View the contents of the /etc/resolv.conf file in the container:

/ # cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 192.168.30.2
search localdomain

The DNS server IP used is 192.168.30.2.

This is exactly the DNS server IP address used by the host (I am using an Ubuntu 20.04 virtual machine). Run systemd-resolve --status on the host to confirm:

root@ubuntu20:~# systemd-resolve --status
....

Link 2 (ens33)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 192.168.30.2
         DNS Servers: 192.168.30.2
          DNS Domain: localdomain
          
...

Conclusion

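For a container started this way, /etc/resolv.conf inside the container lists the same DNS server IP that the host uses. Docker copies the host's resolver configuration into the container, which is why the systemd-resolved header comment shows up in the container's file.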
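
To make the comparison concrete, here is a minimal sketch that extracts the nameserver entries from resolv.conf-style text. The function name nameservers and the sample strings are my own illustration; the sample simply mirrors the container output shown above.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines ('#' or ';').
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# Sample text mirroring the container's /etc/resolv.conf shown above.
container_conf = """\
# This file is managed by man:systemd-resolved(8). Do not edit.

nameserver 192.168.30.2
search localdomain
"""

# Host-side value, copied by hand from the systemd-resolve output above.
host_dns = ["192.168.30.2"]

print(nameservers(container_conf) == host_dns)
```

On a real system you would read the text from /etc/resolv.conf inside the container and compare it with the host's resolver settings.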

Reference article: How does the Docker DNS work?

2. Containers started with docker compose

With docker compose, a container can use the service name of another container to obtain its IP address.

Let’s look at an example. The contents of the docker-compose.yml file are as follows:

version: '2'

services:
  redis:
    image: redis:3.2

  busybox:
    image: busybox
    stdin_open: true
    tty: true

It defines two services, redis and busybox.

Enter the busybox container and use “redis” to obtain the IP address of the redis container:

root@ubuntu20:~/test# docker-compose up -d
Creating network "test_default" with the default driver
Creating test_redis_1 ... done
Creating test_busybox_1 ... done
root@ubuntu20:~/test# docker-compose ps

     Name                   Command               State    Ports
-----------------------------------------------------------------
test_busybox_1   sh                               Up
test_redis_1     docker-entrypoint.sh redis ...   Up      6379/tcp

root@ubuntu20:~/test# docker exec -it test_busybox_1 sh
/ #
/ # ping redis -c 1
PING redis (192.168.112.2): 56 data bytes
64 bytes from 192.168.112.2: seq=0 ttl=64 time=1.103 ms

--- redis ping statistics ---
1 packet transmitted, 1 packet received, 0% packet loss
round-trip min/avg/max = 1.103/1.103/1.103 ms

The reverse also works: the redis container can reach the busybox container by its service name. How is this done?

Process

First look at the /etc/resolv.conf file in the container:

/ # cat /etc/resolv.conf
search localdomain
nameserver 127.0.0.11
options edns0 trust-ad ndots:0

The DNS server address it uses is a loopback address, 127.0.0.11! So how does name resolution work?
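
A quick check with Python's standard ipaddress module confirms that 127.0.0.11 belongs to the loopback block 127.0.0.0/8, even though it is not the usual 127.0.0.1. This is only a small illustration; the variable names are mine.

```python
import ipaddress

addr = ipaddress.ip_address("127.0.0.11")

# The whole 127.0.0.0/8 block is loopback, not just 127.0.0.1.
print(addr.is_loopback)                              # True
print(addr in ipaddress.ip_network("127.0.0.0/8"))   # True
```

Loopback traffic never leaves the network namespace it originates in, so whatever answers on 127.0.0.11 must have a socket inside the container's own network namespace.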

Capture packets on the loopback interface in the container and take a look:

root@ubuntu20:~/test# docker inspect test_busybox_1 | grep Pid
            "Pid": 211432,
            "PidMode": "",
            "PidsLimit": null,
root@ubuntu20:~/test# nsenter -t 211432 -n tcpdump -i lo -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:11:39.223669 IP 127.0.0.1.58808 > 127.0.0.11.59712: UDP, length 34
10:11:39.223707 IP 127.0.0.1.58808 > 127.0.0.11.59712: UDP, length 34
10:11:39.224305 IP 127.0.0.11.53 > 127.0.0.1.58808: 6751 0/0/0 (23)
10:11:39.224682 IP 127.0.0.11.53 > 127.0.0.1.58808: 18524 1/0/0 A 192.168.112.2 (44)
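
The message sizes in this capture can be checked by hand. The sketch below is a minimal, illustrative DNS query builder; the function names and the 1232-byte UDP payload size are my own choices, not anything taken from Docker or busybox. A query for the single label "redis" comes to 23 bytes, which matches the (23) that tcpdump prints for the empty response, since a response with no records is just the header plus the echoed question. Adding an empty EDNS0 OPT record gives 34 bytes, which is consistent with the "UDP, length 34" requests, assuming the container's resolver honors the edns0 option in /etc/resolv.conf.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        if label:  # ignore empty labels such as a trailing dot
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype=1, query_id=0, edns0=False):
    """Build a DNS query message (qtype 1 = A, 28 = AAAA)."""
    arcount = 1 if edns0 else 0
    # Header: ID, flags (recursion desired), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    # Question: encoded name, QTYPE, QCLASS (1 = IN)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    message = header + question
    if edns0:
        # Empty OPT pseudo-record: root name, TYPE 41, UDP payload size
        # (1232 is an arbitrary assumption), 32-bit TTL field, RDLENGTH 0
        message += b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
    return message


print(len(build_query("redis")))              # 23
print(len(build_query("redis", edns0=True)))  # 34
```

Note that tcpdump only decodes the replies as DNS because their source port is 53; the requests to port 59712 are shown as plain UDP.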

Here is a little trick: because the busybox container has no tcpdump program, use nsenter on the host to enter the container's network namespace and run the host's tcpdump there.
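
If you use this trick often, the two commands can be assembled programmatically. The sketch below only builds the argument lists and prints them; the helper names are hypothetical. It uses docker inspect's --format option with the {{.State.Pid}} template, which prints just the PID instead of the three grep matches shown above.

```python
import shlex


def inspect_pid_command(container):
    """Command that prints only the PID of the container's main process."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]


def nsenter_net_command(pid, command):
    """Command that runs `command` inside the network namespace of `pid`."""
    return ["nsenter", "-t", str(pid), "-n", *command]


# Print shell-ready versions of both commands.
print(shlex.join(inspect_pid_command("test_busybox_1")))
print(shlex.join(nsenter_net_command(211432, ["tcpdump", "-i", "lo", "-nn"])))
```

Passing the lists to subprocess.run on a host with Docker installed would execute them; nsenter needs root privileges, as in the transcript.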

The capture shows that the DNS requests were sent to 127.0.0.11 port 59712, which means some service must be listening on that address and port.
Use the ss command to check:

root@ubuntu20:~/test# nsenter -t 211432 -n ss -unlp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
UNCONN 0 0 127.0.0.11:59712 0.0.0.0:* users:(("dockerd",pid=1078,fd=78))

The process listening on this address and port turns out to be dockerd itself! So the DNS requests are sent to dockerd for processing.

But isn’t the destination port of a DNS request 53? How did it become 59712? Take a look at the iptables nat table:

root@ubuntu20:~/test# nsenter -t 211432 -n iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER_OUTPUT
-N DOCKER_POSTROUTING
-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A POSTROUTING -d 127.0.0.11/32 -j DOCKER_POSTROUTING
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:43107
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:59712
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 43107 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 59712 -j SNAT --to-source :53

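There is a DNAT rule that rewrites the destination port of UDP packets addressed to 127.0.0.11 port 53 to 59712 (and a matching rule for TCP, to port 43107). The SNAT rules in DOCKER_POSTROUTING rewrite the source port of the replies back to 53, which is why the responses in the capture appear to come from 127.0.0.11.53. That solves the mystery.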
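
The effect of the DNAT rules can be modeled in a few lines. This is only a sketch for illustration: the regular expression handles the exact rule shape printed above and is not a general iptables parser, and the function names are mine.

```python
import re

# The two DNAT rules copied from the iptables output above.
RULES = """\
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:43107
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:59712
"""

DNAT_RE = re.compile(
    r"-d (?P<dst>[\d.]+)/32 -p (?P<proto>tcp|udp) .*--dport (?P<dport>\d+) "
    r"-j DNAT --to-destination (?P<to_ip>[\d.]+):(?P<to_port>\d+)"
)


def dnat_table(rules_text):
    """Map (protocol, original ip, original port) to (new ip, new port)."""
    table = {}
    for line in rules_text.splitlines():
        m = DNAT_RE.search(line)
        if m:
            key = (m["proto"], m["dst"], int(m["dport"]))
            table[key] = (m["to_ip"], int(m["to_port"]))
    return table


def rewrite(table, proto, ip, port):
    """Return the destination after DNAT, or the original if no rule matches."""
    return table.get((proto, ip, port), (ip, port))


table = dnat_table(RULES)
print(rewrite(table, "udp", "127.0.0.11", 53))  # ('127.0.0.11', 59712)
print(rewrite(table, "tcp", "127.0.0.11", 53))  # ('127.0.0.11', 43107)
```

The port numbers 59712 and 43107 come from this particular container; they will differ from one container to the next.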

Conclusion

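The DNS server used by containers started with docker compose is the DNS server embedded in dockerd. More precisely, this applies to containers attached to a user-defined network, such as the test_default network that compose created above.

It is achieved through the following three steps:

  1. dockerd creates a UDP socket listening on 127.0.0.11 in the container’s network namespace.
  2. dockerd sets the nameserver in the container’s /etc/resolv.conf file to 127.0.0.11.
  3. dockerd adds iptables DNAT and SNAT rules in the container’s network namespace that map port 53 to the port the socket actually listens on.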