[Docker] Using Linux routing to connect namespaces on different network segments, and to connect a namespace to the host

If two namespaces are on different subnets, they cannot be connected through a bridge; traffic between them has to be forwarded at Layer 3 by a router. Linux does not provide a virtual router device comparable to the virtual bridge, because the Linux kernel itself can act as a router.

A router works as follows: it has two or more network interfaces, each on a different Layer 3 subnet. It forwards the packets received on one interface out of another interface according to its routing table, which lets hosts on different Layer 3 subnets reach each other. The Linux kernel provides this capability as IP forwarding. Once IP forwarding is enabled, IP packets can be forwarded between network interfaces, so the machine behaves as a router.
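The forwarding decision described above can be sketched in shell as a longest-prefix match over a small table. This is only an illustration of the idea, not how the kernel implements it. The table contents, the interface names eth0 to eth2, and the helper names ip_to_int, in_subnet and lookup_route are all made up for this sketch.

```shell
#!/bin/sh
# Hypothetical forwarding table: one "destination-subnet out-interface" per line.
ROUTES="10.0.0.0/24 eth0
10.0.1.0/24 eth1
0.0.0.0/0 eth2"

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# Succeed if address $1 lies inside the subnet $2 (CIDR notation).
in_subnet() {
  local ip="$1" net="${2%/*}" len="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Print the outgoing interface for destination $1, preferring the most
# specific (longest-prefix) matching entry. Fails if nothing matches.
lookup_route() {
  local dst="$1" best_len=-1 best_if="" cidr ifname len
  while read -r cidr ifname; do
    [ -n "$cidr" ] || continue
    len="${cidr#*/}"
    if in_subnet "$dst" "$cidr" && [ "$len" -gt "$best_len" ]; then
      best_len="$len"
      best_if="$ifname"
    fi
  done <<EOF
$ROUTES
EOF
  [ -n "$best_if" ] && echo "$best_if"
}

lookup_route 10.0.1.7   # prints eth1
lookup_route 8.8.8.8    # prints eth2 (only the default entry matches)
```

If the 0.0.0.0/0 line is removed from the table, a lookup for an address outside both /24 subnets returns a non-zero status, which corresponds to a destination the router has no route for.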

Enable IP forwarding

IP forwarding is not enabled by default on Linux. It can be enabled as follows.

Add the following lines to /etc/sysctl.conf:

net.ipv4.ip_forward=1
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.all.forwarding=1

Then use sysctl -p to reload the configuration file:

$ sysctl -p /etc/sysctl.conf
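To confirm that the setting took effect, the current IPv4 value can be read back from procfs. The following is a minimal sketch; the function name ipv4_forwarding_enabled and its optional path argument are assumptions made for illustration. For a change that takes effect immediately but does not survive a reboot, sysctl -w net.ipv4.ip_forward=1 (run as root) can be used instead of editing the file.

```shell
#!/bin/sh
# Succeed if IPv4 forwarding is switched on.
# The optional argument is a path to read instead of the real procfs entry;
# it exists only so the function can be tried against an ordinary file.
ipv4_forwarding_enabled() {
  local f="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if ipv4_forwarding_enabled; then
  echo "IPv4 forwarding is enabled"
else
  echo "IPv4 forwarding is disabled (or the procfs entry is not readable)"
fi
```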

Use routing to connect two namespaces

Next, we connect two namespaces on different Layer 3 subnets using Linux’s own routing function. The network topology of this experiment is shown in the figure below.

Note that the router at the bottom of the figure does not correspond to a physical or virtual router device. It is implemented as a namespace with two virtual network cards. Because IP forwarding is enabled in the Linux kernel, the ns-router namespace can forward IP packets between its two network cards, which are on different subnets, and so perform the routing function.

Create namespace

Create three namespaces named ns0, ns1, and ns-router, where ns0 and ns1 serve as namespaces for two different network segments, and ns-router is responsible for the routing function.

$ ip netns add ns0
$ ip netns add ns1
$ ip netns add ns-router

$ ip netns list
ns-router
ns1
ns0

Create veth

Create two veth pairs to connect the two namespaces to the router.

$ ip link add type veth
$ ip link add type veth

$ ip link
56: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:40:31:14:9e:5d brd ff:ff:ff:ff:ff:ff
57: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:a3:bf:bc:2c:82 brd ff:ff:ff:ff:ff:ff
58: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:c5:84:06:e6:76 brd ff:ff:ff:ff:ff:ff
59: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:be:88:01:8c:c0 brd ff:ff:ff:ff:ff:ff
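When ip link add type veth is run without names, the kernel picks the next free vethN names, so the veth0 to veth3 names shown above depend on which names were already in use. A possible variation, shown here only as a sketch, names both ends explicitly. The run helper, the DRY_RUN variable and the function name create_veth_pairs are made up for this example; by default the commands are only printed, and DRY_RUN=0 (as root) would execute them.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0, in which case run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Create the two veth pairs with explicit names for both ends, so the
# result does not depend on the kernel's automatic naming.
create_veth_pairs() {
  run ip link add veth0 type veth peer name veth1
  run ip link add veth2 type veth peer name veth3
}

create_veth_pairs
```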

Add veth into namespace

Use the veth pairs to connect ns0 and ns1 to the router implemented by ns-router.

$ ip link set veth0 netns ns0
$ ip link set veth1 netns ns-router
$ ip link set veth2 netns ns1
$ ip link set veth3 netns ns-router

Assign ip to veth

Set the IP address for the virtual network card. ns0 and ns1 are on the two subnets 192.168.0.0/24 and 192.168.1.0/24 respectively, and the two network cards of ns-router are connected to these two subnets respectively.

$ ip netns exec ns0 ip addr add 192.168.0.2/24 dev veth0
$ ip netns exec ns-router ip addr add 192.168.0.1/24 dev veth1
$ ip netns exec ns1 ip addr add 192.168.1.2/24 dev veth2
$ ip netns exec ns-router ip addr add 192.168.1.1/24 dev veth3

Enable veth

Set the status of the network card to up.

$ ip netns exec ns0 ip link set veth0 up
$ ip netns exec ns-router ip link set veth1 up
$ ip netns exec ns-router ip link set veth3 up
$ ip netns exec ns1 ip link set veth2 up

View the ip of each namespace

View the ip of namespace ns0:

$ ip netns exec ns0 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
56: veth0@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2e:40:31:14:9e:5d brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.0.2/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c40:31ff:fe14:9e5d/64 scope link
       valid_lft forever preferred_lft forever

View the ip of the namespace ns-router:

$ ip netns exec ns-router ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
57: veth1@if56: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 86:a3:bf:bc:2c:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.1/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::84a3:bfff:febc:2c82/64 scope link
       valid_lft forever preferred_lft forever
59: veth3@if58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:be:88:01:8c:c0 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.1.1/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::40be:88ff:fe01:8cc0/64 scope link
       valid_lft forever preferred_lft forever

View the ip of namespace ns1:

$ ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
58: veth2@if59: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:c5:84:06:e6:76 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.1.2/24 scope global veth2
       valid_lft forever preferred_lft forever
    inet6 fe80::f0c5:84ff:fe06:e676/64 scope link
       valid_lft forever preferred_lft forever

Test

At this point, pinging ns1 from ns0 fails. Although ns-router can forward packets, the IP address of ns1 is not in the subnet of ns0, and ns0 has no route that matches it. The attempt to send therefore fails with the error "Network is unreachable", and the IP packet never reaches ns-router.

$ ip netns exec ns0 ping 192.168.1.1 -c 3
connect: Network is unreachable

$ ip netns exec ns0 ping 192.168.1.2 -c 3
connect: Network is unreachable

Add route

In ns0 and in ns1 we add a route to the other side’s subnet. Packets addressed to the other subnet are then sent first to the router's interface on the local subnet and forwarded from there by ns-router.

$ ip netns exec ns0 ip route add 192.168.1.0/24 via 192.168.0.1
$ ip netns exec ns1 ip route add 192.168.0.0/24 via 192.168.1.1

Test again

Now the two namespaces can ping each other successfully.

$ ip netns exec ns0 ping 192.168.1.2 -c 3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.045 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.040 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=63 time=0.031 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.031/0.038/0.045/0.009 ms

$ ip netns exec ns1 ping 192.168.0.2 -c 3
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=63 time=0.034 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=63 time=0.042 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=63 time=0.034 ms

--- 192.168.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.034/0.036/0.042/0.007 ms

To make the experiment easier to follow, a separate namespace, ns-router, was used as the router. In practice, the router-side end of each veth pair can be left in the default network namespace, and the default network namespace then acts as the router.
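The next experiment reuses the names ns0, ns1 and veth0 to veth3, so the topology built above presumably has to be removed first. A minimal cleanup sketch follows; the run helper, the DRY_RUN variable and the function name cleanup_router_lab are assumptions made for this example. Deleting a namespace also destroys the veth ends inside it, and with them their peers, so no separate ip link delete is needed here. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0, in which case run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Remove the three namespaces created for the routing experiment.
cleanup_router_lab() {
  local ns
  for ns in ns0 ns1 ns-router; do
    run ip netns delete "$ns"
  done
}

cleanup_router_lab
```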

Use routing to connect namespace and host

When the Linux bridge was introduced earlier, we noted that from a network perspective a bridge is a Layer 2 device and therefore needs no IP address. The Linux bridge virtual device is special, though: it can be thought of as coming with a built-in network card, which appears on the host under the name of the bridge. This network card is attached to the bridge, so it can communicate at Layer 2 with the other network cards and namespaces connected to the bridge. From the host’s point of view, the bridge device is also a network card in the host’s default network namespace. Once this network card is given an IP address, it can take part in the host's routing and forwarding.

By setting an IP address for the bridge and setting the IP as the default gateway of the namespace, you can enable network communication between the namespace and the host. If you add the corresponding route on the host, you can allow the namespace to communicate with the external network.

The following shows the logical network view after setting the IP address for the Linux bridge device bridge0. Note that in the figure below, the network card bridge0 appears on the Linux bridge (bridge0) and router (default network namespace). That is, this network card works in the Linux bridge on the second layer and in the default network namespace on the third layer.

After setting bridge0 as the default gateway, you can connect to the host network 172.16.0.157/16 from ns0 and ns1. At this time, the data flow direction is as follows: ns0–(Network Bridge)–>bridge0–(IP Forwarding)–>172.16.0.157/16

Create namespace

Create namespaces ns0 and ns1:

$ ip netns add ns0
$ ip netns add ns1

$ ip netns list
ns1
ns0

Create veth

Create two veth pairs:

$ ip link add type veth
$ ip link add type veth

$ ip link
60: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:08:b1:3d:44:a3 brd ff:ff:ff:ff:ff:ff
61: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff
62: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 62:da:16:fa:50:a0 brd ff:ff:ff:ff:ff:ff
63: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:59:1b:fb:e6:a6 brd ff:ff:ff:ff:ff:ff

Create bridge and enable it

$ ip link add bridge0 type bridge

$ ip link
64: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:3b:75:2a:23:50 brd ff:ff:ff:ff:ff:ff

$ ip link set bridge0 up

Attach veth

Connect ns0 and ns1 to bridge0 through veth pair.

$ ip link set veth0 netns ns0
$ ip link set veth2 netns ns1
$ ip link set veth1 master bridge0
$ ip link set veth3 master bridge0

Set ip for veth

$ ip netns exec ns0 ip addr add 192.168.1.2/24 dev veth0
$ ip netns exec ns1 ip addr add 192.168.1.3/24 dev veth2

Enable veth

$ ip netns exec ns0 ip link set veth0 up
$ ip netns exec ns1 ip link set veth2 up
$ ip link set veth1 up
$ ip link set veth3 up

$ ip link
61: veth1@if60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff link-netnsid 0
63: veth3@if62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether d6:59:1b:fb:e6:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
64: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff

View namespace ip

View the ip of namespace ns0:

$ ip netns exec ns0 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
60: veth0@if61: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:08:b1:3d:44:a3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.2/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2008:b1ff:fe3d:44a3/64 scope link
       valid_lft forever preferred_lft forever

View the ip of namespace ns1:

$ ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
62: veth2@if63: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:da:16:fa:50:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.3/24 scope global veth2
       valid_lft forever preferred_lft forever
    inet6 fe80::60da:16ff:fefa:50a0/64 scope link
       valid_lft forever preferred_lft forever

Test

Ping namespace ns1 from namespace ns0; communication works:

$ ip netns exec ns0 ping 192.168.1.3 -c 3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=64 time=0.031 ms

--- 192.168.1.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.026/0.030/0.034/0.005 ms

Ping namespace ns0 from namespace ns1; communication works:

$ ip netns exec ns1 ping 192.168.1.2 -c 3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.030 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.037 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.030/0.038/0.049/0.010 ms

Ping the host from namespace ns0; communication fails:

$ ip netns exec ns0 ping 172.16.0.157 -c 3
connect: Network is unreachable

Ping the host from namespace ns1; communication fails:

$ ip netns exec ns1 ping 172.16.0.157 -c 3
connect: Network is unreachable

At this time, communication between ns0 and ns1 is possible, but if you try to ping the host IP address from ns0 and ns1, you will find that the network is unreachable because the addresses are not on the same subnet and there is no corresponding route.

Assign ip to bridge0

$ ip addr add 192.168.1.1/24 dev bridge0

$ ip addr
default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global bridge0
       valid_lft forever preferred_lft forever

Add a default route to the namespace

Add a default route to the namespace ns0

$ ip netns exec ns0 ip route add default via 192.168.1.1

$ ip netns exec ns0 route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 0 0 0 veth0
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 veth0

Add a default route to the namespace ns1

$ ip netns exec ns1 ip route add default via 192.168.1.1

$ ip netns exec ns1 route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 0 0 0 veth2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 veth2

Set the IP of bridge0 as the default gateway in ns0 and ns1.

Test again

Ping the host from namespace ns0; communication now works:

$ ip netns exec ns0 ping 172.16.0.157 -c 3
PING 172.16.0.157 (172.16.0.157) 56(84) bytes of data.
64 bytes from 172.16.0.157: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 172.16.0.157: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 172.16.0.157: icmp_seq=3 ttl=64 time=0.033 ms

--- 172.16.0.157 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.026/0.032/0.037/0.004 ms

Ping the host from namespace ns1; communication now works:

$ ip netns exec ns1 ping 172.16.0.157 -c 3
PING 172.16.0.157 (172.16.0.157) 56(84) bytes of data.
64 bytes from 172.16.0.157: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 172.16.0.157: icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from 172.16.0.157: icmp_seq=3 ttl=64 time=0.038 ms

--- 172.16.0.157 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.022/0.032/0.038/0.010 ms

Pinging the host IP from ns0 and ns1 now works. Setting bridge0 as the default gateway has connected the namespaces to the host network.

Use iptables to connect namespace to external network

In the above example, although routing is used to connect the namespace and the host’s network, the external network cannot be accessed in the namespace.

Try to access Baidu in namespaces ns0 and ns1:

$ ip netns exec ns1 ping www.baidu.com -c 3
ping: www.baidu.com: Name or service not known

$ ip netns exec ns0 ping www.baidu.com -c 3
ping: www.baidu.com: Name or service not known

Next, use iptables to add a source NAT (MASQUERADE) rule so that the namespaces can reach the external network:

$ iptables -t nat -A POSTROUTING -s 192.168.1.1/24 -o eth0 -j MASQUERADE

$ iptables --list -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 192.168.1.0/24 anywhere
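Because -A appends unconditionally, running the command above a second time adds a duplicate rule. One way to avoid that, sketched below, is to test for the rule with iptables -C before appending it. The function name ensure_masquerade and the IPT variable are made up for this example; the subnet is written as 192.168.1.0/24, which is the form the listing above shows for the rule. The function needs root to have any effect, so the call is left commented out.

```shell
#!/bin/sh
# Command used to talk to netfilter; can be overridden when experimenting.
IPT="${IPT:-iptables}"

# Append a MASQUERADE rule for source subnet $1 leaving via interface $2,
# but only if an identical rule is not already present in nat/POSTROUTING.
ensure_masquerade() {
  local subnet="$1" out_if="$2"
  if ! "$IPT" -t nat -C POSTROUTING -s "$subnet" -o "$out_if" -j MASQUERADE 2>/dev/null; then
    "$IPT" -t nat -A POSTROUTING -s "$subnet" -o "$out_if" -j MASQUERADE
  fi
}

# Example call for the setup in this article (run as root):
# ensure_masquerade 192.168.1.0/24 eth0
```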

Try to access Baidu in namespaces ns0 and ns1 again:

$ ip netns exec ns0 ping www.baidu.com -c 3
PING www.a.shifen.com (14.119.104.254) 56(84) bytes of data.
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=1 ttl=51 time=9.83 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=2 ttl=51 time=9.37 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=3 ttl=51 time=9.42 ms

--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 9.378/9.545/9.832/0.232 ms

$ ip netns exec ns1 ping www.baidu.com -c 3
PING www.a.shifen.com (14.119.104.254) 56(84) bytes of data.
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=1 ttl=51 time=9.31 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=2 ttl=51 time=9.35 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=3 ttl=51 time=9.39 ms

--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 9.319/9.355/9.396/0.031 ms

The external network is now reachable from both ns0 and ns1.