How to use Containerlab and Kind to quickly deploy Cilium BGP experimental environment

1 Prerequisite knowledge

1.1 Cilium Introduction

Cilium is a Kubernetes CNI plug-in based on eBPF technology. On its official website, Cilium positions the product as “eBPF-based Networking, Observability, Security”, committed to providing an eBPF-based collection of networking, observability, and security solutions for container workloads. Cilium implements these features by using eBPF to dynamically insert control logic into the Linux kernel; this logic can be applied and updated without modifying application code or container configuration.

1.2 Cilium BGP Introduction

BGP (Border Gateway Protocol) is a dynamic routing protocol used between autonomous systems (AS). BGP provides rich and flexible routing control policies and was mainly used for interconnection between Internet ASes in its early days. As the technology has evolved, BGP is now also widely used in data centers. Modern data center networks are usually based on a Spine-Leaf architecture, in which BGP can be used to propagate endpoint reachability information.

The leaf layer consists of access switches that aggregate traffic from servers and connect directly to the spine, or network core, which interconnects all leaf switches in a full mesh topology.

As Kubernetes is increasingly adopted in enterprises, these endpoints may be Kubernetes Pods. To allow the network outside the Kubernetes cluster to dynamically learn routes to Pods via BGP, it was a natural step for Cilium to introduce support for the BGP protocol.

BGP support was first introduced in Cilium 1.10: by assigning a LoadBalancer-type Service to an application and integrating with MetalLB, routing information could be advertised to BGP neighbors.

However, as IPv6 usage continued to grow, it became clear that Cilium needed BGP IPv6 functionality, including Segment Routing v6 (SRv6). MetalLB currently has only limited, still experimental support for IPv6 via FRR. The Cilium team evaluated various options and decided to move to the more feature-rich GoBGP [1].

In the latest Cilium 1.12 release, enabling BGP support only requires setting the --enable-bgp-control-plane=true parameter, and a new CRD, CiliumBGPPeeringPolicy, provides more fine-grained and scalable configuration.

  • The same BGP configuration can be applied to multiple nodes through label selection with the nodeSelector parameter.

  • When the exportPodCIDR parameter is set to true, all Pod CIDRs are announced dynamically, without manually specifying which route prefixes need to be announced.

  • The neighbors parameter sets the BGP neighbor information, which usually refers to network devices outside the cluster.

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
 name: rack0
spec:
 nodeSelector:
   matchLabels:
     rack: rack0
 virtualRouters:
 - localASN: 65010
   exportPodCIDR: true
   neighbors:
   - peerAddress: "10.0.0.1/32"
     peerASN: 65010

1.3 Kind Introduction

Kind [2] (Kubernetes in Docker) is a tool for running local Kubernetes clusters using Docker containers as nodes. With only Docker installed, we can create one or more Kubernetes clusters in a few minutes. To simplify the experiment, this article uses Kind to build the Kubernetes cluster environment.
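
For example, a throwaway single-node cluster can be created and removed with commands like the following (a minimal sketch; the cluster name demo is arbitrary):

# Create a single-node Kubernetes cluster named "demo"
kind create cluster --name demo
# List the Kind clusters on this machine
kind get clusters
# Delete the cluster when it is no longer needed
kind delete cluster --name demo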

1.4 Containerlab Introduction

Containerlab [3] provides a simple, lightweight, container-based solution for orchestrating network labs and supports various containerized network operating systems, such as Cisco, Juniper, Nokia, and Arista. Containerlab launches containers and creates virtual connections between them to build user-defined network topologies from a user-provided configuration file.

name: sonic01

topology:
  nodes:
    srl:
      kind: srl
      image: ghcr.io/nokia/srlinux
    sonic:
      kind: sonic-vs
      image: docker-sonic-vs:2020-11-12

  links:
    - endpoints: ["srl:e1-1", "sonic:eth1"]

The management interface of each container is connected to a Docker bridge network named clab, while the data interfaces are connected according to the links rules defined in the configuration file. This mirrors the two management modes used in data center networks: out-of-band and in-band management.
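
Once a lab is deployed (see Section 4), the management bridge and the management IP addresses assigned to the lab containers can be inspected with standard Docker commands (a sketch; clab is Containerlab's default management network name):

# List Docker networks; Containerlab creates a bridge network named "clab" by default
docker network ls
# Show the containers attached to the clab network and their management addresses
docker network inspect clab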

Containerlab also provides a wealth of example labs, which can be found in Lab examples [4]. We can even build a data-center-scale network architecture with Containerlab (see 5-stage Clos fabric [5]).

2 Prerequisite preparation

Please select the appropriate installation method according to the corresponding operating system version:

  • Install Docker: https://docs.docker.com/engine/install/

  • Install Containerlab: https://containerlab.dev/install/

  • Install Kind: https://kind.sigs.k8s.io/docs/user/quick-start/#installing-with-a-package-manager

  • Install Helm: https://helm.sh/docs/intro/install/

The configuration files used in this article can be obtained at https://github.com/cr7258/kubernetes-guide/tree/master/containerlab/cilium-bgp.
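
For example, the repository can be cloned and the working directory switched to the example used in this article:

git clone https://github.com/cr7258/kubernetes-guide.git
cd kubernetes-guide/containerlab/cilium-bgp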

3 Start Kubernetes cluster through Kind

Prepare a Kind configuration file and create a 4-node Kubernetes cluster.

# cluster.yaml
kind: Cluster
name: clab-bgp-cplane-demo
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true # Disable default CNI
  podSubnet: "10.1.0.0/16" # Pod CIDR
nodes:
- role: control-plane # Node role
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.0.1.2 # Node IP
        node-labels: "rack=rack0" # Node labels

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.0.2.2
        node-labels: "rack=rack0"

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.0.3.2
        node-labels: "rack=rack1"

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.0.4.2
        node-labels: "rack=rack1"

Execute the following command to create a Kubernetes cluster through Kind.

kind create cluster --config cluster.yaml

Check the cluster node status. Since we have not yet installed a CNI plug-in, the nodes are in the NotReady state.

kubectl get node

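The output should look roughly like the following (illustrative only; ages and versions will differ), with every Node in the NotReady state:

NAME                                 STATUS     ROLES           AGE   VERSION
clab-bgp-cplane-demo-control-plane   NotReady   control-plane   2m    v1.24.x
clab-bgp-cplane-demo-worker          NotReady   <none>          2m    v1.24.x
clab-bgp-cplane-demo-worker2         NotReady   <none>          2m    v1.24.x
clab-bgp-cplane-demo-worker3         NotReady   <none>          2m    v1.24.x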

4 Start Containerlab

Define Containerlab’s configuration file to create the network infrastructure and connect it to the Kubernetes cluster created by Kind:

  • router0, tor0, and tor1 act as network devices outside the Kubernetes cluster. Their network interfaces and BGP configuration are set through the exec parameters. router0 establishes BGP neighbors with tor0 and tor1; tor0 establishes BGP neighbors with server0, server1, and router0; tor1 establishes BGP neighbors with server2, server3, and router0.

  • Setting network-mode: container:<name> lets Containerlab share the network namespace of a container started outside of Containerlab. Here the server0, server1, server2, and server3 containers are attached, respectively, to the 4 Nodes of the Kubernetes cluster created by Kind in Section 3.

# topo.yaml
name: bgp-cplane-demo
topology:
  kinds:
    linux:
      cmd: bash
  nodes:
    router0:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
      - iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
      - ip addr add 10.0.0.0/32 dev lo
      - ip route add blackhole 10.0.0.0/8
      - touch /etc/frr/vtysh.conf
      - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
      - /usr/lib/frr/frrinit.sh start
      - >-
         vtysh -c 'conf t'
         -c 'router bgp 65000'
         -c 'bgp router-id 10.0.0.0'
         -c 'no bgp ebgp-requires-policy'
         -c ' neighbor ROUTERS peer-group'
         -c ' neighbor ROUTERS remote-as external'
         -c ' neighbor ROUTERS default-originate'
         -c ' neighbor net0 interface peer-group ROUTERS'
         -c ' neighbor net1 interface peer-group ROUTERS'
         -c 'address-family ipv4 unicast'
         -c ' redistribute connected'
         -c 'exit-address-family'
         -c '!'
            
          
    tor0:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
      - ip link del eth0
      - ip addr add 10.0.0.1/32 dev lo
      - ip addr add 10.0.1.1/24 dev net1
      - ip addr add 10.0.2.1/24 dev net2
      - touch /etc/frr/vtysh.conf
      - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
      - /usr/lib/frr/frrinit.sh start
      - >-
         vtysh -c 'conf t'
         -c 'frr defaults datacenter'
         -c 'router bgp 65010'
         -c 'bgp router-id 10.0.0.1'
         -c 'no bgp ebgp-requires-policy'
         -c ' neighbor ROUTERS peer-group'
         -c ' neighbor ROUTERS remote-as external'
         -c ' neighbor SERVERS peer-group'
         -c ' neighbor SERVERS remote-as internal'
         -c ' neighbor net0 interface peer-group ROUTERS'
         -c ' neighbor 10.0.1.2 peer-group SERVERS'
         -c ' neighbor 10.0.2.2 peer-group SERVERS'
         -c 'address-family ipv4 unicast'
         -c ' redistribute connected'
         -c 'exit-address-family'
         -c '!'
          
    

    tor1:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
      - ip link del eth0
      - ip addr add 10.0.0.2/32 dev lo
      - ip addr add 10.0.3.1/24 dev net1
      - ip addr add 10.0.4.1/24 dev net2
      - touch /etc/frr/vtysh.conf
      - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
      - /usr/lib/frr/frrinit.sh start
      - >-
         vtysh -c 'conf t'
         -c 'frr defaults datacenter'
         -c 'router bgp 65011'
         -c 'bgp router-id 10.0.0.2'
         -c 'no bgp ebgp-requires-policy'
         -c ' neighbor ROUTERS peer-group'
         -c ' neighbor ROUTERS remote-as external'
         -c ' neighbor SERVERS peer-group'
         -c ' neighbor SERVERS remote-as internal'
         -c ' neighbor net0 interface peer-group ROUTERS'
         -c ' neighbor 10.0.3.2 peer-group SERVERS'
         -c ' neighbor 10.0.4.2 peer-group SERVERS'
         -c 'address-family ipv4 unicast'
         -c ' redistribute connected'
         -c 'exit-address-family'
         -c '!'
    
    server0:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:control-plane
      exec:
      - ip addr add 10.0.1.2/24 dev net0
      - ip route replace default via 10.0.1.1

    server1:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker
      exec:
      - ip addr add 10.0.2.2/24 dev net0
      - ip route replace default via 10.0.2.1

    server2:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker2
      exec:
      - ip addr add 10.0.3.2/24 dev net0
      - ip route replace default via 10.0.3.1

    server3:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker3
      exec:
      - ip addr add 10.0.4.2/24 dev net0
      - ip route replace default via 10.0.4.1


  links:
  - endpoints: ["router0:net0", "tor0:net0"]
  - endpoints: ["router0:net1", "tor1:net0"]
  - endpoints: ["tor0:net1", "server0:net0"]
  - endpoints: ["tor0:net2", "server1:net0"]
  - endpoints: ["tor1:net1", "server2:net0"]
  - endpoints: ["tor1:net2", "server3:net0"]

Execute the following command to create the Containerlab experimental environment.

clab deploy -t topo.yaml

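To list the containers that Containerlab started, together with their management addresses and state, the inspect subcommand can be used (a sketch):

clab inspect -t topo.yaml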

At this point, only the BGP sessions between the tor0, tor1, and router0 devices are established. Since we have not yet configured BGP for the Kubernetes cluster through CiliumBGPPeeringPolicy, the BGP sessions between tor0, tor1, and the Kubernetes Nodes have not been established.

Execute the following commands respectively to view the current BGP neighbor establishment status of the three network devices: tor0, tor1, and router0.

docker exec -it clab-bgp-cplane-demo-tor0 vtysh -c "show bgp ipv4 summary wide"
docker exec -it clab-bgp-cplane-demo-tor1 vtysh -c "show bgp ipv4 summary wide"
docker exec -it clab-bgp-cplane-demo-router0 vtysh -c "show bgp ipv4 summary wide"

Execute the following command to view the BGP routing entries currently learned by the router0 device.

docker exec -it clab-bgp-cplane-demo-router0 vtysh -c "show bgp ipv4 wide"

There are currently a total of 8 routing entries, and no Pod-related routes have been learned at this time.

To help users understand the experimental network structure more intuitively, Containerlab provides the graph command to generate a network topology diagram.

clab graph -t topo.yaml

Enter http://<host IP>:50080 in the browser to view the topology diagram generated by Containerlab.
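
If port 50080 is already occupied on the host, the listening address can be changed (assuming the graph subcommand's --srv flag, which sets the HTTP listen address):

clab graph -t topo.yaml --srv ":8080"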

5 Install Cilium

In this example, Helm is used to install Cilium, and the Cilium configuration parameters we need to adjust are set in the values.yaml configuration file.

# values.yaml
tunnel: disabled

ipam:
  mode: kubernetes

ipv4NativeRoutingCIDR: 10.0.0.0/8

# Enable BGP support; equivalent to the command-line flag --enable-bgp-control-plane=true
bgpControlPlane:
  enabled: true

k8s:
  requireIPv4PodCIDR: true
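
Before installing, the rendered manifests can be checked to confirm that the BGP option takes effect (a sketch; it assumes the cilium Helm repository added in the next step, and that the chart maps bgpControlPlane.enabled to the enable-bgp-control-plane agent flag):

helm template cilium cilium/cilium --version v1.12.1 -f values.yaml | grep -i bgp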

Execute the following commands to install Cilium 1.12 and enable BGP support.

helm repo add cilium https://helm.cilium.io/
helm install -n kube-system cilium cilium/cilium --version v1.12.1 -f values.yaml
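
After installation, we can wait for the Cilium agent Pods to become ready before continuing (a sketch; it assumes the agent DaemonSet is named cilium and its Pods carry the standard k8s-app=cilium label):

kubectl -n kube-system rollout status daemonset/cilium
kubectl -n kube-system get pods -l k8s-app=cilium -o wide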

After waiting for all Cilium Pods to start, check the Kubernetes Node status again. You can see that all Nodes are in the Ready state.

6 Cilium node configuration BGP

Next, configure a CiliumBGPPeeringPolicy for the Kubernetes Nodes in rack0 and rack1 respectively. rack0 and rack1 correspond to the Node labels that were set in the Kind configuration file in Section 3.

The Nodes in rack0 establish BGP neighbors with tor0, the Nodes in rack1 establish BGP neighbors with tor1, and the Pod CIDRs are announced to the BGP neighbors automatically.

# cilium-bgp-peering-policies.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack0
spec:
  nodeSelector:
    matchLabels:
      rack: rack0
  virtualRouters:
  - localASN: 65010
    exportPodCIDR: true # Automatically declare Pod CIDR
    neighbors:
    - peerAddress: "10.0.0.1/32" # IP address of tor0
      peerASN: 65010
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack1
spec:
  nodeSelector:
    matchLabels:
      rack: rack1
  virtualRouters:
  - localASN: 65011
    exportPodCIDR: true
    neighbors:
    - peerAddress: "10.0.0.2/32" # IP address of tor1
      peerASN: 65011

Execute the following command to apply CiliumBGPPeeringPolicy.

kubectl apply -f cilium-bgp-peering-policies.yaml
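
The applied policies can then be listed through the new CRD (a sketch; the resource belongs to the cilium.io/v2alpha1 API introduced above):

kubectl get ciliumbgppeeringpolicies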

Now tor0 and tor1 have also established BGP neighbors with the Kubernetes Nodes.

Execute the following commands respectively to view the current BGP neighbor establishment status of the three network devices: tor0, tor1, and router0.

docker exec -it clab-bgp-cplane-demo-tor0 vtysh -c "show bgp ipv4 summary wide"
docker exec -it clab-bgp-cplane-demo-tor1 vtysh -c "show bgp ipv4 summary wide"
docker exec -it clab-bgp-cplane-demo-router0 vtysh -c "show bgp ipv4 summary wide"

Execute the following command to view the BGP routing entries currently learned by the router0 device.

docker exec -it clab-bgp-cplane-demo-router0 vtysh -c "show bgp ipv4 wide"

There are currently 12 routing entries in total; the 4 additional routes are the 10.1.x.0/24 Pod CIDR routes learned from the 4 Kubernetes Nodes.

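Since all Pod CIDRs fall under 10.1.0.0/16 (the podSubnet configured in Section 3), the newly learned routes can be filtered out of the same output (a sketch):

docker exec -it clab-bgp-cplane-demo-router0 vtysh -c "show bgp ipv4 wide" | grep "10.1."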

7 Verification Test

Create one Pod on a node in rack0 and one on a node in rack1 to test network connectivity.

# nettool.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nettool-1
  name: nettool-1
spec:
  containers:
  - image: cr7258/nettool:v1
    name: nettool-1
  nodeSelector:
    rack: rack0
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nettool-2
  name: nettool-2
spec:
  containers:
  - image: cr7258/nettool:v1
    name: nettool-2
  nodeSelector:
    rack: rack1

Execute the following command to create the 2 test Pods.

kubectl apply -f nettool.yaml

View the Pods’ IP addresses.

kubectl get pod -o wide

The nettool-1 Pod is located on clab-bgp-cplane-demo-worker (server1, rack0) with IP address 10.1.2.185; the nettool-2 Pod is located on clab-bgp-cplane-demo-worker3 (server3, rack1) with IP address 10.1.3.56.

Execute the following command in the nettool-1 Pod to ping the nettool-2 Pod.

kubectl exec -it nettool-1 -- ping 10.1.3.56

You can see that nettool-1 Pod can access nettool-2 Pod normally.

Next, use the traceroute command to observe the path taken by the packets.

kubectl exec -it nettool-1 -- traceroute -n 10.1.3.56

The data packet is sent from nettool-1 Pod and passes through:

  • 1. server1’s cilium_host interface: the default route of a Pod in the Cilium network points to the local cilium_host. cilium_host and cilium_net are a veth pair; Cilium uses a hardcoded ARP entry to force the next hop of Pod traffic onto the host end of the veth pair (see the verification sketch after this list).

  • 2. The net2 interface of tor0.

  • 3. The lo0 interface of router0: tor0, tor1, and router0 establish BGP neighbors over their local loopback interfaces (lo0). When multiple redundant physical links are available, this improves the robustness of the BGP sessions, since the failure of a single physical interface does not tear down the neighbor relationship.

  • 4. The lo0 interface of tor1.

  • 5. The net0 interface of server3.
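
To verify the first hop described above, the routing table and neighbor entries inside the Pod can be checked (a sketch; it assumes the nettool image ships the iproute2 tools):

# The Pod's default route should point to the node's cilium_host address
kubectl exec -it nettool-1 -- ip route
# The neighbor entry for that address is the hardcoded ARP entry mentioned above
kubectl exec -it nettool-1 -- ip neigh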

8 Clean up the environment

Execute the following command to clean up the experimental environment created by Containerlab and Kind.

clab destroy -t topo.yaml
kind delete clusters clab-bgp-cplane-demo

9 Reference materials

  • [1] GoBGP: https://osrg.github.io/gobgp/

  • [2] Kind: https://kind.sigs.k8s.io/

  • [3] containerlab: https://containerlab.dev/

  • [4] Lab examples: https://containerlab.dev/lab-examples/lab-examples/

  • [5] 5-stage Clos fabric: https://containerlab.dev/lab-examples/min-5clos/

  • [6] BGP WITH CILIUM: https://nicovibert.com/2022/07/21/bgp-with-cilium/

  • [7] Containerlab + KinD deploy cross-network K8s clusters in seconds: https://www.bilibili.com/video/BV1Qa411d7wm?spm_id_from=333.337.search-card.all.click&vd_source=1c0f4059dae237b29416579c3a5d326e

  • [8] Cilium network overview: https://www.koenli.com/fcdddb4a.html

  • [9] Cilium BGP Control Plane: https://docs.cilium.io/en/stable/gettingstarted/bgp-control-plane/#cilium-bgp-control-plane

  • [10] Cilium 1.12 – Ingress, Multi-Cluster, Service Mesh, External Workloads, and much more: https://isovalent.com/blog/post/cilium-release-112/#vtep-support

  • [11] Cilium 1.10: WireGuard, BGP Support, Egress IP Gateway, New Cilium CLI, XDP Load Balancer, Alibaba Cloud Integration and more: https://cilium.io/blog/2021/05/20/cilium-110/

  • [12] Life of a Packet in Cilium: Field exploration of Pod-to-Service forwarding path and BPF processing logic: https://arthurchiao.art/blog/cilium-life-of-a-packet-pod-to-service-zh/

This article is reprinted from “Se7en’s Architecture Notes”, original text: https://url.hi-linux.com/aC0c4. The copyright belongs to the original author.
