Kubernetes Special Topic-12, ETCD

  • ETCD special topic
    • Section 1 Building a Kubernetes cluster with kubeadm
      • Running the etcd cluster as static pods
      • etcd cluster usage scenarios
    • Section 2 Viewing ETCD cluster status information
    • Section 3 ETCD cluster backup and restore
      • Troubleshooting
    • Section 4 Diagnosing ETCD cluster performance problems
    • Section 5 ETCD cluster performance optimization

ETCD special topic

Section 1 Building a Kubernetes cluster with kubeadm

Running the etcd cluster as static pods
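In a kubeadm cluster, every control-plane node runs etcd as a static pod whose manifest lives in /etc/kubernetes/manifests/etcd.yaml. A quick way to confirm this (a minimal sketch using the kubeadm defaults):

# The static pod manifest and the resulting pods (kubeadm labels them component=etcd)
ls /etc/kubernetes/manifests/etcd.yaml
kubectl -n kube-system get pods -l component=etcd -o wide

The status table below is the output of etcdctl endpoint status --cluster -w table against this cluster; the full command with its TLS flags appears in Section 2 and in the backup/restore steps.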

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   41 MB |     false |      false |         5 |   28433299 |           28433299 |        |
| https://192.168.20.87:2379 | 97d81fd68571e3bd |   3.5.6 |   40 MB |     false |      false |         5 |   28433299 |           28433299 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   41 MB |      true |      false |         5 |   28433299 |           28433299 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

etcd cluster usage scenarios

etcd is a distributed, strongly consistent key-value store for shared configuration and service discovery. It is highly available, uses the Raft protocol internally as its consensus algorithm, and is implemented in Go. Typical usage scenarios (a brief etcdctl sketch of a few of them follows the list):
Scenario 1: Service Discovery
Scenario 2: Message publishing and subscription
Scenario 3: Load balancing
Scenario 4: Distributed notification and coordination
Scenario 5: Distributed locks and distributed queues
Scenario 6: Cluster monitoring and Leader election
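
A minimal sketch of some of these scenarios with plain etcdctl (assumes etcdctl is pointed at a reachable cluster with the endpoint and TLS flags from Section 2; key and election names such as /services/web are illustrative only):

# Service discovery: register an instance under a TTL lease and read it back
lease=$(etcdctl lease grant 30 | awk '{print $2}')
etcdctl put /services/web/10.4.7.21 '{"addr":"10.4.7.21:80"}' --lease=$lease
etcdctl get /services/web --prefix

# Publish/subscribe and distributed notification: watch a prefix for changes
etcdctl watch /services/web --prefix &

# Distributed lock: run a command while holding the lock
etcdctl lock /locks/demo -- echo "holding the lock"

# Leader election: campaign with a proposal value (blocks while leader)
etcdctl elect demo-election node-a &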

Section 2 Viewing ETCD cluster status information
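
For a kubeadm cluster, status is usually inspected with etcdctl and the client certificates that kubeadm generates. A sketch (certificate paths are the kubeadm defaults; adjust the etcdctl binary path to your installation):

export ETCDCTL_API=3
CERTS="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key"

# Member list, then per-endpoint status and health across the whole cluster
etcdctl $CERTS member list -w table
etcdctl $CERTS endpoint status --cluster -w table
etcdctl $CERTS endpoint health --cluster -w table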

Section 3 ETCD cluster backup and restore

# Step 1: Back up. Because etcd replicates data consistently to every member via Raft, it is enough to back up a single node.
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /data/etcd_backup_dir/etcd-snapshot-20230628.db
# The integrity of the backup file can be verified after the snapshot is taken.
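# One way to do this (a sketch; 'snapshot status' prints the hash, revision, total key count and size of the snapshot file):
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot status /data/etcd_backup_dir/etcd-snapshot-20230628.db -w table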

# Step 2: Remove kube-apiserver
mv /etc/kubernetes/manifests/kube-apiserver.yaml ../
crictl ps| grep apiserver
# Step 3: Remove etcd
mv /etc/kubernetes/manifests/etcd.yaml ../
crictl ps| grep etcd
# Step 4: Synchronize backup files to other master nodes
scp /data/etcd_backup_dir/etcd-snapshot-20230628.db k8s-master02:/data/etcd_backup_dir/
scp /data/etcd_backup_dir/etcd-snapshot-20230628.db k8s-master03:/data/etcd_backup_dir/
# Step 5: Restore
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db --data-dir=/var/lib/etcd --name k8s-master01 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.20.85:2379 --initial-advertise-peer-urls="https://192.168.20.85:2380" --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db --data-dir=/var/lib/etcd --name k8s-master02 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.20.86:2379 --initial-advertise-peer-urls="https://192.168.20.86:2380" --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db --data-dir=/var/lib/etcd --name k8s-master03 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.20.87:2379 --initial-advertise-peer-urls="https://192.168.20.87:2380" --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"
# Step 6: Start etcd
mv ../etcd.yaml /etc/kubernetes/manifests/
crictl ps| grep etcd
# Step 7: Verify etcd cluster
./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.87:2379 | 500d007f149125c3 |   3.5.6 |  9.5 MB |      true |      false |         2 |         13 |                 13 |        |
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |  9.5 MB |     false |      false |         2 |         13 |                 13 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |  9.5 MB |     false |      false |         2 |         13 |                 13 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# Step 8: Start kube-apiserver
mv ../kube-apiserver.yaml /etc/kubernetes/manifests/
crictl ps| grep apiserver
# Step 9: Verify k8s cluster
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready control-plane 9h v1.26.5 192.168.20.85 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
k8s-master02 Ready control-plane 9h v1.26.5 192.168.20.86 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
k8s-master03 Ready control-plane 9h v1.26.5 192.168.20.87 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
k8s-node01 Ready <none> 9h v1.26.5 192.168.20.126 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 containerd://1.7.2
k8s-node02 Ready <none> 9h v1.26.5 192.168.20.127 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 containerd://1.7.2
k8s-node03 Ready <none> 9h v1.26.5 192.168.20.156 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# kubectl get all
NAME READY STATUS RESTARTS AGE
pod/nginx-748c667d99-62pwr 1/1 Running 0 47m
pod/nginx-748c667d99-9rn52 1/1 Running 0 47m
pod/nginx-748c667d99-lm7gj 1/1 Running 0 47m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 9h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 3/3 3 3 47m

NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-748c667d99 3 3 3 47m

# Binary deployment method (rebuilding the cluster from one surviving node)
On A:
Stop etcd
Modify startup parameters:
--initial-cluster etcd-server-7-21=https://10.4.7.21:2380 \
--force-new-cluster \
Start etcd

At this time, a single-node etcd cluster is started on A, and the data is still the original data.
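
A sketch of what the full startup command on A might look like for this step (binary deployment; the binary path, data directory and certificate paths follow the /opt/etcd layout used in the commands below and are assumptions, not taken from the original cluster):

/opt/etcd/etcd --name etcd-server-7-21 \
  --data-dir /data/etcd/etcd-server \
  --listen-peer-urls https://10.4.7.21:2380 \
  --listen-client-urls https://10.4.7.21:2379,http://127.0.0.1:2379 \
  --initial-advertise-peer-urls https://10.4.7.21:2380 \
  --advertise-client-urls https://10.4.7.21:2379 \
  --initial-cluster etcd-server-7-21=https://10.4.7.21:2380 \
  --force-new-cluster \
  --cert-file /opt/etcd/certs/etcd-peer.pem \
  --key-file /opt/etcd/certs/etcd-peer-key.pem \
  --trusted-ca-file /opt/etcd/certs/ca.pem \
  --peer-cert-file /opt/etcd/certs/etcd-peer.pem \
  --peer-key-file /opt/etcd/certs/etcd-peer-key.pem \
  --peer-trusted-ca-file /opt/etcd/certs/ca.pem

Once the cluster is healthy again, --force-new-cluster should be removed before the next restart.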

Add B to the cluster:
export PATH=$PATH:/opt/etcd
export ETCDCTL_API=3
etcdctl --endpoints=https://10.4.7.21:2379 --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem member add etcd-server-7-22 --peer-urls=https://10.4.7.22:2380

etcdctl --endpoints=https://10.4.7.21:2379 --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem member list

On B:
Modify startup parameters:
--initial-cluster etcd-server-7-21=https://10.4.7.21:2380,etcd-server-7-22=https://10.4.7.22:2380 \
--initial-cluster-state existing \
Rename the original data directory /data/etcd/etcd-server/member
Start etcd

At this time, the cluster is a 2-node cluster, and B will synchronize data from A.
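
The equivalent sketch for B (same assumed paths as the sketch for A; B's stale member directory is renamed first so it resynchronizes everything from A):

# Keep the old data as a backup so B joins with an empty store
mv /data/etcd/etcd-server/member /data/etcd/etcd-server/member.bak

/opt/etcd/etcd --name etcd-server-7-22 \
  --data-dir /data/etcd/etcd-server \
  --listen-peer-urls https://10.4.7.22:2380 \
  --listen-client-urls https://10.4.7.22:2379,http://127.0.0.1:2379 \
  --initial-advertise-peer-urls https://10.4.7.22:2380 \
  --advertise-client-urls https://10.4.7.22:2379 \
  --initial-cluster etcd-server-7-21=https://10.4.7.21:2380,etcd-server-7-22=https://10.4.7.22:2380 \
  --initial-cluster-state existing \
  --cert-file /opt/etcd/certs/etcd-peer.pem \
  --key-file /opt/etcd/certs/etcd-peer-key.pem \
  --trusted-ca-file /opt/etcd/certs/ca.pem \
  --peer-cert-file /opt/etcd/certs/etcd-peer.pem \
  --peer-key-file /opt/etcd/certs/etcd-peer-key.pem \
  --peer-trusted-ca-file /opt/etcd/certs/ca.pem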

Add C to the cluster:
etcdctl --endpoints=https://10.4.7.21:2379 --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem member add etcd-server-7-12 --peer-urls=https://10.4.7.12:2380

etcdctl --endpoints=https://10.4.7.21:2379 --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem member list

On C:
Modify startup parameters:
--initial-cluster-state existing \
Rename the original data directory /data/etcd/etcd-server/member
Start etcd
At this time, the cluster is a 3-node cluster. C will synchronize data from the leader and the cluster will restore high availability.

Troubleshooting

  • ETCD node failed to rejoin the cluster
# Rejoining the cluster fails: the check-etcd preflight phase reports the etcd cluster as unhealthy.
[root@k8s-master03 ~]# kubeadm join 192.168.20.85:6443 --token u870fc.0b9a2lehe16vn2fe --discovery-token-ca-cert-hash sha256:61a831b5b48247db8bbb58d69aa71c05dab0fefacf64d52ebf5fc11ffc6a025b --control-plane --certificate-key 2b85632803f92afda7fde61d85c72a4b4b3b605ef781ac6d7ded1ac023b824d7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master03 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.1.0.1 192.168.20.87 192.168.20.85]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master03 localhost] and IPs [192.168.20.87 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master03 localhost] and IPs [192.168.20.87 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://192.168.20.87:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher


# The stale etcd member left over from the old node was never removed, so the health check fails. Remove the stale member from the cluster, then run kubeadm join again.
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
500d007f149125c3, started, k8s-master03, https://192.168.20.87:2380, https://192.168.20.87:2379, false
63607c0f73ed60dd, started, k8s-master01, https://192.168.20.85:2380, https://192.168.20.85:2379, false
ad45a8944d9cf673, started, k8s-master02, https://192.168.20.86:2380, https://192.168.20.86:2379, false
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member remove 500d007f149125c3
Member 500d007f149125c3 removed from cluster b02c44ec1868eb23
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
63607c0f73ed60dd, started, k8s-master01, https://192.168.20.85:2380, https://192.168.20.85:2379, false
ad45a8944d9cf673, started, k8s-master02, https://192.168.20.86:2380, https://192.168.20.86:2379, false
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   26 MB |     false |      false |         5 |    7681619 |            7681619 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   26 MB |      true |      false |         5 |    7681619 |            7681619 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# After kubeadm join is run again on k8s-master03, it rejoins as a new etcd member:
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   26 MB |     false |      false |         5 |    7683843 |            7683843 |        |
| https://192.168.20.87:2379 | 97d81fd68571e3bd |   3.5.6 |   26 MB |     false |      false |         5 |    7683843 |            7683843 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   26 MB |      true |      false |         5 |    7683843 |            7683843 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+


Section 4 Diagnosing ETCD cluster performance problems

In particular, Kubernetes clusters built on top of a virtualization layer offer only weak resource isolation, so a single compute VM is poorly shielded from sudden performance interference. The two most common sources are network I/O and disk I/O; both can be checked directly from etcd, as sketched after the list below.

  • Network IO problem

  • Disk IO problem

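A quick way to get signals for both problems from etcd itself (a sketch; assumes the kubeadm certificate paths and that the metrics endpoint is served on the client port; note that check perf writes benchmark keys into the cluster):

# Built-in performance check: reports latency/throughput and a pass/fail verdict
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://192.168.20.85:2379 check perf

# Disk and network latency histograms from the metrics endpoint
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key \
  https://192.168.20.85:2379/metrics | grep -E 'wal_fsync_duration|backend_commit_duration|peer_round_trip_time'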

Section 5 ETCD cluster performance optimization

  • Network: transactional reads and writes need relatively little NIC bandwidth, but interference should still be isolated
    • Separate traffic onto different NICs: business traffic on NIC 1, etcd traffic on NIC 2
    • Isolate high-bandwidth synchronization/replication services at the access switch
    • On virtualized hosts, schedule I/O-intensive services onto different nodes
  • Disk I/O
    • Isolation: avoid scheduling I/O-intensive workloads onto the same host as etcd
    • Provision high-performance disks with low I/O latency and IOPS in the millions; retire old mechanical drives and low-end SSDs
    • On cloud hosts, use ESSD-class disks and keep other middleware off the etcd nodes to reduce the load on any single node role (see the tuning sketch after this list)
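
A sketch of commonly tuned etcd settings and host-level isolation knobs (values are illustrative, not recommendations; the flags are standard etcd 3.5 options):

# etcd flags (set in etcd.yaml for kubeadm clusters, or in the unit file for binary deployments)
--quota-backend-bytes=8589934592        # raise the 2 GB default DB quota (here: 8 GB)
--auto-compaction-mode=periodic         # compact old revisions periodically
--auto-compaction-retention=1h
--snapshot-count=10000                  # applied entries that trigger a snapshot
--heartbeat-interval=100                # ms; raise together with election timeout on slow networks/disks
--election-timeout=1000                 # ms

# Host-level: give the etcd process the highest disk and CPU priority on a shared node
ionice -c2 -n0 -p $(pgrep -x etcd)
renice -n -20 -p $(pgrep -x etcd)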