Kubernetes Topic 12: ETCD
- ETCD special topic
  - Section 1 Kubeadm builds kubernetes cluster
    - etcd cluster in static pod form
    - etcd cluster usage scenarios
  - Section 3 Viewing ETCD cluster status information
  - Section 4 ETCD cluster backup and restore
- Troubleshooting
  - ETCD node failed to rejoin the cluster
  - Section 3 ETCD cluster performance problem diagnosis
  - Section 4 ETCD cluster performance optimization
ETCD topic
Section 1 Kubeadm builds kubernetes cluster
etcd cluster in static pod form
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   41 MB |     false |      false |         5 |   28433299 |           28433299 |        |
| https://192.168.20.87:2379 | 97d81fd68571e3bd |   3.5.6 |   40 MB |     false |      false |         5 |   28433299 |           28433299 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   41 MB |      true |      false |         5 |   28433299 |           28433299 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
etcd cluster usage scenarios
ETCD is a distributed, strongly consistent key-value (KV) store used for shared configuration and service discovery. It is a highly available distributed key-value database that uses the Raft protocol as its internal consensus algorithm and is implemented in Go.
Scenario 1: Service Discovery
Scenario 2: Message publishing and subscription
Scenario 3: Load balancing
Scenario 4: Distributed notification and coordination
Scenario 5: Distributed locks and distributed queues
Scenario 6: Cluster monitoring and Leader election
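As a concrete illustration of the service-discovery scenario: an instance registers itself under a well-known key prefix bound to a TTL lease, and consumers discover live instances by listing that prefix. A minimal etcdctl sketch (the `/services/web` prefix and the endpoint address are illustrative, and a reachable etcd cluster is assumed):

```shell
# Grant a 30-second lease; the registration disappears automatically
# if the instance stops renewing it.
LEASE_ID=$(etcdctl lease grant 30 | awk '{print $2}')
# Register this instance's address under the lease.
etcdctl put --lease="$LEASE_ID" /services/web/instance1 "10.0.0.11:8080"
# Consumers discover all live instances by listing the prefix.
etcdctl get --prefix /services/web
# The instance keeps the lease alive in the background while it runs.
etcdctl lease keep-alive "$LEASE_ID" &
```

The lease is what turns a plain KV store into service discovery: a crashed instance simply stops renewing, and its key expires on its own.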
Section 3 Viewing ETCD cluster status information
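The status table shown earlier, plus per-endpoint health and the member list, can be queried with etcdctl. A sketch assuming the kubeadm default certificate paths:

```shell
export ETCDCTL_API=3
CERTS="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key"
# Per-endpoint health across the whole cluster
etcdctl $CERTS endpoint health --cluster=true -w table
# Leader, DB size, and Raft term/index for every member
etcdctl $CERTS endpoint status --cluster=true -w table
# Membership as recorded in the cluster itself
etcdctl $CERTS member list -w table
```

`endpoint status` shows exactly one `IS LEADER = true` row in a healthy cluster, and matching RAFT INDEX values indicate the members are in sync.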
Section 4 ETCD cluster backup and restore
# Step 1: Back up. Because etcd is strongly consistent (Raft replicates every
# committed write to all members), backing up a single node is sufficient.
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  snapshot save /data/etcd_backup_dir/etcd-snapshot-20230628.db
# The backup file hash can be verified afterwards.

# Step 2: Remove kube-apiserver
mv /etc/kubernetes/manifests/kube-apiserver.yaml ../
crictl ps | grep apiserver

# Step 3: Remove etcd
mv /etc/kubernetes/manifests/etcd.yaml ../
crictl ps | grep etcd

# Step 4: Synchronize the backup file to the other master nodes
scp /data/etcd_backup_dir/etcd-snapshot-20230628.db k8s-master02:/data/etcd_backup_dir/
scp /data/etcd_backup_dir/etcd-snapshot-20230628.db k8s-master03:/data/etcd_backup_dir/

# Step 5: Restore (run the matching command on each master node)
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db \
  --data-dir=/var/lib/etcd --name k8s-master01 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://192.168.20.85:2379 \
  --initial-advertise-peer-urls="https://192.168.20.85:2380" \
  --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db \
  --data-dir=/var/lib/etcd --name k8s-master02 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://192.168.20.86:2379 \
  --initial-advertise-peer-urls="https://192.168.20.86:2380" \
  --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"
ETCDCTL_API=3 /data/etcd-v3.5.9-linux-amd64/etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20230628.db \
  --data-dir=/var/lib/etcd --name k8s-master03 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://192.168.20.87:2379 \
  --initial-advertise-peer-urls="https://192.168.20.87:2380" \
  --initial-cluster="k8s-master01=https://192.168.20.85:2380,k8s-master02=https://192.168.20.86:2380,k8s-master03=https://192.168.20.87:2380"

# Step 6: Start etcd
mv ../etcd.yaml /etc/kubernetes/manifests/
crictl ps | grep etcd

# Step 7: Verify the etcd cluster
./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.87:2379 | 500d007f149125c3 |   3.5.6 |  9.5 MB |      true |      false |         2 |         13 |                 13 |        |
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |  9.5 MB |     false |      false |         2 |         13 |                 13 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |  9.5 MB |     false |      false |         2 |         13 |                 13 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

# Step 8: Start kube-apiserver
mv ../kube-apiserver.yaml /etc/kubernetes/manifests/
crictl ps | grep apiserver

# Step 9: Verify the k8s cluster
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# kubectl get node -owide
NAME           STATUS   ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                  CONTAINER-RUNTIME
k8s-master01   Ready    control-plane   9h    v1.26.5   192.168.20.85    <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64     containerd://1.7.2
k8s-master02   Ready    control-plane   9h    v1.26.5   192.168.20.86    <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64     containerd://1.7.2
k8s-master03   Ready    control-plane   9h    v1.26.5   192.168.20.87    <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64     containerd://1.7.2
k8s-node01     Ready    <none>          9h    v1.26.5   192.168.20.126   <none>        CentOS Linux 7 (Core)   3.10.0-1160.90.1.el7.x86_64     containerd://1.7.2
k8s-node02     Ready    <none>          9h    v1.26.5   192.168.20.127   <none>        CentOS Linux 7 (Core)   3.10.0-1160.90.1.el7.x86_64     containerd://1.7.2
k8s-node03     Ready    <none>          9h    v1.26.5   192.168.20.156   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64     containerd://1.7.2
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-748c667d99-62pwr   1/1     Running   0          47m
pod/nginx-748c667d99-9rn52   1/1     Running   0          47m
pod/nginx-748c667d99-lm7gj   1/1     Running   0          47m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.1.0.1     <none>        443/TCP   9h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   3/3     3            3           47m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-748c667d99   3         3         3       47m

# Binary deployment method
# On A: stop etcd, then modify the startup parameters:
#   --initial-cluster etcd-server-7-21=https://10.4.7.21:2380 \
#   --force-new-cluster \
# Start etcd. A single-node etcd cluster is now running on A, still holding the original data.

# Add B to the cluster:
export PATH=$PATH:/opt/etcd
export ETCDCTL_API=3
etcdctl --endpoints=https://10.4.7.21:2379 \
  --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem \
  member add etcd-server-7-22 --peer-urls=https://10.4.7.22:2380
etcdctl --endpoints=https://10.4.7.21:2379 \
  --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem \
  member list
# On B: modify the startup parameters:
#   --initial-cluster etcd-server-7-21=https://10.4.7.21:2380,etcd-server-7-22=https://10.4.7.22:2380 \
#   --initial-cluster-state existing \
# Rename the original data directory /data/etcd/etcd-server/member, then start etcd.
# The cluster is now a 2-node cluster, and B synchronizes data from A.

# Add C to the cluster:
etcdctl --endpoints=https://10.4.7.21:2379 \
  --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem \
  member add etcd-server-7-12 --peer-urls=https://10.4.7.12:2380
etcdctl --endpoints=https://10.4.7.21:2379 \
  --key=/opt/etcd/certs/etcd-peer-key.pem --cert=/opt/etcd/certs/etcd-peer.pem --cacert=/opt/etcd/certs/ca.pem \
  member list
# On C: modify the startup parameters:
#   --initial-cluster-state existing \
# Rename the original data directory /data/etcd/etcd-server/member, then start etcd.
# The cluster is now a 3-node cluster; C synchronizes data from the leader and high availability is restored.
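The "verifiable backup file hash" note in Step 1 can be made concrete: record a checksum right after taking the snapshot and re-check it after copying the file to the other masters (etcd 3.5 also ships `etcdutl snapshot status` for inspecting a snapshot's hash and revision). A minimal sketch with `sha256sum`, using a stand-in file in place of the real snapshot:

```shell
# The demo file stands in for the real etcd-snapshot-*.db.
SNAP=/tmp/etcd-snapshot-demo.db
echo "demo snapshot contents" > "$SNAP"
# Record the checksum at backup time, next to the snapshot.
sha256sum "$SNAP" > "$SNAP.sha256"
# Verify after scp'ing both files to another node; prints "<file>: OK".
sha256sum -c "$SNAP.sha256"
```

Verifying on the destination node catches truncated or corrupted transfers before a restore is attempted against a bad snapshot.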
Troubleshooting
- ETCD node failed to rejoin the cluster
# Failed to rejoin the cluster: etcd verification failed.
[root@k8s-master03 ~]# kubeadm join 192.168.20.85:6443 --token u870fc.0b9a2lehe16vn2fe \
    --discovery-token-ca-cert-hash sha256:61a831b5b48247db8bbb58d69aa71c05dab0fefacf64d52ebf5fc11ffc6a025b \
    --control-plane --certificate-key 2b85632803f92afda7fde61d85c72a4b4b3b605ef781ac6d7ded1ac023b824d7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master03 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.1.0.1 192.168.20.87 192.168.20.85]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master03 localhost] and IPs [192.168.20.87 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master03 localhost] and IPs [192.168.20.87 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://192.168.20.87:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
# The old etcd member was not removed cleanly. Delete the stale member information for this node, then rejoin the cluster.
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
500d007f149125c3, started, k8s-master03, https://192.168.20.87:2380, https://192.168.20.87:2379, false
63607c0f73ed60dd, started, k8s-master01, https://192.168.20.85:2380, https://192.168.20.85:2379, false
ad45a8944d9cf673, started, k8s-master02, https://192.168.20.86:2380, https://192.168.20.86:2379, false
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member remove 500d007f149125c3
Member 500d007f149125c3 removed from cluster b02c44ec1868eb23
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
63607c0f73ed60dd, started, k8s-master01, https://192.168.20.85:2380, https://192.168.20.85:2379, false
ad45a8944d9cf673, started, k8s-master02, https://192.168.20.86:2380, https://192.168.20.86:2379, false
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   26 MB |     false |      false |         5 |    7681619 |            7681619 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   26 MB |      true |      false |         5 |    7681619 |            7681619 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@k8s-master01 etcd-v3.5.9-linux-amd64]# ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --cluster=true -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.20.85:2379 | 63607c0f73ed60dd |   3.5.6 |   26 MB |     false |      false |         5 |    7683843 |            7683843 |        |
| https://192.168.20.87:2379 | 97d81fd68571e3bd |   3.5.6 |   26 MB |     false |      false |         5 |    7683843 |            7683843 |        |
| https://192.168.20.86:2379 | ad45a8944d9cf673 |   3.5.6 |   26 MB |      true |      false |         5 |    7683843 |            7683843 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
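After removing the stale member on a healthy master, the failed node itself usually also needs to be wiped before the join will succeed. A hedged sketch of the typical cleanup on the failed node (destructive; run only on the node being re-added):

```shell
# On k8s-master03, the node that failed to join:
# Tear down kubelet/static-pod state left by the failed join attempt.
kubeadm reset -f
# Remove any stale etcd data directory left behind.
rm -rf /var/lib/etcd
# Then re-run the original 'kubeadm join ... --control-plane ...' command.
```

With both the stale cluster membership and the node's local state gone, the `check-etcd` phase sees a clean two-member cluster and the join completes.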
Section 3 Diagnosis of ETCD cluster performance problems
In particular, Kubernetes clusters built on top of a virtualization layer have weak resource isolation: they cannot shield a single compute VM from sudden performance interference caused by its neighbors.
- Network IO problem
- Disk IO problem
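Both problem classes can be checked against etcd's own Prometheus metrics before touching hardware. A sketch assuming the metrics endpoint is exposed on `http://127.0.0.1:2381` (the `--listen-metrics-urls` value kubeadm configures by default):

```shell
# Disk I/O: WAL fsync and backend commit latency histograms.
# Network I/O: peer round-trip time between members.
curl -s http://127.0.0.1:2381/metrics | grep -E \
  'etcd_disk_wal_fsync_duration_seconds|etcd_disk_backend_commit_duration_seconds|etcd_network_peer_round_trip_time_seconds' \
  | head -n 20
```

As a rough rule, sustained 99th-percentile WAL fsync times above a few tens of milliseconds point at the disk, while elevated peer round-trip times point at the network.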
Section 4 ETCD Cluster Performance Optimization
- Network I/O problem handling
  - NIC bandwidth itself has little impact on transactional reads and writes
  - Isolate functions onto separate NICs: business traffic on NIC 1, etcd traffic on NIC 2
  - Isolate high-bandwidth synchronization services at the access switch
  - On virtualized hosts, schedule I/O-intensive services onto different nodes
- Disk I/O problem handling
  - Isolation: avoid scheduling I/O-intensive workloads onto the same host
  - Provision high-performance disks (sub-millisecond I/O latency, up to millions of IOPS) and retire mechanical disks or low-end SSDs
  - On cloud hosts, use ESSD disk types, and isolate other middleware services to reduce the load carried by any single node role
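Beyond the hardware and scheduling isolation above, a few etcd flags are commonly tuned alongside it. A sketch of a static-pod command fragment; the values are illustrative and should be matched to your measured disk and network latency:

```shell
# Commonly tuned etcd startup flags (illustrative values):
--quota-backend-bytes=8589934592    # raise the backend DB quota to 8 GiB (default ~2 GiB)
--heartbeat-interval=100            # ms; set near the round-trip time between members
--election-timeout=1000             # ms; roughly 10x the heartbeat interval
--snapshot-count=10000              # Raft entries retained before snapshotting
--auto-compaction-retention=1h      # periodically compact old key-space revisions
```

Heartbeat and election timeouts trade failover speed against spurious leader elections on a jittery network, and compaction plus an adequate quota keep the DB SIZE column in `endpoint status` from hitting the alarm threshold.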