Background
An unexpected power outage in the computer’s hibernation state causes the virtual machine to directly enter a shutdown state.
Question
kubectl command error
[root@master01 ~]#kubectl get node The connection to the server master01.kktb.org:6443 was refused - did you specify the right host or port?
kubelet service error
Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.251784 747 kubelet.go:2424] "Error getting node" err="node "master01.kktb.org\ "not found" Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.329952 747 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://master01 .kktb.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master01.kktb.org?timeout=10s": dial tcp 10.0.6.5:6443: connect: connection refused Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.352614 747 kubelet.go:2424] "Error getting node" err="node "master01.kktb.org" not found " Oct 15 08:39:37 master01.kktb.org kubelet[747]: I1015 08:39:37.384900 747 kubelet_node_status.go:70] "Attempting to register node" node="master01.kktb.org" Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.385258 747 kubelet_node_status.go:92] "Unable to register node with API server" err="Post "https://master01 .kktb.org:6443/api/v1/nodes": dial tcp 10.0.6.5:6443: connect: connection refused" node="master01.kktb.org"
There is no program monitoring port 6443. It may be that etcd is faulty.
Containers on the master node
[root@master01 kubelet.service.d]#docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 9924571324d2 e3ed7dee73e9 "kube-scheduler --au…" 18 minutes ago Up 18 minutes k8s_kube-scheduler_kube-scheduler-master01.kktb.org_kube-system_bca604f760be36f9a20d9b66b0bf821d_7 9d6171792af2 88784fb4ac2f "kube-controller-man…" 18 minutes ago Up 18 minutes k8s_kube-controller-manager_kube-controller-manager-master01.kktb.org_kube-system_13461bcfe1de9f0e85e957cde36c42d2_10 eb3fdcf0b392 registry.k8s.io/pause:3.6 "/pause" 18 minutes ago Up 18 minutes k8s_POD_etcd-master01.kktb.org_kube-system_67c8baa117269300c9b5e18d5b0a0e44_5 302e36ffb616 registry.k8s.io/pause:3.6 "/pause" 18 minutes ago Up 18 minutes k8s_POD_kube-scheduler-master01.kktb.org_kube-system_bca604f760be36f9a20d9b66b0bf821d_5 a8f8f02011c7 registry.k8s.io/pause:3.6 "/pause" 18 minutes ago Up 18 minutes k8s_POD_kube-controller-manager-master01.kktb.org_kube-system_13461bcfe1de9f0e85e957cde36c42d2_5 3fde7a0d5aae registry.k8s.io/pause:3.6 "/pause" 18 minutes ago Up 18 minutes k8s_POD_kube-apiserver-master01.kktb.org_kube-system_5846def9d7d3ade7311eb0f023db33ff_5
Mirrors exist
[root@master01 kubelet.service.d]#docker images REPOSITORY TAG IMAGE ID CREATED SIZE istio/proxyv2 1.19.1 b3547b3ef18b 2 weeks ago 251MB ubuntu jammy 3565a89d9e81 2 weeks ago 77.8MB ubuntu latest 3565a89d9e81 2 weeks ago 77.8MB flannel/flannel v0.22.3 e23f7ca36333 3 weeks ago 70.2MB flannel/flannel-cni-plugin v1.2.0 a55d1bad692b 2 months ago 8.04MB hello-world latest 9c7a54a9a43c 5 months ago 13.3kB k8s.gcr.io/kube-apiserver v1.24.0 529072250ccc 17 months ago 130MB k8s.gcr.io/kube-proxy v1.24.0 77b49675beae 17 months ago 110MB k8s.gcr.io/kube-scheduler v1.24.0 e3ed7dee73e9 17 months ago 51MB k8s.gcr.io/kube-controller-manager v1.24.0 88784fb4ac2f 17 months ago 119MB k8s.gcr.io/etcd 3.5.3-0 aebe758cef4c 18 months ago 299MB k8s.gcr.io/pause 3.7 221177c6082a 19 months ago 711kB ikubernetes/proxy v0.1.1 a4fcedf206e8 23 months ago 62.4MB k8s.gcr.io/coredns/coredns v1.8.6 a4ca41631cc7 2 years ago 46.8MB registry.k8s.io/pause 3.6 6270bb605e12 2 years ago 683kB ikubernetes/demoapp v1.0 3342b7518915 3 years ago 92.7MB
ps -a to see exited containers
[root@master01 kubelet.service.d]#docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 07195c9438d3 aebe758cef4c "etcd --advertise-cl…" 34 seconds ago Exited (2) 33 seconds ago k8s_etcd_etcd-master01.kktb.org_kube-system_67c8baa117269300c9b5e18d5b0a0e44_30 532fabab4479 529072250ccc "kube-apiserver --ad…" 3 minutes ago Exited (1) 3 minutes ago k8s_kube-apiserver_kube-apiserver-master01.kktb.org_kube-system_5846def9d7d3ade7311eb0f023db33ff_27
etcd container error
[root@master01 kubelet.service.d]#docker logs 07195c9438d3 {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.713Z","caller":"etcdmain/etcd.go:73","msg" :"Running: ","args":["etcd","--advertise-client-urls=https://10.0.6.5:2379","--cert-file=/etc/kubernetes/pki/etcd /server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","-- initial-advertise-peer-urls=https://10.0.6.5:2380","--initial-cluster=master01.kktb.org=https://10.0.6.5:2380","--key-file= /etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://10.0.6.5:2379","--listen-metrics -urls=http://127.0.0.1:2381","--listen-peer-urls=https://10.0.6.5:2380","--name=master01.kktb.org","--peer -cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd /peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file= /etc/kubernetes/pki/etcd/ca.crt"]} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"etcdmain/etcd.go:116","msg" :"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:131","msg" :"configuring peer listeners","listen-peer-urls":["https://10.0.6.5:2380"]} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:479","msg" :"starting with peer TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:139","msg" :"configuring client listeners","listen-client-urls":["https://10.0.6.5:2379","https://127.0.0.1:2379"]} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:308","msg" :"starting an etcd server","etcd-version":"3.5.3","git-sha":"0452feec7","go-version":"go1.16.15","go-os":"linux ","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"master01.kktb.org ","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd /member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count ":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://10.0.6.5:2380"],"listen-peer-urls":["https: //10.0.6.5:2380"],"advertise-client-urls":["https://10.0.6.5:2379"],"listen-client-urls":["https://10.0.6.5: 2379","https://127.0.0.1:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host -whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes": 2147483648,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction -retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"} {<!-- -->"level":"info","ts":"2023-10-15T09:31:34.716Z","caller":"etcdserver/backend.go:81","msg" :"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"655.189μs"} {<!-- -->"level":"info","ts":"2023-10-15T09:31:35.189Z","caller":"etcdserver/server.go:508","msg" :"recovered v2 store from snapshot","snapshot-index":450049,"snapshot-size":"11 kB"} {<!-- -->"level":"warn","ts":"2023-10-15T09:31:35.190Z","caller":"snap/db.go:88","msg" :"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":450049,"snapshot-file-path":"/var/lib/etcd/member/snap/000000000006de01.snap.db" ,"error":"snap: snapshot file doesn't exist"} {<!-- -->"level":"panic","ts":"2023-10-15T09:31:35.190Z","caller":"etcdserver/server.go:515","msg" :"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3 /etcdserver.NewServer\ \t/go/src/go.etcd.io/etcd/release/etcd/server/etcdserver/server.go:515\ go.etcd.io/etcd/server/v3/embed.StartEtcd \ \t/go/src/go.etcd.io/etcd/release/etcd/server/embed/etcd.go:245\ go.etcd.io/etcd/server/v3/etcdmain.startEtcd\ \t /go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:228\ go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\ \t/go/src /go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:123\ go.etcd.io/etcd/server/v3/etcdmain.Main\ \t/go/src/go.etcd .io/etcd/release/etcd/server/etcdmain/main.go:40\ main.main\ \t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32 \ runtime.main\ \t/go/gos/go1.16.15/src/runtime/proc.go:225"} panic: failed to recover v3 backend from snapshot
It is due to failure to restore snapshot data. Check whether the other two nodes also report the same error.
Fortunately, etcd on node 2 starts normally.
[root@master02 snap]#docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES daa59123cbbf aebe758cef4c "etcd --advertise-cl…" 50 seconds ago Up 50 seconds k8s_etcd_etcd-master02.kktb.org_kube-system_3564c10547c1c588ab6c79bcad0e90d0_19
some keyword logs
[root@master02 snap]#docker logs -f daa59123cbbf {<!-- -->"level":"info","ts":"2023-10-15T09:39:06.599Z","caller":"etcdserver/backend.go:81","msg" :"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"2.526769ms"} {<!-- -->"level":"warn","ts":"2023-10-15T09:39:06.599Z","caller":"wal/util.go:90","msg" :"ignored file in WAL directory","path":"0000000000000004-00000000000685b1.wal.broken"} {<!-- -->"level":"info","ts":"2023-10-15T09:39:07.119Z","caller":"etcdserver/server.go:508","msg" :"recovered v2 store from snapshot","snapshot-index":450048,"snapshot-size":"11 kB"} {<!-- -->"level":"info","ts":"2023-10-15T09:39:07.119Z","caller":"etcdserver/server.go:521","msg" :"recovered v3 backend from snapshot","backend-size-bytes":6635520,"backend-size":"6.6 MB","backend-size-in-use-bytes":2555904,"backend-size-in -use":"2.6 MB"} {<!-- -->"level":"warn","ts":"2023-10-15T09:39:07.119Z","caller":"wal/util.go:90","msg" :"ignored file in WAL directory","path":"0000000000000004-00000000000685b1.wal.broken"} {<!-- -->"level":"info","ts":"2023-10-15T09:39:07.161Z","caller":"etcdserver/raft.go:483","msg" :"restarting local member","cluster-id":"e3d7c840b56e7ccb","local-member-id":"77c05bfe945af1a1","commit-index":458027} {<!-- -->"level":"info","ts":"2023-10-15T09:39:07.161Z","logger":"raft","caller":"etcdserver/zap_raft. go:77","msg":"77c05bfe945af1a1 switched to configuration voters=(6311186738797484145 8628998035010679201 11257876798449243399)"} {<!-- -->"level":"info","ts":"2023-10-15T09:39:07.161Z","logger":"raft","caller":"etcdserver/zap_raft. go:77","msg":"77c05bfe945af1a1 became follower at term 16"} {<!-- -->"level":"warn","ts":"2023-10-15T09:40:06.425Z","caller":"etcdhttp/metrics.go:86","msg" :"/health error","output":"{<!-- -->"health":"false","reason":"RAFT NO LEADER"}","status -code":503} {<!-- -->"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg" :"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"5795d6e69d2e1871","rtt":"0s","error":"dial tcp 10.0.6.5 :2380: connect: connection refused"} {<!-- -->"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg" :"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"9c3c034d28add507","rtt":"0s","error":"dial tcp 10.0.6.7 :2380: connect: connection refused"} {<!-- -->"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg" :"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"9c3c034d28add507","rtt":"0s","error":"dial tcp 10.0.6.7 :2380: connect: connection refused"} {<!-- -->"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg" :"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"5795d6e69d2e1871","rtt":"0s","error":"dial tcp 10.0.6.5 :2380: connect: connection refused"}
Solution
The way to solve the problem is to directly remove the etcd data directory folder of node 1 and try to copy the data on etcd2.
# First move the data file of node 1 to opt. [root@master01 ~]#mv /var/lib/etcd/member /opt/ # Copy data from node 2 to node 1 [root@master02 snap]#scp -r /var/lib/etcd/member 10.0.6.5:/var/lib/etcd/
etcd, api-server service recovery
Cluster status
sh-5.1# etcdctl --endpoints=https://10.0.6.5:2379,https://10.0.6.6:2379,https://10.0.6.7:2379 --cacert=/etc/kubernetes/ pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list --write-out=table + ------------------ + --------- + ------------------ + ----------------------- + ----------------------- + -- ---------- + | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | + ------------------ + --------- + ------------------ + ----------------------- + ----------------------- + -- ---------- + | 5795d6e69d2e1871 | started | master01.kktb.org | https://10.0.6.5:2380 | https://10.0.6.5:2379 | false | | 77c05bfe945af1a1 | started | master02.kktb.org | https://10.0.6.6:2380 | https://10.0.6.6:2379 | false | | 9c3c034d28add507 | started | master03.kktb.org | https://10.0.6.7:2380 | https://10.0.6.7:2379 | false | + ------------------ + --------- + ------------------ + ----------------------- + ----------------------- + -- ---------- +
The kubectl command is used normally
[root@master01 ~]#kubectl get node NAME STATUS ROLES AGE VERSION master01.kktb.org Ready control-plane 22d v1.24.3 master02.kktb.org Ready control-plane 22d v1.24.3 master03.kktb.org Ready control-plane 22d v1.24.3