Article directory

- 1. Pod status
  - 1.1 Types of container startup errors
  - 1.2 ImagePullBackOff error
  - 1.3 CrashLoopBackOff
  - 1.4 Pending
- 2. Service connection status
- 3. Ingress connection status
1. Pod status

Create a `pod-status.yaml` file with the following content:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: running
  labels:
    app: nginx
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
---
apiVersion: v1
kind: Pod
metadata:
  name: backoff
spec:
  containers:
    - name: web
      image: nginx:not-exist
---
apiVersion: v1
kind: Pod
metadata:
  name: error
spec:
  containers:
    - name: web
      image: nginx
      command: ["sleep", "a"]
```
Then, use the `kubectl apply` command to apply it to the cluster.

```shell
$ kubectl apply -f pod-status.yaml
pod/running created
pod/backoff created
pod/error created
```
Next, use `kubectl get pods` to view the three Pods you just created.

```shell
$ kubectl get pods
NAME      READY   STATUS             RESTARTS   AGE
backoff   0/1     ImagePullBackOff   0          50s
error     0/1     CrashLoopBackOff   1          4s
running   1/1     Running            0          50s
```
In this example, we created three Pods in total, and their states include `ImagePullBackOff`, `CrashLoopBackOff`, and `Running`.
1.1 Types of container startup errors
- ErrImagePull
- ImageInspectError
- ErrImageNeverPull
- RegistryUnavailable
- InvalidImageName
1.2 ImagePullBackOff error
This error is usually caused by one of the following:

- The image name or tag is wrong. In the `backoff` Pod we just created, we specified the image `nginx:not-exist`, but the `not-exist` tag does not actually exist, so an error is naturally thrown.
- A private image was specified, but no pull credentials were provided.
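For the second cause, pull credentials can be attached to the Pod via `imagePullSecrets`. Below is a minimal sketch; the Secret name `registry-cred` and the image path are illustrative, and the Secret is assumed to have been created beforehand with `kubectl create secret docker-registry`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image
spec:
  # References a docker-registry Secret in the same namespace;
  # "registry-cred" and the registry path below are hypothetical.
  imagePullSecrets:
    - name: registry-cred
  containers:
    - name: web
      image: registry.example.com/team/nginx:1.25
```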
We can use the `kubectl describe` command to see the details of the error.

```shell
$ kubectl describe pod backoff
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  10m                  default-scheduler  Successfully assigned default/backoff to kind-control-plane
  Normal   Pulling    8m43s (x4 over 10m)  kubelet            Pulling image "nginx:not-exist"
  Warning  Failed     8m40s (x4 over 10m)  kubelet            Failed to pull image "nginx:not-exist": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/library/nginx:not-exist": failed to resolve reference "docker.io/library/nginx:not-exist": docker.io/library/nginx:not-exist: not found
  Warning  Failed     8m40s (x4 over 10m)  kubelet            Error: ErrImagePull
  Warning  Failed     8m11s (x6 over 10m)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    4s (x42 over 10m)    kubelet            Back-off pulling image "nginx:not-exist"
```
From the third event in the `Events` section of the output, we can see that the cluster has thrown the `nginx:not-exist: not found` exception, so we have located the specific error.
1.3 CrashLoopBackOff
`CrashLoopBackOff` is a typical container runtime error. You may also see the similar `RunContainerError`. There are two main reasons for this error.
- An error occurred when the application in the container started; for example, it failed to read its configuration and could not start.
- A misconfiguration, such as a wrong container start command.
In the `error` Pod created above, I intentionally misconfigured the container's start command so that we can also see the `CrashLoopBackOff` exception. For errors in the running phase, most come from the startup of the business application itself, so we usually only need to check the Pod's logs to find the problem. For example, let's view the logs of the `error` Pod.
```shell
$ kubectl logs error
sleep: invalid time interval 'a'
Try 'sleep --help' for more information.
```
From the returned log, the `sleep` command throws an exception: the parameter is invalid. In a production environment, we generally use Deployment workloads to manage Pods. When a Pod runs abnormally and is recreated, its name changes with each restart. In that case, you can add the `--previous` parameter when viewing logs to see the logs of the previous container instance.

```shell
$ kubectl logs pod-name --previous
```
1.4 Pending
Occasionally, you might not see an explicit error state at all, but on checking the status you find the Pod stuck in the `Pending` state. You can try saving the following content as a `pending-pod.yaml` file and deploying this example to the cluster via `kubectl apply -f`.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: 32
          memory: 64Gi
```
Next, try to check the status of the Pod.
```shell
$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
pending   0/1     Pending   0          15s
```
From the returned results, we find that the Pod does not throw any exception, but its status is `Pending`, and `READY 0/1` indicates that the Pod is not ready to receive external traffic. There are three main reasons for the `Pending` state.
- The cluster resources are insufficient to schedule the Pod.
- Pod is waiting for PVC persistent storage volume.
- Pod resource usage exceeds namespace resource quota.
In the above example, we configured a resource request of 32 CPU cores and 64Gi of memory for the Pod, which obviously exceeds what the cluster can provide, so the Pod stays in the `Pending` state. Kubernetes will keep trying to schedule it; once a new node is added and the resource requirements can be met, the Pod will be scheduled and started. The `Pending` state is thus a case of abnormal container startup, but it is not strictly an error: the Pod is simply temporarily unschedulable. To find the specific reason for the `Pending` status, you can follow the same approach as for container startup errors and use the `kubectl describe` command.
```shell
$ kubectl describe pod pending
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  11m    default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  6m45s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
From the `Events` section of the output, we can conclude that the reason for `Pending` is that no node meets the CPU and memory resource requirements. Kubernetes tried to schedule the Pod twice, and the exception was the same both times.
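To unblock the example above without adding nodes, the resource requests can be lowered to values the cluster can actually satisfy. A minimal sketch follows; the exact values are illustrative and depend on your nodes' allocatable resources.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: 250m     # millicores; far below the 32-core request that failed to schedule
          memory: 128Mi # well within a typical node's allocatable memory
```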
2. Service connection status
Sometimes, even though the Pod is running and ready, we still cannot reach the business service from outside. In that case we need to look at the connection status of the Service. Service is a core component of Kubernetes and is available under normal circumstances. In a production environment, traffic generally flows from Ingress to Service to Pod, so when a Pod's business service cannot be accessed externally, we can start checking from the innermost layer, the Pod itself. The easiest way is to connect to the Pod directly and initiate a request to check whether the Pod works normally.
To access the Pod locally, we can use `kubectl port-forward` for port forwarding, taking the Nginx Pod we just created as an example.

```shell
$ kubectl port-forward pod/running 8081:80
```
If a request to port 8081 on localhost succeeds, the Pod and the application inside it are working normally. Next, we check the connection status of the Service one level up. Similarly, the easiest way is to connect directly to the Service via port forwarding and initiate a request.
```shell
$ kubectl port-forward service/<service-name> <local-port>:<service-port>
```
If the requested Service can return the content correctly, it means that the Service layer is also normal. If content cannot be returned, there are usually two possible reasons for this.
- The Service's selector does not correctly match the Pod's labels.
- The Service's `port` and `targetPort` are misconfigured.
By fixing these two configurations, you should be able to fix the Service to Pod connectivity issues.
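As a sketch of what a correct configuration looks like, here is a Service that would route to the `running` Pod created earlier. The name `running-service` is illustrative; the important parts are that the selector matches the Pod's labels and `targetPort` matches the container's port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: running-service  # hypothetical name
spec:
  selector:
    app: nginx        # must match the labels on the "running" Pod
  ports:
    - port: 80        # port the Service exposes to clients
      targetPort: 80  # must match the containerPort of the nginx container
```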
3. Ingress connection status
At this point, if you still cannot access business services from Ingress, you need to continue to troubleshoot Ingress. First, verify that the Pods for the Ingress controller are running.
```shell
$ kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS    RESTARTS        AGE
ingress-nginx-controller-8544b85788-c9m2g   1/1     Running   6 (4h35m ago)   1d
```
After confirming that the Ingress controller itself is healthy, we can be fairly sure the failure is caused by a misconfigured Ingress policy. You can view the Ingress policy with the `kubectl describe ingress` command.
```shell
$ kubectl describe ingress ingress_name
Name:       ingress_name
Namespace:  default
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     /     running-service:80 (<error: endpoints "running-service" not found>)
```
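The `endpoints "running-service" not found` error above means the Ingress rule points at a backend Service that either does not exist or matches no ready Pods. Once the backend Service is in place, a working Ingress policy might look like the following sketch; the Ingress name, host-less rule, and `ingressClassName` are assumptions for an ingress-nginx setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress          # hypothetical name
spec:
  ingressClassName: nginx    # assumes the ingress-nginx controller is installed
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: running-service  # must be an existing Service in the same namespace
                port:
                  number: 80           # must match the Service's exposed port
```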