Article directory
- Foreword
- Install Prometheus on Kubernetes
  - StatefulSet configuration
  - Detailed explanation of the configmap configuration and metric collection optimization ideas
    - How to filter metrics (reduce metric volume)
    - How to get all k8s metrics?
  - Start
    - View metric volume
Foreword
In our Prometheus monitoring architecture, the Prometheus instance running on Kubernetes is mainly responsible for monitoring k8s resources such as Nodes, Pods, and StatefulSets. An instance like this needs to be installed in every k8s cluster, while a binary-deployed Prometheus can act as the central metrics collector, responsible for gathering metrics from the other exporters.
Install Prometheus on Kubernetes
First, a quick explanation of what each resource is for:
- Namespace: provides resource isolation
- StatefulSet: the Prometheus workload itself
- Configmap: the Prometheus configuration file
- Service: a NodePort Service used to reach the Prometheus UI from outside the cluster
- ClusterRole, ClusterRoleBinding: define and grant the required cluster-wide permissions
- ServiceAccount: provides an identity for the Prometheus process
To keep the manifests easier to read, each k8s resource is listed in its own file below.
prometheus-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
prometheus-StatefulSet.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: prometheus
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      serviceAccount: prometheus
      volumes:
        - hostPath:
            path: /data/prometheus
            type: ''
          name: data
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: timezone
          hostPath:
            path: /etc/localtime
      containers:
        - image: prom/prometheus:v2.48.0-rc.1
          name: prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yaml"   # Configuration file path
            - "--storage.tsdb.path=/prometheus"                 # TSDB data path
            - "--storage.tsdb.retention.time=2d"                # Data retention period
            - "--web.enable-lifecycle"                          # Enable hot reload: POST to localhost:9090/-/reload takes effect immediately
          ports:
            - containerPort: 9090
              name: http
          securityContext:
            runAsUser: 0
          resources:        # Adjust to your needs
            limits:
              cpu: '1'
              memory: 2Gi
            requests:
              cpu: '0'
              memory: '0'
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
            - name: timezone
              mountPath: /etc/localtime
prometheus-ClusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
prometheus-ClusterRoleBinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prometheus
prometheus-ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus
prometheus-Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    name: prometheus
spec:
  ports:
    - name: prometheus
      protocol: TCP
      port: 39090
      nodePort: 39090    # Port exposed outside the cluster
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      external_labels:
        origin_prometheus: k8s-prome-demo   # Distinguishes this cluster
    remote_write:                           # Remote write to remote VM (VictoriaMetrics) storage
      - url: http://10.0.1.50:8428/api/v1/write
    scrape_configs:
      - job_name: 'k8s-state-metrics'
        metrics_path: "/metrics"
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc:8080']
        metric_relabel_configs:
          - source_labels:
              - __name__
            regex: '(kube_.*_info|kube_namespace_labels|kube_node_status_.*|kube_pod_container_resource_.*|kube_pod_container_status_restarts_total).*'
            action: keep
      - job_name: 'k8s-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /metrics/cadvisor
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
        metric_relabel_configs:
          - source_labels:
              - __name__
            regex: '(container_cpu_usage_seconds_total|container_.*_bytes|container_memory_rss|container_spec_cpu_quota).*'
            action: keep
          - source_labels: [instance]
            separator: ;
            regex: (.+)
            target_label: node
            replacement: $1
            action: replace
      - job_name: 'k8s-kubelet'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'k8s-apiserver'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443
Don't apply these manifests just yet; first walk through the configuration so you can adapt it to your own environment.
StatefulSet configuration
The key settings are the container arguments; see the comments:
- image: prom/prometheus:v2.48.0-rc.1
  name: prometheus
  args:
    - "--config.file=/etc/prometheus/prometheus.yaml"   # Configuration file path
    - "--storage.tsdb.path=/prometheus"                 # TSDB data path
    - "--storage.tsdb.retention.time=2d"                # Data retention period
    - "--web.enable-lifecycle"                          # Enable hot reload: POST to localhost:9090/-/reload takes effect immediately
Detailed explanation of the configmap configuration and metric collection optimization ideas
- In the global configuration, add an external_labels entry to distinguish between multiple Prometheus instances or k8s clusters (a usage example follows this list):

  global:
    ...
    external_labels:
      origin_prometheus: k8s-prome-demo   # Distinguishes clusters
- Configure remote_write to the remote VictoriaMetrics (VM) storage:

  remote_write:   # Remote write to the remote VM storage
    - url: http://<IP of your VM>:8428/api/v1/write
- The scrape_configs section that follows defines four jobs:
  - job: k8s-state-metrics: collects detailed resource-state metrics from the Kubernetes API (via kube-state-metrics), such as the status of Deployments, Nodes, and Pods.
  - job: k8s-cadvisor: collects container-level resource usage and performance metrics (CPU, memory, disk) through cAdvisor (Container Advisor).
  - job: k8s-kubelet: collects node and Pod performance and status information exposed by the kubelet.
  - job: k8s-apiserver: collects performance and health metrics of the Kubernetes API server, such as request latency, request rate, and error rate.
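Once the external label is attached, every series written to the central VictoriaMetrics carries it, so dashboards and alerts can pin a single cluster. A minimal PromQL sketch (the metric and label value are simply those used elsewhere in this article, not a required convention):

sum by (namespace) (rate(container_cpu_usage_seconds_total{origin_prometheus="k8s-prome-demo"}[5m]))   # CPU usage per namespace in the demo cluster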
How to filter metrics (reduce metric volume)
For example, the job: k8s-state-metrics job contains the following metric_relabel_configs block. The regular expression matches the metric names you want, and action: keep drops every series whose name does not match.

metric_relabel_configs:
  - source_labels:
      - __name__
    regex: '(kube_.*_info|kube_namespace_labels|kube_node_status_.*|kube_pod_container_resource_.*|kube_pod_container_status_restarts_total).*'
    action: keep
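The keep action above is a whitelist; the same mechanism also works as a blacklist by switching to action: drop. A minimal sketch, with a purely illustrative regex (these metric names are hypothetical examples, not taken from the config above):

metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'kube_pod_tolerations|kube_lease_.*'   # hypothetical noisy metrics to discard
    action: drop                                  # everything else is kept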
A brief note on the optimization approach: if you don't know where to start, open the k8s dashboard you use in Grafana, go to Settings – JSON Model, and export the entire dashboard JSON. In it you will see queries like the one below; container_cpu_usage_seconds_total is one of the metrics it uses. Collect the metrics referenced across the whole dashboard and you will have a rough list of what you actually need to scrape.

"expr": "topk(10, sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{origin_prometheus=~\"$origin_prometheus\", pod != \"\", container!=\"\"} )))",
How to get all k8s metrics?
- Get all metrics of kube-state-metrics:

  $ kubectl get svc -n kube-system kube-state-metrics
  NAME                 TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)                         AGE
  kube-state-metrics   NodePort   10.68.24.12   <none>        8080:30080/TCP,8081:30081/TCP   2y179d

  $ curl http://<kube-state-metrics-ip>:8080/metrics   # e.g. http://10.68.24.12:8080/metrics
- Get all metrics of k8s-cadvisor:

  $ kubectl get nodes
  NAME        STATUS   ROLES   AGE      VERSION
  10.0.0.20   Ready    node    2y179d   v1.14.6

  $ curl \
      --cert admin.pem \
      --key admin-key.pem \
      --cacert ca.pem \
      https://<node-IP>:10250/metrics/cadvisor   # e.g. https://10.0.0.20:10250/metrics/cadvisor
- Get all metrics of k8s-kubelet:

  $ kubectl get nodes
  NAME        STATUS   ROLES   AGE      VERSION
  10.0.0.20   Ready    node    2y179d   v1.14.6

  $ curl \
      --cert admin.pem \
      --key admin-key.pem \
      --cacert ca.pem \
      https://<node-IP>:10250/metrics   # e.g. https://10.0.0.20:10250/metrics
- Get all metrics of k8s-apiserver:

  $ kubectl get svc kubernetes
  NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   2y179d

  $ cd /etc/kubernetes/ssl
  $ curl \
      --cert admin.pem \
      --key admin-key.pem \
      --cacert ca.pem \
      https://<kubernetes-CLUSTER-IP>/metrics   # e.g. https://10.68.0.1/metrics
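Each of these endpoints returns a very large raw dump. To boil one down to a de-duplicated list of metric names (handy when writing the keep regexes above), a small sketch using the kube-state-metrics endpoint as an example:

# Strip comment lines, cut off labels/values, and de-duplicate the metric names
curl -s http://<kube-state-metrics-ip>:8080/metrics | grep -v '^#' | awk -F'[{ ]' '{print $1}' | sort -u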
Start
$ ls
prometheus-namespace.yaml  prometheus-StatefulSet.yaml  prometheus-configmap.yaml  prometheus-Service.yaml
prometheus-ClusterRole.yaml  prometheus-ClusterRoleBinding.yaml  prometheus-ServiceAccount.yaml

# apply everything at once
$ kubectl apply -f .
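Before moving on, it can be worth confirming that everything came up; a quick check, using the resource names defined in the manifests above:

$ kubectl -n prometheus get statefulset,pod,svc
$ kubectl -n prometheus logs prometheus-0 | tail   # prometheus-0 is the first (and only) replica of the StatefulSet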
- Visit Prometheus: http://<your k8s node IP>:39090
- Because the StatefulSet passes --web.enable-lifecycle, the configuration can be hot-reloaded without restarting the Prometheus service:

  curl -XPOST http://<your k8s node IP>:39090/-/reload
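Since prometheus.yaml lives in a ConfigMap, a typical hot-reload cycle looks roughly like this; note that the kubelet can take up to a minute or so to sync an updated ConfigMap into the mounted volume, so trigger the reload after that:

# Edit the scrape configuration in place
kubectl -n prometheus edit configmap prometheus-config
# Once the mounted file has been refreshed, trigger the reload
curl -XPOST http://<your k8s node IP>:39090/-/reload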
View metric volume
- Show the top 50 metrics by series count:

  topk(50, count by (__name__, job)({__name__=~".+"}))
- Show the top 50 metrics by series count for a specific job:

  topk(50, count by (__name__)({job="node_exporter"}))
- Total number of series in Prometheus:

  sum(count by (__name__, job)({__name__=~".+"}))
- Number of series for a specific group of metrics (here, everything matching node_ipvs_.+):

  sum(count by (__name__, job)({__name__=~"node_ipvs_.+"}))
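These queries can also be run from the command line against the NodePort exposed above via the Prometheus HTTP API, which is handy for quick capacity checks without opening the UI; a sketch using the first query:

curl -s 'http://<your k8s node IP>:39090/api/v1/query' \
  --data-urlencode 'query=topk(50, count by (__name__, job)({__name__=~".+"}))'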
Adjust and optimize the collected metrics gradually, based on your machine specifications and business needs.
To learn more, follow this column: Prometheus Monitoring