Prometheus monitors Oracle database

background

This article briefly introduces how Prometheus monitors Oracle databases through exporters and what indicators should be paid attention to.

oracledb_exporter

oracledb_exporter is an application that connects to Oracle database and generates Prometheus metrics,

settings

Show how to install and set up oracledb_exporter to use Prometheus to monitor Oracle database. oracledb_exporter is deployed in the k8s cluster

Use Deployment to deploy oracledb_exporter in k8s and add annotations to enable Prometheus to automatically discover oracledb_exporter breakpoints and collect indicators.

spec:
 template:
   metadata:
     annotations:
       prometheus.io/scrape: "true"
       prometheus.io/port: "9161"
       prometheus.io/path: "/metrics"

oracledb_exporter requires Oracle’s connection information to access and generate metrics. This parameter is passed to the exporter as an environment variable. Since the connection information contains the user and password used to access the database, we will use a Kubernetes Secret to store it.

To create a password for the connection string to the Oracle database, you can use the following command:

kubectl create secret generic oracledb-exporter-secret \
    --from-literal=datasource='YOUR_CONNECTION_STRING'

In deployment, configure environment variables like this

env:
       - name: DATA_SOURCE_NAME
         valueFrom:
           secretKeyRef:
             name: oracledb-exporter-secret
             key: datasource

To make sure the connection information is correct:

system/password@//database_url:1521/database_name.your.domain.com

You can use the sqlplus docker image for detection

docker run --net='host' --rm --interactive guywithnose/sqlplus sqlplus system/password@//database_url:1521/database_name.my.domain.com

Add some custom indicators below, including slow queries and bug queries
In order to use custom indicators:

In the deployment we will add another environment variable with a route to the file of the new indicator.
Mount this new file as volume from ConfigMap

The complete configuration is as follows:

apiVersion: apps/v1
Kind: Deployment
metadata:
  name: oracledb-exporter
  namespace: database-namespace
spec:
  selector:
    matchLabels:
      app: oracledb-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: oracledb-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9161"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: oracledb-exporter
        ports:
        - containerPort: 9161
        image: iamseth/oracledb_exporter
        env:
        - name: DATA_SOURCE_NAME
          valueFrom:
            secretKeyRef:
              name: oracledb-exporter-secret
              key:datasource
        - name: CUSTOM_METRICS
          value: /tmp/custom-metrics.toml
        volumeMounts:
          - name: custom-metrics
            mountPath: /tmp/custom-metrics.toml
            subPath: custom-metrics.toml
      volumes:
        - name: custom-metrics
          configMap:
            defaultMode: 420
            name: custom-metrics

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
  namespace: database-namespace
data:
  custom-metrics.toml: |
    [[metric]]
    context = "slow_queries"
    metricsdesc = { p95_time_usecs= "Gauge metric with percentile 95 of elapsed time.", p99_time_usecs= "Gauge metric with percentile 99 of elapsed time." }
    request = "select percentile_disc(0.95) within group (order by elapsed_time) as p95_time_usecs, percentile_disc(0.99) within group (order by elapsed_time) as p99_time_usecs from v$sql where last_active_time >= sysdate - 5/(24*60)"
    [[metric]]
    context = "big_queries"
    metricsdesc = { p95_rows= "Gauge metric with percentile 95 of returned rows.", p99_rows= "Gauge metric with percentile 99 of returned rows." }
    request = "select percentile_disc(0.95) within group (order by rownum) as p95_rows, percentile_disc(0.99) within group (order by rownum) as p99_rows from v$sql where last_active_time >= sysdate - 5/(24*60)"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from user_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from user_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from user_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from dba_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from dba_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from dba_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"

After creating the Secret and ConfigMap, you can apply the Deployment and check if it is getting metrics from port 9161 of the Oracle database.

If everything goes well, Prometheus will automatically discover the exporter annotated pod and start scraping metrics within a few minutes. This can be inspected in the target section of the Prometheus web interface for any metrics starting with oracledb_.

What to monitor

Performance Indicators

Wait Time: The exporter provides a range of wait time metrics for different activities in the Oracle database. They all start with the oracledb_wait_time_ prefix and they help evaluate where the database is spending more time. It can exist in I/O, network, commit, concurrency, etc. In this way, we can identify bottlenecks in the system that may affect the overall performance of the Oracle database.

Slow queries: Some queries may take longer to return results than others. If this time is higher than the timeout configured in the application for receiving a response, it will consider this a timeout error from the database and retry the query. This behavior can overload the system and affect overall performance.

In the configuration shown above, there are two custom metrics that provide information on the percentage 95 and 99 of the response time for queries executed within the last 5 minutes. These indicators are:

oracledb_slow_queries_p95_time_usecs
oracledb_slow_queries_p99_time_usecs

Active Sessions: It is important to monitor active sessions in the Oracle database. If the configured limit is exceeded, the database will reject new connections, causing application errors. The metric that provides this information is oracledb_sessions_value, and the label status can provide more information.

Activity: It is also important to monitor the operations performed by the database. For this we can rely on the following indicators:

oracledb_activity_execute_count
oracledb_activity_parse_count_total
oracledb_activity_user_commits
oracledb_activity_user_rollbacks