background
This article briefly introduces how Prometheus monitors Oracle databases through exporters and what indicators should be paid attention to.
oracledb_exporter
oracledb_exporter is an application that connects to Oracle database and generates Prometheus metrics,
settings
Show how to install and set up oracledb_exporter to use Prometheus to monitor Oracle database. oracledb_exporter is deployed in the k8s cluster
Use Deployment to deploy oracledb_exporter in k8s and add annotations to enable Prometheus to automatically discover oracledb_exporter breakpoints and collect indicators.
spec: template: metadata: annotations: prometheus.io/scrape: "true" prometheus.io/port: "9161" prometheus.io/path: "/metrics"
oracledb_exporter requires Oracle’s connection information to access and generate metrics. This parameter is passed to the exporter as an environment variable. Since the connection information contains the user and password used to access the database, we will use a Kubernetes Secret to store it.
To create a password for the connection string to the Oracle database, you can use the following command:
kubectl create secret generic oracledb-exporter-secret \ --from-literal=datasource='YOUR_CONNECTION_STRING'
In deployment, configure environment variables like this
env: - name: DATA_SOURCE_NAME valueFrom: secretKeyRef: name: oracledb-exporter-secret key: datasource
To make sure the connection information is correct:
system/password@//database_url:1521/database_name.your.domain.com
You can use the sqlplus docker image for detection
docker run --net='host' --rm --interactive guywithnose/sqlplus sqlplus system/password@//database_url:1521/database_name.my.domain.com
Add some custom indicators below, including slow queries and bug queries
In order to use custom indicators:
- In the deployment we will add another environment variable with a route to the file of the new indicator.
- Mount this new file as volume from ConfigMap
The complete configuration is as follows:
apiVersion: apps/v1 Kind: Deployment metadata: name: oracledb-exporter namespace: database-namespace spec: selector: matchLabels: app: oracledb-exporter replicas: 1 template: metadata: labels: app: oracledb-exporter annotations: prometheus.io/scrape: "true" prometheus.io/port: "9161" prometheus.io/path: "/metrics" spec: containers: - name: oracledb-exporter ports: - containerPort: 9161 image: iamseth/oracledb_exporter env: - name: DATA_SOURCE_NAME valueFrom: secretKeyRef: name: oracledb-exporter-secret key:datasource - name: CUSTOM_METRICS value: /tmp/custom-metrics.toml volumeMounts: - name: custom-metrics mountPath: /tmp/custom-metrics.toml subPath: custom-metrics.toml volumes: - name: custom-metrics configMap: defaultMode: 420 name: custom-metrics
ConfigMap:
apiVersion: v1 kind: ConfigMap metadata: name: custom-metrics namespace: database-namespace data: custom-metrics.toml: | [[metric]] context = "slow_queries" metricsdesc = { p95_time_usecs= "Gauge metric with percentile 95 of elapsed time.", p99_time_usecs= "Gauge metric with percentile 99 of elapsed time." } request = "select percentile_disc(0.95) within group (order by elapsed_time) as p95_time_usecs, percentile_disc(0.99) within group (order by elapsed_time) as p99_time_usecs from v$sql where last_active_time >= sysdate - 5/(24*60)" [[metric]] context = "big_queries" metricsdesc = { p95_rows= "Gauge metric with percentile 95 of returned rows.", p99_rows= "Gauge metric with percentile 99 of returned rows." } request = "select percentile_disc(0.95) within group (order by rownum) as p95_rows, percentile_disc(0.99) within group (order by rownum) as p99_rows from v$sql where last_active_time >= sysdate - 5/(24*60)" [[metric]] context = "size_user_segments_top100" metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as table_bytes from user_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY" [[metric]] context = "size_user_segments_top100" metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from user_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY" [[metric]] context = "size_user_segments_top100" metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as cluster_bytes from user_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY" [[metric]] context = "size_dba_segments_top100" metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as table_bytes from dba_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY" [[metric]] context = "size_dba_segments_top100" metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from dba_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY" [[metric]] context = "size_dba_segments_top100" metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."} labels = ["segment_name"] request = "select * from (select segment_name,sum(bytes) as cluster_bytes from dba_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"
After creating the Secret and ConfigMap, you can apply the Deployment and check if it is getting metrics from port 9161 of the Oracle database.
If everything goes well, Prometheus will automatically discover the exporter annotated pod and start scraping metrics within a few minutes. This can be inspected in the target section of the Prometheus web interface for any metrics starting with oracledb_.
What to monitor
Performance Indicators
Wait Time: The exporter provides a range of wait time metrics for different activities in the Oracle database. They all start with the oracledb_wait_time_ prefix and they help evaluate where the database is spending more time. It can exist in I/O, network, commit, concurrency, etc. In this way, we can identify bottlenecks in the system that may affect the overall performance of the Oracle database.
Slow queries: Some queries may take longer to return results than others. If this time is higher than the timeout configured in the application for receiving a response, it will consider this a timeout error from the database and retry the query. This behavior can overload the system and affect overall performance.
In the configuration shown above, there are two custom metrics that provide information on the percentage 95 and 99 of the response time for queries executed within the last 5 minutes. These indicators are:
- oracledb_slow_queries_p95_time_usecs
- oracledb_slow_queries_p99_time_usecs
Active Sessions: It is important to monitor active sessions in the Oracle database. If the configured limit is exceeded, the database will reject new connections, causing application errors. The metric that provides this information is oracledb_sessions_value, and the label status can provide more information.
Activity: It is also important to monitor the operations performed by the database. For this we can rely on the following indicators:
- oracledb_activity_execute_count
- oracledb_activity_parse_count_total
- oracledb_activity_user_commits
- oracledb_activity_user_rollbacks