Installing and configuring Prometheus + Grafana + multiple exporters

In this setup, Prometheus is installed on host 128.5.80.182.

1 Install the Prometheus server

1.1 Installation

# Extract the installation package
tar zxvf prometheus-2.44.0.linux-amd64.tar.gz

# Move the contents to the target directory
mkdir -p /home/ap/prometheus
cd prometheus-2.44.0.linux-amd64
mv * /home/ap/prometheus

# Symlink the binary into a directory on PATH
ln -s /home/ap/prometheus/prometheus /usr/local/bin/prometheus

# Verify the installed Prometheus version
prometheus --version

1.2 Create related directories

mkdir -p /home/ap/prometheus/log    # log directory
mkdir -p /home/ap/prometheus/data   # monitoring data (TSDB) directory

1.3 Startup method

# Startup method 1 (not recommended: runs in the foreground, so the terminal must stay open)
prometheus --config.file=/home/ap/prometheus/prometheus.yml --web.enable-lifecycle

# Startup method 2 (runs in the background and writes logs to prometheus.log)
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

# Startup method 3: run as a service managed by systemctl
Not configured in this guide.
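
Since method 3 is left unconfigured here, the following is only a minimal sketch of what a systemd unit could look like, assuming the paths used above. The unit name prometheus.service, the render_prometheus_unit helper, and the Restart=on-failure policy are assumptions and not part of the original setup. No User= is set, so the service would run as root.

```shell
#!/bin/sh
# Sketch: print a minimal systemd unit for Prometheus to stdout.
# The install directory defaults to the one used in this guide.
render_prometheus_unit() {
  prom_home="${1:-/home/ap/prometheus}"
  cat <<EOF
[Unit]
Description=Prometheus
After=network-online.target

[Service]
ExecStart=${prom_home}/prometheus --config.file=${prom_home}/prometheus.yml --storage.tsdb.path=${prom_home}/data --web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit so it can be reviewed before being installed.
render_prometheus_unit
```

To try it, redirect the output to /etc/systemd/system/prometheus.service, then run systemctl daemon-reload followed by systemctl enable --now prometheus.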

1.4 Access via browser

http://128.5.80.182:9090/metrics

2 Install node_exporter

node_exporter collects operating-system metrics such as CPU, memory, disk, network, and I/O. The resulting Grafana dashboard is shown below:

2.1 Install node exporter

# On each node to be monitored, extract the package and move it to the target directory
cd /home/ap/prometheus
tar zxvf node_exporter-1.5.0.linux-amd64.tar.gz
mv node_exporter-1.5.0.linux-amd64 /home/ap/prometheus/node_exporter

2.2 Start node_exporter

# Start in the background:
nohup /home/ap/prometheus/node_exporter/node_exporter >/dev/null 2>&1 &

2.3 Configure prometheus.yml

Add the following job to the scrape_configs: section:

scrape_configs:
  - job_name: "node"
    file_sd_configs:
    - files:
      - targets/nodes.yml
      refresh_interval: 2m
    scrape_interval: 15s
    static_configs:
      - targets:

Configure monitoring list

vi /home/ap/prometheus/targets/nodes.yml
- targets:
  - 128.5.80.160:9100
  - 128.5.80.95:9100
  - 128.5.80.96:9100
  - 128.5.80.97:9100
  - 128.1.80.43:9100
  - 128.5.80.182:9100
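
Before restarting Prometheus, it can help to confirm that every listed node_exporter actually answers. The sketch below is one way to do that. The list_targets and check_targets helper names and the 3-second timeout are assumptions, and the sed pattern only handles the simple host:port list layout used in nodes.yml above.

```shell
#!/bin/sh
# Hypothetical helper script; the default path matches the file created above.
TARGETS_FILE="${TARGETS_FILE:-/home/ap/prometheus/targets/nodes.yml}"

# Print every list item of the form "- host:port" found in a file_sd YAML file.
list_targets() {
  sed -n 's/^[[:space:]]*-[[:space:]]*\([^[:space:]:]*:[0-9][0-9]*\)[[:space:]]*$/\1/p' "$1"
}

# Probe each target's /metrics endpoint and report UP or DOWN.
check_targets() {
  list_targets "$1" | while read -r target; do
    if curl -fsS --max-time 3 "http://${target}/metrics" >/dev/null 2>&1; then
      echo "UP   ${target}"
    else
      echo "DOWN ${target}"
    fi
  done
}

# Only probe when the targets file exists on this machine.
if [ -f "$TARGETS_FILE" ]; then
  check_targets "$TARGETS_FILE"
fi
```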

2.4 Restart the prometheus main program

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx
# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &
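
Because Prometheus is started with --web.enable-lifecycle, a configuration change can also be applied without killing the process, by sending an HTTP POST to the /-/reload endpoint. The sketch below first validates the file with promtool (shipped in the same tarball as prometheus) and then triggers the reload. The run wrapper and the RUN variable are assumptions added so that the script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
PROM_HOME="${PROM_HOME:-/home/ap/prometheus}"
PROM_URL="${PROM_URL:-http://128.5.80.182:9090}"

# Execute the command when RUN=1, otherwise just print it (dry run).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Validate prometheus.yml, then ask the running server to reload it.
reload_prometheus() {
  run "${PROM_HOME}/promtool" check config "${PROM_HOME}/prometheus.yml" || return 1
  run curl -fsS -X POST "${PROM_URL}/-/reload"
}

reload_prometheus
```

Run it as RUN=1 sh reload.sh (the script name is arbitrary) once the printed commands look right.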

3 Install oracledb_exporter

oracledb_exporter monitors Oracle database metrics such as tablespace usage, session status, parse activity, and wait events. The resulting Grafana dashboard is shown below:

There are two ways to deploy it:

1. Run an exporter on each database server to be monitored.

2. Run the exporters on the Prometheus server. Installation is needed only once, and no exporter process runs on the database servers; that load is carried by the monitoring server instead.

This guide uses the second method.

3.1 Install the exporter

##Unzip
tar zxvf oracledb_exporter.0.3.0rc1-ora18.5.linux-amd64.tar.gz
mv oracledb_exporter.0.3.0rc1-ora18.5.linux-amd64 oracledb_exporter

3.2 Configure environment variables

# As root, set the environment variables needed to connect to the databases from the monitoring server
export ORACLE_HOME=/home/db/oracle/product/19.3.0
export PATH=$PATH:/home/db/oracle/product/19.3.0/bin
export LD_LIBRARY_PATH=/home/db/oracle/product/19.3.0/lib

3.3 Test database connectivity

##Create a unified monitoring account on each database with as few permissions as possible
create user prometheus identified by Abcd_123;
grant create session to prometheus;
grant select_catalog_role to prometheus;
##Test connectivity
sqlplus prometheus/[email protected]:11521/clouddb
sqlplus prometheus/[email protected]:11521/nbutf8db
sqlplus prometheus/[email protected]:1521/odsbptdb
sqlplus prometheus/[email protected]:11522/zyqdb
sqlplus prometheus/[email protected]:1521/jstsptdb
sqlplus prometheus/[email protected]:1522/P8UTF8DB
sqlplus prometheus/[email protected]:11521/jstsptdb
sqlplus prometheus/[email protected]:11521/nbutf8db
sqlplus prometheus/[email protected]:11521/jstsptdb
sqlplus prometheus/[email protected]:11521/nbutf8db
sqlplus prometheus/[email protected]:11521/jstsptdb

3.4 Start the exporter

Start one exporter per database, each on its own port.

# Database 1

export DATA_SOURCE_NAME=prometheus/[email protected]:11521/clouddb

nohup /root/oracledb_exporter/oracledb_exporter --default.metrics=/root/oracledb_exporter/default-metrics.toml --web.listen-address :9161 >/dev/null 2>&1 &

# Check whether metrics can be retrieved

curl http://128.5.80.182:9161/metrics

# Database 2

export DATA_SOURCE_NAME=prometheus/[email protected]:11521/nbutf8db

nohup /root/oracledb_exporter/oracledb_exporter --default.metrics=/root/oracledb_exporter/default-metrics.toml --web.listen-address :9162 >/dev/null 2>&1 &

# Check whether metrics can be retrieved

curl http://128.5.80.182:9162/metrics

# Database 3

export DATA_SOURCE_NAME=prometheus/[email protected]:1521/odsbptdb

nohup /root/oracledb_exporter/oracledb_exporter --default.metrics=/root/oracledb_exporter/default-metrics.toml --web.listen-address :9163 >/dev/null 2>&1 &

# Check whether metrics can be retrieved

curl http://128.5.80.182:9163/metrics

Start the exporters for the remaining databases in the same way.
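
With many databases, the per-database commands can be wrapped in a loop that assigns consecutive ports starting at 9161. The sketch below assumes the exporter path and flags used above. The connection strings are placeholders (not the real ones), and the start_exporters helper and the RUN switch are hypothetical; without RUN=1 the script only prints the port-to-database mapping.

```shell
#!/bin/sh
EXPORTER_DIR="${EXPORTER_DIR:-/root/oracledb_exporter}"
BASE_PORT=9161

# Read one connection string per line from stdin and start (or, in dry-run
# mode, list) one exporter per line on consecutive ports.
start_exporters() {
  port=$BASE_PORT
  while read -r dsn; do
    [ -n "$dsn" ] || continue          # skip empty lines
    if [ "${RUN:-0}" = "1" ]; then
      DATA_SOURCE_NAME="$dsn" nohup "${EXPORTER_DIR}/oracledb_exporter" \
        --default.metrics="${EXPORTER_DIR}/default-metrics.toml" \
        --web.listen-address ":${port}" </dev/null >/dev/null 2>&1 &
    else
      echo "port ${port} <- ${dsn}"
    fi
    port=$((port + 1))
  done
}

# Placeholder connection strings; replace with the real user/password@host:port/service.
start_exporters <<'EOF'
prometheus/PASSWORD@db-host-1:1521/service1
prometheus/PASSWORD@db-host-2:1521/service2
EOF
```

Keeping the order of the list stable matters, because the port of each exporter must match the entry for that database in the Prometheus target file.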

3.5 Configure prometheus.yml

Add the following job to the scrape_configs: section:

  - job_name: "oracle"
    file_sd_configs:
    - files:
      - targets/db.yml
      refresh_interval: 2m
    scrape_interval: 5m
    static_configs:
      - targets:

Configure monitoring list:

vi /home/ap/prometheus/targets/db.yml
- targets:
  - 128.5.80.182:9161
  labels:
    dbname: '80.182-clouddb'
- targets:
  - 128.5.80.182:9162
  labels:
    dbname: '80.97-nbutf8db'
- targets:
  - 128.5.80.182:9163
  labels:
    dbname: '80.160-odsbptdb'
- targets:
  - 128.5.80.182:9164
  labels:
    dbname: '80.160-zyqdb'
- targets:
  - 128.5.80.182:9165
  labels:
    dbname: '80.43-jstsptdb'
- targets:
  - 128.5.80.182:9166
  labels:
    dbname: '80.43-P8UTF8DB'
- targets:
  - 128.5.80.182:9167
  labels:
    dbname: '80.95-jstsptdb'
- targets:
  - 128.5.80.182:9168
  labels:
    dbname: '80.95-nbutf8db'
- targets:
  - 128.5.80.182:9169
  labels:
    dbname: '80.96-jstsptdb'
- targets:
  - 128.5.80.182:9170
  labels:
    dbname: '80.96-nbutf8db'
- targets:
  - 128.5.80.182:9171
  labels:
    dbname: '80.97-jstsptdb'

3.6 Restart the prometheus main program

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx

# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

4 Install mysqld_exporter

mysqld_exporter monitors MySQL metrics such as connection status and table locks. The resulting Grafana dashboard is shown below:

4.1 Install mysqld_exporter

##Unzip
cd /home/ap/prometheus
tar -zxvf mysqld_exporter-0.14.0.linux-amd64.tar.gz
mv mysqld_exporter-0.14.0.linux-amd64 mysqld_exporter

4.2 Create monitoring user

create user 'exporter'@'localhost' identified by 'Exporter_123';
grant process,replication client,select on *.* to 'exporter'@'localhost';

4.3 Add configuration file

vi /home/ap/prometheus/mysqld_exporter/mysqld_exporter.cnf
[client]
host=127.0.0.1
port=3306
user=exporter
password=Exporter_123

4.4 Start the exporter

nohup /home/ap/prometheus/mysqld_exporter/mysqld_exporter --config.my-cnf=/home/ap/prometheus/mysqld_exporter/mysqld_exporter.cnf 2>&1 &

# Check whether metrics are being collected
curl http://128.5.80.182:9104/metrics

4.5 Configure prometheus.yml

Add the following job to the scrape_configs: section:

  - job_name: "mysql"
    file_sd_configs:
    - files:
      - targets/mysql.yml
      refresh_interval: 2m
    scrape_interval: 2m
    static_configs:
      - targets:

Configure monitoring list

vi /home/ap/prometheus/targets/mysql.yml
- targets:
  - 128.5.80.182:9104

4.6 Restart the prometheus main program

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx

# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

5 Install vmware_exporter

vmware_exporter monitors virtual machine usage metrics. The resulting Grafana dashboard is shown below:

5.1 Install Docker

Docker installation is omitted here.

5.2 Import the vmware_exporter image

#Image file location
/home/ap/prometheus/vmware_exporter/vmware_exporter.tar.gz

#Import image
docker load -i vmware_exporter.tar.gz

#Confirm the import is successful
docker images

5.3 Edit configuration file

vi /home/ap/prometheus/vmware_exporter/config.env
[email protected]
VSPHERE_PASSWORD=Jsccb@123
VSPHERE_HOST=128.5.80.175
VSPHERE_IGNORE_SSL=TRUE
VSPHERE_SPECS_SIZE=2000

5.4 Start container

docker run -itd -p 9272:9272 --name vmware_exporter --env-file /home/ap/prometheus/vmware_exporter/config.env pryorda/vmware_exporter

# Verify that metrics can be collected (locally with curl, or from a browser using the second URL)
curl http://localhost:9272/metrics
http://128.5.80.182:9272/metrics

5.5 Configure prometheus.yml

Add the following job to the scrape_configs: section of the main configuration file:

  - job_name: "vmware_vcenter"
    file_sd_configs:
    - files:
      - targets/vmware_vcenter.yml
      refresh_interval: 2m
    scrape_interval: 2m
    static_configs:
      - targets:

Configure the target list

vi /home/ap/prometheus/targets/vmware_vcenter.yml
- targets:
  - 128.5.80.182:9272

5.6 Restart the prometheus main program

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx

# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

6 Install ipmi_exporter

ipmi_exporter monitors the sensors in a physical chassis, such as fan, temperature, and storage sensors. The resulting Grafana dashboard is shown below:

6.1 Installation

#Need to install freeipmi
yum install freeipmi

# Extract the package
cd /home/ap/prometheus
tar -zxvf ipmi_exporter-1.6.1.linux-amd64.tar.gz
mv ipmi_exporter-1.6.1.linux-amd64 ipmi_exporter

6.2 Edit ipmi configuration file

vi /home/ap/prometheus/ipmi_exporter/ipmi_remote.yml

modules:
  default:
    user: "Administrator"
    pass: "Fence12#$"
    driver: "LAN_2_0"
    privilege: "user"
    timeout: 10000
    collectors:
    - bmc
    - ipmi
    - chassis
    exclude_sensor_ids:
    - 2
    - 29
    - 32
    - 50
    - 52
    - 55

6.3 Start ipmi_exporter

cd /home/ap/prometheus/ipmi_exporter

./ipmi_exporter --config.file=/home/ap/prometheus/ipmi_exporter/ipmi_remote.yml &

# Test: open the exporter page in a browser

http://128.5.80.182:9290

# Enter an iLO address on that page to test whether data can be scraped

iLO addresses: 128.5.80.147 128.5.80.148

6.4 Configure the prometheus.yml main configuration file

Add the following job to the scrape_configs: section:

  - job_name: "ipmi"
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /ipmi
    scheme: http
    file_sd_configs:
    - files:
      - targets/ipmi.yml
      refresh_interval: 2m
    relabel_configs:
    - source_labels: [__address__]
      separator: ;
      regex: (.*)
      target_label: __param_target
      replacement: ${1}
      action: replace
    - source_labels: [__param_target]
      separator: ;
      regex: (.*)
      target_label: instance
      replacement: ${1}
      action: replace
    - separator: ;
      regex: .*
      target_label: __address__
      replacement: 128.5.80.182:9290
      action: replace

#Add monitoring point

vi /home/ap/prometheus/targets/ipmi.yml
- targets:
  - 128.5.80.148
  - 128.5.80.147
  - 128.5.80.149
  - 128.5.80.150
  - 128.5.80.222
  - 128.5.80.223
  - 128.5.80.141
  - 128.5.80.168
  - 128.5.80.139
  - 128.5.80.140
  - 128.5.80.240
  - 128.5.80.241
  - 128.5.80.242
  labels:
    job: ipmi_exporter

6.5 Restart prometheus

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx

# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

7 Install Grafana

Grafana works with Prometheus to display the collected metrics graphically, which makes monitoring easier.

7.1 Install the rpm package

rpm -ivh grafana-enterprise-9.4.10-1.x86_64.rpm

7.2 Enable and start the service

/bin/systemctl daemon-reload
/bin/systemctl enable grafana-server.service
/bin/systemctl start grafana-server.service

7.3 Access via browser

http://128.5.80.182:3000

The default username/password is admin/admin.

7.4 Import dashboard templates

Dashboard templates can be downloaded from the official Grafana website.

The import procedure is as follows:

Select the dashboard template downloaded from the official website and import it.

8 Configure alert rules

8.1 Create the rules directory

cd /home/ap/prometheus
mkdir rules

8.2 Add the rules directory to the prometheus.yml main configuration file

vi prometheus.yml
rule_files:
  - "rules/*.rules"

8.3 Create alert rules

vi /home/ap/prometheus/rules/alerts.rules
groups:
  - name: disk_alerts
    rules:
      - alert: "Disk Alert"
        expr: (1-node_filesystem_avail_bytes{mountpoint=~".*"}/node_filesystem_size_bytes{mountpoint=~".*"})*100>90
        for: 1m
        labels:
          severity: "Severe Warning"
        annotations:
          summary: "Disk partition usage alarm"
          description: "Disk usage exceeds 90%"
      
  - name: tablespaces_alerts
    rules:
      - alert: "Table space usage alarm"
        expr: (1-oracledb_tablespace_free{type!="TEMPORARY"}/oracledb_tablespace_bytes{type!="TEMPORARY"})*100>90
        for: 5m
        labels:
          severity: "Severe Warning"
        annotations:
          summary: "Table space remaining space alarm"
          description: "Table space usage exceeds 90%"
 
  - name: Memory_alerts
    rules:
      - alert: "Memory Alert"
        expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes)))* 100>80
        for: 1m
        labels:
          severity: "minor warning"
        annotations:
          summary: "Memory usage warning"
          description: "Memory usage exceeds 80%"
 
  - name: cpu_alerts
    rules:
      - alert: "cpu alert"
        expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)*100>90
        for: 1m
        labels:
          severity: "Severe Warning"
        annotations:
          summary: "cpu usage alarm"
          description: "CPU usage exceeds 90% for 1 minute continuously"

  - name: sensors_alerts
    rules:
      - alert: "Sensor Alert"
        expr: ipmi_sensor_state > 0
        for: 1m
        labels:
          severity: "Sensor Alarm"
        annotations:
          summary: "Sensor Alarm"
          description: "Sensor Alarm"
 
  - name: fan_alerts
    rules:
      - alert: "Fan speed sensor alarm"
        expr: ipmi_fan_speed_state > 0
        for: 1m
        labels:
          severity: "Fan speed sensor alarm"
        annotations:
          summary: "Fan speed sensor alarm"
          description: "Fan speed sensor alarm"

  - name: power_alerts
    rules:
      - alert: "Power sensor alarm"
        expr: ipmi_power_state > 0
        for: 1m
        labels:
          severity: "Power sensor alarm"
        annotations:
          summary: "Power sensor alarm"
          description: "Power sensor alarm"

  - name: temperature_alerts
    rules:
      - alert: "Temperature sensor alarm"
        expr: ipmi_temperature_state > 0
        for: 1m
        labels:
          severity: "Temperature sensor alarm"
        annotations:
          summary: "Temperature sensor alarm"
          description: "Temperature sensor alarm"

 

  - name: voltage_alerts
    rules:
      - alert: "Voltage sensor alarm"
        expr: ipmi_voltage_state > 0
        for: 1m
        labels:
          severity: "Voltage sensor alarm"
        annotations:
          summary: "Voltage sensor alarm"
          description: "Voltage sensor alarm"
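
A syntax error in a rules file prevents Prometheus from loading it, so it is worth validating the files before restarting. The following is a minimal sketch, assuming promtool sits next to the prometheus binary in /home/ap/prometheus (it ships in the same tarball). check_rules is a hypothetical helper name.

```shell
#!/bin/sh
PROM_HOME="${PROM_HOME:-/home/ap/prometheus}"

# Validate each rules file given as an argument.
# Returns non-zero if a file is missing or promtool rejects it.
check_rules() {
  status=0
  for rules_file in "$@"; do
    if [ ! -f "$rules_file" ]; then
      echo "missing: ${rules_file}"
      status=1
      continue
    fi
    if [ -x "${PROM_HOME}/promtool" ]; then
      "${PROM_HOME}/promtool" check rules "$rules_file" || status=1
    else
      echo "promtool not found, skipped: ${rules_file}"
    fi
  done
  return $status
}

# Check every file matched by the rule_files pattern from prometheus.yml.
check_rules "${PROM_HOME}"/rules/*.rules || echo "rule check failed"
```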

8.4 Restart prometheus

# Stop the old process (replace xxxx with the PID reported by ps)
ps -ef | grep prometheus
kill -9 xxxx

# Start the new process
nohup prometheus --config.file=/home/ap/prometheus/prometheus.yml \
--storage.tsdb.path=/home/ap/prometheus/data --web.enable-lifecycle > /home/ap/prometheus/log/prometheus.log 2>&1 &

8.5 View alarm information

Here you can see whether the created alarm rule has been triggered.
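
The same information is available from the command line through the Prometheus HTTP API, whose /api/v1/alerts endpoint returns the currently pending and firing alerts as JSON. A small sketch follows, with alerts_url as a hypothetical helper and RUN as an assumed switch so that nothing is fetched unless RUN=1 is set.

```shell
#!/bin/sh
PROM_URL="${PROM_URL:-http://128.5.80.182:9090}"

# Build the alerts endpoint URL from a base URL, tolerating a trailing slash.
alerts_url() {
  printf '%s/api/v1/alerts\n' "${1%/}"
}

if [ "${RUN:-0}" = "1" ]; then
  # Fetch the active alerts as JSON.
  curl -fsS "$(alerts_url "$PROM_URL")"
else
  echo "would fetch: $(alerts_url "$PROM_URL")"
fi
```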

8.6 Display alert data in Grafana

The result is shown below. Click a number to view the detailed alert information.

For example, clicking the number 8 shows the detailed tablespace alert status, as in the figure below.

The steps to achieve this are as follows:

8.6.1 Create a new alert detail panel

8.6.2 Edit the panel

At mark 1, enter the query expression: ALERTS{alertname="Table space usage alarm"}

Note: The name in double quotes is the name of the alert rule created earlier.

At mark 2, select table.

At mark 3, select instance.

At mark 4, select table.

As shown in the red circle in the figure below, choose the columns to display and switch off any columns that are not needed.

8.6.3 Save the panel

After saving, the panel shows the detailed tablespace alert information, as shown below.

8.6.4 Get the panel link

Obtain the link to this alert detail panel as follows:

Copy this Link URL; it is used in a later step.

8.6.5 Create a new alert panel

8.6.6 Edit the panel

Enter the expression sum(ALERTS{alertname="Table space usage alarm"}) in the circled field.

Note: The name in double quotes is the name of the alert rule created earlier.

Adjust the parameters: set format to table, type to instance, and panel to stat, and change the default threshold of 80 to 1.

Add the link copied in 8.6.4.

8.6.7 Save the panel

After saving, the panel behaves as shown at the start of this section. Panels for other metrics can be built the same way.