

【Manager Handbook for Distributed AntDB-T】Data Collection Tasks and Node Monitoring Tasks

News-2023-08-31

Asiainfo Anhui Technologies

Data collection tasks

After a data collection task is started, the agent on each host collects the corresponding information and stores it in the relevant adbmgr tables. See the related tables section for a description of these tables.

Configure host resource collection task

add job usage_for_host (interval= 60,command = 'select monitor_get_hostinfo();');

Task description: Every 60 seconds, collect host information, including CPU, memory, disk, network, and other metrics.

Configure the database resource collection task

add job usage_for_adb (interval= 60,command = 'select monitor_databaseitem_insert_data();');

Task description: Every 60 seconds, collect database information, including database size, archive information, commit and rollback rates, streaming replication lag, long transactions, and other metrics.

Configure database performance index collection task

add job tps_for_adb (interval= 60,command = 'select monitor_databasetps_insert_data();');

Task description: Every 60 seconds, collect database TPS and QPS information.

Configure database slow SQL monitoring task

add job slowlog_for_adb (interval= 60,command = 'select monitor_slowlog_insert_data();');

Task description: Every 60 seconds, collect slow SQL information. This task requires the pg_stat_statements extension.

Node monitoring tasks

Configure the coordinator monitoring task

add job mon_coord (interval = 5, status = true,command ='select monitor_handle_coordinator()' );

Task description: Every 5 seconds, check whether any coordinator node has failed. If so, retry the connection; after three failed retries, remove the failed coordinator node from the cluster.

Configure gtmcoord monitoring task

add job mon_gtmcoord (interval = 5,status=true,command='select monitor_handle_gtmcoord()');

Task description: Every 5 seconds, check whether the gtmcoord has failed. If so, retry the connection; after three failed retries, perform a failover gtmcoord operation.

Configure datanode monitoring task

add job mon_datanode (interval = 5,status=true,command='select monitor_handle_datanode()');

Task description: Every 5 seconds, check whether any datanode master has failed. If so, retry the connection; after three failed retries, perform a failover datanode operation. Each run of the job processes one failed datanode master.

All of the tasks above use default parameters. To change the default values, pass the parameters explicitly in the command, as follows:

add job mon_datanode (interval = 5,status=true,command='select monitor_handle_datanode('''',true,5,3,20)');
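The doubled quotes in this statement follow standard SQL string quoting: the command value is itself a single-quoted string, so every single quote inside it is written twice. The sketch below shows how the value is read; it is meant for reading only, not as something to run by hand, and the meaning of the individual arguments is not covered here.

```sql
-- As written inside the add job statement (the outer quotes delimit the command string):
--   'select monitor_handle_datanode('''',true,5,3,20)'
-- After unescaping, this is the statement the job carries.
-- The first argument is an empty string literal.
select monitor_handle_datanode('',true,5,3,20)
```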

Note:

After a node monitoring task is added, it blocks the stop all and start all operations.

While the status of the monitoring task is true, both operations fail with the following error:

postgres=# stop all mode fast;
ERROR: on job table, the content of job "mon_coord" includes "monitor_handle_coordinator" string and its status is "on"; you need do "ALTER JOB "mon_coord" (STATUS=false);" to alter its status to "off" or set "adbmonitor=off" in postgresql.conf of ADBMGR to turn all job off which can be made effect by mgr_ctl reload
HINT: try "list job" for more information
postgres=# start all;
ERROR: on job table, the content of job "mon_coord" includes "monitor_handle_coordinator" string and its status is "on"; you need do "ALTER JOB "mon_coord" (STATUS=false);" to alter its status to "off" or set "adbmonitor=off" in postgresql.conf of ADBMGR to turn all job off which can be made effect by mgr_ctl reload
HINT: try "list job" for more information
postgres=#

Workaround: Temporarily set the status of the node monitoring task to false; see the Suspend Task section of the handbook.
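As a rough sketch of that workaround, the sequence below uses the ALTER JOB form suggested by the error message. It assumes the three job names defined earlier in this article, and it assumes that mon_gtmcoord and mon_datanode block the operations in the same way as mon_coord; the error shown above only names mon_coord.

```sql
-- Suspend the node monitoring jobs (names assumed from the add job examples above).
alter job mon_coord (status=false);
alter job mon_gtmcoord (status=false);
alter job mon_datanode (status=false);

-- With monitoring suspended, the cluster-wide operations are no longer blocked.
stop all mode fast;
start all;

-- Resume monitoring once the cluster is back up.
alter job mon_coord (status=true);
alter job mon_gtmcoord (status=true);
alter job mon_datanode (status=true);
```

Running list job before and after is a simple way to confirm the status of each job.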
