Host/Standby Switchover
In the latest version of AntDB, high availability is handled automatically by the self-healing module, so users normally do not need to intervene. This section is provided only as a reference for performing a switchover manually.
Three commands are involved in host/standby switchover: failover, rewind, and switchover. Each applies to a different scenario.
Promote standby to host and remove host
When the host node is down, the failover command promotes a standby to take over as host. It applies to datanode, coordinator, and gtmcoord nodes.
postgres=# failover datanode dn1_1;
WARNING: An exception occurred during the switching operation, It is recommended to use command such as 'monitor all', 'monitor ha' to check the failure point in the cluster first, and then retry the switching operation!!!
ERROR: Can't find a Synchronous standby node, Abort switching to avoid data loss
postgres=# failover datanode dn1_1 force;
NOTICE: dn1_3 have the best wal lsn, choose it as a candidate for promotion
NOTICE: gcn1 try lock cluster successfully
NOTICE: dn1_3 was successfully promoted to the new master
NOTICE: dn1_3 running on correct status of master mode
NOTICE: set GTM information on dn1_2 successfully
NOTICE: dn1_2 running on correct status of slave mode
NOTICE: dn1_1 is waiting for rewinding. If the doctor is enabled, the doctor will automatically rewind it
NOTICE: gcn1 try unlock cluster successfully
NOTICE: Switch the datanode master from dn1_1 to dn1_3 has been successfully completed
nodename | status | description
----------+--------+-------------------
dn1_3 | t | promotion success
(1 row)
As the output shows, if the host node has no synchronous standby, failover is refused by default, because promoting a standby that is not synchronous may lose data. To proceed anyway, add the force option, which promotes the standby with the best WAL LSN (dn1_3 in this example).
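The decision shown in the output above can be modeled with a short Python sketch. This is only an illustration of the observed behavior, not AntDB's implementation: the Standby class, the choose_promotion_candidate function, the sync_state values, and the integer LSNs are all hypothetical, and choosing the most advanced node among several synchronous standbys is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Standby:
    name: str
    sync_state: str  # hypothetical values, e.g. "sync" or "async"
    wal_lsn: int     # WAL position as a plain integer, for comparison only


def choose_promotion_candidate(standbys: List[Standby], force: bool = False) -> Standby:
    """Pick the standby to promote, or raise if promotion could lose data."""
    if not standbys:
        raise RuntimeError("no standby available for promotion")

    # Prefer a synchronous standby: it is expected to hold all committed data.
    sync = [s for s in standbys if s.sync_state == "sync"]
    if sync:
        return max(sync, key=lambda s: s.wal_lsn)  # assumption: most advanced wins

    # No synchronous standby: refuse unless the operator forces the switch.
    if not force:
        raise RuntimeError("no synchronous standby, aborting to avoid data loss")

    # Forced: fall back to the standby with the most advanced WAL position.
    return max(standbys, key=lambda s: s.wal_lsn)
```

With two asynchronous standbys, calling the function without force raises an error, while force=True returns the one with the larger wal_lsn, which mirrors the two failover attempts in the session above.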
Host/Standby Switchover
To swap roles, so that the current host becomes the standby and the current standby becomes the host, use the switchover command.
postgres=# switchover datanode slave dn2_2;
NOTICE: wait max 10 seconds to wait there are no active connections on coordinators and master nodes
NOTICE: gcn1 try lock cluster successfully
NOTICE: wait max 10 seconds to check there are no active locks in pg_locks except the locks on pg_locks table
NOTICE: wait max 10 seconds to check dn2_1 and dn2_2 have the same xlog position
NOTICE: dn2_2 was successfully promoted to the new master
NOTICE: dn2_2 running on correct status of master mode
NOTICE: dn2_1 running on correct status of slave mode
NOTICE: gcn1 try unlock cluster successfully
NOTICE: Switch the datanode master from dn2_1 to dn2_2 has been successfully completed
nodename | status | description
----------+--------+-------------------
dn2_2 | t | promotion success
(1 row)
postgres=# list node dn2_2;
name | host | type | mastername | port | sync_state | path | initialized | incluster | readonly | zone
-------+-------+-----------------+------------+-------+------------+----------------------------------+-------------+-----------+----------+-------
dn2_2 | adb01 | datanode master | | 52542 | | /data/antdb/data/adb50/d2/dn2_2 | t | t | f | local
(1 row)
postgres=# list node dn2_1;
name | host | type | mastername | port | sync_state | path | initialized | incluster | readonly | zone
-------+-------+----------------+------------+-------+------------+----------------------------------+-------------+-----------+----------+-------
dn2_1 | adb01 | datanode slave | dn2_2 | 52541 | sync | /data/antdb/data/adb50/d1/dn2_1 | t | t | f | local
(1 row)
As the output shows, the roles have been swapped: dn2_2 is now the datanode master, and dn2_1 is a synchronous slave whose master is dn2_2.
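If the check needs to be scripted, the list node rows can be parsed and compared. The following Python sketch assumes the rows are captured as pipe-delimited text with the eleven columns shown above; parse_node_row and roles_swapped are hypothetical helper names, not part of AntDB.

```python
# Column order as printed by "list node" in the session above.
COLUMNS = [
    "name", "host", "type", "mastername", "port", "sync_state",
    "path", "initialized", "incluster", "readonly", "zone",
]


def parse_node_row(row: str) -> dict:
    """Split one pipe-delimited data row into a column-name -> value dict."""
    cells = [cell.strip() for cell in row.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def roles_swapped(new_master_row: str, new_slave_row: str) -> bool:
    """Return True if the first row is a master and the second is its slave."""
    master = parse_node_row(new_master_row)
    slave = parse_node_row(new_slave_row)
    return (
        master["type"].endswith("master")
        and slave["type"].endswith("slave")
        and slave["mastername"] == master["name"]
    )
```

Passing the dn2_2 row and the dn2_1 row from the output above, in that order, returns True. Passing them in the opposite order returns False.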