Asynchronous Disaster Recovery: AntDB's Data Recovery Solution Without Business Interruption
【Abstract】When a business misoperation on the database causes data loss or other problems, the database can be recovered through backup recovery or data flashback, but that process may interrupt the business for a long time and ultimately affect the stability of the business system. For this scenario, AntDB provides a disaster recovery solution based on delayed replication for fast business recovery, which this paper explores.
Keywords: synchronous replication; asynchronous replication; delayed replication; database parameters
1 Overview of Database Disaster Recovery
When the business mistakenly deletes data, the delayed replication disaster recovery solution provided by AntDB can be used to recover that data quickly and keep the business system running stably.
2 Principle of AntDB Delayed Replication
This chapter introduces the basic principles of replication and delayed replication and the applicable scenarios of AntDB delayed replication.
2.1 Basic Principle of Replication
Databases running online usually adopt a high-availability deployment to ensure stable operation. AntDB supports the following two modes for high-availability deployment:
Synchronous replication high availability: AntDB considers a transaction properly executed only after all changes made by the transaction have been transferred to one or more synchronous backup servers.
Figure 1: Principle of AntDB Synchronous Replication
Asynchronous replication high availability: AntDB considers a transaction properly executed as soon as it has executed successfully on the master database, regardless of whether the backup server has received information about the transaction.
Figure 2: Principle of AntDB Asynchronous Replication
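The difference between the two modes comes down to when the master acknowledges a commit. The following is a minimal sketch in Python, not AntDB's implementation; the `Master` and `Backup` classes and their methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup server that stores the changes it has received."""
    received: list = field(default_factory=list)
    reachable: bool = True

    def receive(self, change: str) -> bool:
        # Returns True only if the change actually reached this backup.
        if self.reachable:
            self.received.append(change)
        return self.reachable


@dataclass
class Master:
    """Hypothetical master that acknowledges commits in one of two modes."""
    backup: Backup
    synchronous: bool
    committed: list = field(default_factory=list)

    def commit(self, change: str) -> bool:
        delivered = self.backup.receive(change)
        if self.synchronous and not delivered:
            # Synchronous mode: no acknowledgement until the backup has the
            # change. A real server would keep waiting; this sketch simply
            # reports False instead.
            return False
        # Asynchronous mode, or the change was delivered: acknowledge now.
        self.committed.append(change)
        return True
```

With an unreachable backup, `Master(backup, synchronous=True).commit("tx1")` is not acknowledged, while the same call with `synchronous=False` is acknowledged even though the backup has received nothing.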
2.2 Basic Principle of Delayed Replication
By default, once an asynchronous backup database has received the WAL data, it replays that data immediately. This ensures that transactions committed on the master database are also committed on the backup database, keeping the primary and backup data consistent. Delayed replication is built on top of asynchronous replication: by configuring the corresponding parameter, you can control when the received data is replayed on the backup database.
The delayed backup database can be introduced by analogy with the T+1 rule in the stock market:
T+1 in the stock market: when a trade completes at time T, the funds arrive after a delay of one trading day (assuming the next business day is a trading day).
The counterpart in the delayed backup database: when a transaction commits at time T, the backup database waits for a delay of N before replaying the data.
# master database: T as committed time of transaction
# delayed backup database: T+N as committed time of transaction
The specific AntDB processing flow can be found in the following diagram:
Figure 3: Principle of AntDB Delayed Replication
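The T and T+N rule can be sketched in a few lines of Python. This is an illustrative model only, not AntDB code; the `DelayedBackup` class and its method names are hypothetical. It shows that receiving is immediate and only the replay is held back.

```python
from collections import deque
from datetime import datetime, timedelta


class DelayedBackup:
    """Hypothetical backup that holds received commits until T + N."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        # (commit time on the master, change), kept in arrival order.
        self.pending = deque()
        self.applied = []

    def receive(self, commit_time: datetime, change: str) -> None:
        # Receiving is immediate; only the replay is delayed.
        self.pending.append((commit_time, change))

    def replay(self, now: datetime) -> None:
        # Apply, in order, every change whose T + N has been reached.
        while self.pending and self.pending[0][0] + self.delay <= now:
            self.applied.append(self.pending.popleft()[1])
```

For example, with `delay=timedelta(minutes=1)`, a change committed on the master at 09:00:00 is still pending when `replay` is called for 09:00:30 and is applied when it is called for 09:01:00.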
2.3 Applicable Scenarios of AntDB Delayed Replication
Based on the description above, delayed replication applies to the following scenarios:
As a complement to backup and recovery for very large databases, which can effectively reduce costs.
Recovery from a misoperation on the master database at time T. This applies to DML operations only; it is not valid for DDL such as TRUNCATE or DROP. Because the backup database does not start replaying the corresponding WAL until T+N, the data as it was before T can still be retrieved from the backup database at any point within the period N.
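As a worked example of the time window, the sketch below computes how long the pre-mistake data stays available on the backup database. The function name `remaining_window` is hypothetical, and the calculation assumes the clocks of the two nodes are in sync.

```python
from datetime import datetime, timedelta


def remaining_window(mistake_time: datetime, delay: timedelta,
                     now: datetime) -> timedelta:
    """Time left before the backup replays a change committed at mistake_time."""
    deadline = mistake_time + delay          # T + N
    return max(deadline - now, timedelta(0))  # never negative
```

With a mistake at 10:00:00, a delay of 1 minute, and the problem noticed at 10:00:40, the remaining window is 20 seconds. Once the deadline has passed, the function returns a zero duration, meaning the backup has already caught up.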
3 Example of AntDB Delayed Replication Disaster Recovery Solution
This section introduces environment information, setting up master and backup replication, verifying master and backup data consistency, setting up delayed replication, and verifying delayed replication.
3.1 Environmental Information
The AntDB high-availability environment is deployed in the following environment:
3.2 Setting up Primary and Backup Replication
Setting up the backup database on 10.21.13.208:
Check replication status:
Figure 4: 10.21.13.207 Check Results of Replication Status
Figure 5: 10.21.13.208 Check Results of Replication Status
Conclusion: The AntDB master-standby replication relationship has been built properly.
3.3 Master and backup data consistency verification
10.21.13.207:
Figure 6: Query of 10.21.13.207 data
Figure 7: Query of 10.21.13.208 data
Conclusion: The master and backup data remain consistent.
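The check above compares query results from the two nodes. For larger tables, one possible approach is to compare a digest of the rows instead of the rows themselves. The sketch below assumes the rows have already been fetched from each node with the same query and the same driver; `table_digest` and `is_consistent` are hypothetical helper names.

```python
import hashlib


def table_digest(rows) -> str:
    """Order-independent SHA-256 digest of a table's rows (tuples)."""
    h = hashlib.sha256()
    # Sort the textual form of each row so that row order does not matter.
    for text in sorted(repr(row) for row in rows):
        h.update(text.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def is_consistent(master_rows, backup_rows) -> bool:
    """True if both nodes returned the same set of rows."""
    return table_digest(master_rows) == table_digest(backup_rows)
```

Because the comparison relies on `repr`, it is only meaningful when both result sets come from the same driver and column types.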
3.4 Delayed Replication Settings
The delayed replication is set on 10.21.13.208 by setting the AntDB parameter recovery_min_apply_delay.
Figure 8: Setting delayed replication parameters on 10.21.13.208
Conclusion: The AntDB delayed replication parameter has been set and has taken effect.
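The exact commands used in this example are not reproduced here. As a hedged sketch, the helper below builds the statements one might run on the backup database, assuming the AntDB version in use exposes `recovery_min_apply_delay` as a reloadable server parameter in the way PostgreSQL 12 and later do. The function name `delay_statements` and the set of accepted units are assumptions of this sketch.

```python
import re
from typing import List


def delay_statements(delay: str) -> List[str]:
    """Build SQL to set, reload, and display the apply delay (assumed syntax)."""
    # Accept values such as '1min', '30s', '500ms', '2h', '1d'.
    if not re.fullmatch(r"\d+\s*(ms|s|min|h|d)", delay):
        raise ValueError(f"unsupported delay value: {delay!r}")
    return [
        f"ALTER SYSTEM SET recovery_min_apply_delay = '{delay}';",
        "SELECT pg_reload_conf();",          # reload the configuration
        "SHOW recovery_min_apply_delay;",    # confirm the new value
    ]
```

Calling `delay_statements("1min")` yields the three statements for the 1-minute delay used in this example. On versions that keep recovery settings in a separate recovery configuration file, the parameter would be set there instead.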
3.5 Verification of Delayed Replication
3.5.1 Validation of DML
Verification of DML on 10.21.13.207:
Figure 9: Verification of DML on 10.21.13.207
Verification of DML on 10.21.13.208:
Figure 10: Verification of DML on 10.21.13.208
Conclusion: SQL execution on the master database completes instantly, while log replay on the backup database follows the delayed replication parameter (1 min) as expected.
3.5.2 Recovery in case of DML misoperation
Data is erased by mistake on 10.21.13.207:
Figure 11: Erased data on 10.21.13.207
Retrieve the erased data on 10.21.13.208:
Figure 12: Retrieval of erased data on 10.21.13.208
Recover erased data on 10.21.13.207 and confirm:
Figure 13: Recovery of mistakenly erased data on 10.21.13.207
Confirm the recovered data on 10.21.13.208:
Figure 14: Verification of mistakenly erased data on 10.21.13.208.
Conclusion: After a DML misoperation on the master database, the affected data can be exported from the backup database within 1 min. The exported data can then be restored on the master database, and the recovered data remains consistent.
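The recovery step amounts to finding the rows that still exist on the delayed backup database but are gone from the master database, and inserting them back. The sketch below illustrates that logic on rows already fetched from each node. The helper names are hypothetical, the key is assumed to be the first column, and the `%s` placeholder style is an assumption that depends on the database driver.

```python
def rows_to_restore(master_rows, backup_rows, key=0):
    """Rows present on the delayed backup but missing on the master."""
    master_keys = {row[key] for row in master_rows}
    return [row for row in backup_rows if row[key] not in master_keys]


def build_insert(table, columns):
    """Parameterized INSERT statement (identifiers are not quoted here)."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
```

This covers deleted rows only. Rows that were wrongly updated rather than deleted would need a comparison of the non-key columns as well.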
3.5.3 Verification of Truncate
10.21.13.207:
Figure 15: Truncate on 10.21.13.207
10.21.13.208:
Figure 16: Query waiting on 10.21.13.208
Conclusion: The truncate on the master database is executed immediately. Once the backup database detects the truncate, a SELECT on the affected table has to wait, and the query waiting time is 1 min.
3.5.4 Verification of drop table
10.21.13.207:
Figure 17: Drop on 10.21.13.207
10.21.13.208:
Figure 18: Query waiting on 10.21.13.208
Conclusion: The drop on the master database is executed immediately. Once the backup database detects the drop, a SELECT on the affected table has to wait, and the query waiting time is 1 min.
4 Exploration Summary
In summary, with AntDB's delayed replication mechanism, when the business mistakenly deletes data from the database, the data can be exported from the delayed backup database and restored to the master database. AntDB can thus achieve rapid business recovery and keep the core system running stably, supporting business stability and continuity and improving the experience of end users.
About Us
AntDB was founded in 2008. On the core systems of operators, AntDB provides online services for more than 1 billion users in 24 provinces across the country. It is widely used in communication, finance, transportation, energy, Internet of Things and other industries, and has been successfully applied in more than 200 projects. With product features such as high performance, elastic expansion and high reliability, AntDB can process one million core communications transactions per second at its peak, and has kept the system running continuously and stably for nearly ten years with zero failures.