AntDB was established in 2008. On the core system of operators, AntDB provides online services for more than 1 billion users in 24 provinces across the country. With product features such as high performance, elastic expansion and high reliability, AntDB can process one million core communications transactions per second at peak, ensuring the continuous and stable operation of the system for nearly ten years, and is successfully implemented for commercial purpose in communication, finance, transportation, energy, Internet of Things and other industries. Lock is an important means to ensure transaction consistency in OLTP database. This article focuses on the lock-related design of AntDB-M (AntDB in-memory engine).
Overview
The lock design of AntDB-M is divided into two layers: 1) metadata locks and 2) data locks. Before acquiring a data lock, a metadata lock must be acquired, thus protecting the consistency of the metadata with the underlying data.
Metadata locks
Metadata locks represent the ability of a connection to access metadata of a table. AntDB-M has designed various metadata lock types according to the different requirements of metadata and data for operation statements, to meet the different read and write restrictions and concurrency capability on metadata and data. Since exclusive locks have higher priority and lower concurrency, therefore, to avoid blocking a large number of other types of lock requests, after the exclusive lock acquires a certain volume of data, it will give priority to authorize other locks.
Data lock
A data lock represents the ability to access a data object (table, record). There are two types of capabilities: 1) read; 2) write (read, delete, update). According to the access characteristics of data object, different access restrictions are given to the data object by lock type, lock granularity, lock range, etc.
The later section focuses on the design related to data locks.
Data lock
Lock type
l Shared lock
A data object (table, record) that can be locked multiple times at the same moment. It represents the ability to read. It increases the concurrency of data access.
l Exclusive lock
A data object (table, record) can be locked only once at the same time. It represents the ability to write. It ensures data integrity and consistency.
Lock granularity
The data object (table, record) that identifies the lock, is divided into: table lock, row lock.
Intent lock
Intentional lock is a type of table lock, which indicates that there is a "possible" row lock in the table. Intentional locks are divided into Intentional Shared Locks (IS) and Intentional Exclusive Locks (IX). Intentional locking is actually a type of lock set to improve the efficiency of table locks (table shared locks and table exclusive locks). It is intended to avoid determining whether there is a row lock or not when requesting a table lock.
Lock compatibility
AntDB-M has two levels of locks: table and record. When applying locks, you must first apply for the table level, and then apply for the record level (when necessary). Lock compatibility refers to the permission to apply for the same type or different types of locks on the same data object (table, record). If it is compatible, the application is allowed, but if not, the application is not allowed. Lock types: Intentional Shared Lock (IS), Intentional Exclusive Lock (IX), Table Shared Lock (TS), Table Exclusive Lock (TX), Record Shared Lock (RS), Record Exclusive Lock (RX);
The lock compatibility is as follows.
Vertical axis: existing locks; Horizontal axis: locks to be applied.
"Y": compatible; "-": incompatible;
Table level
Type | IS | IX | TS | TX |
IS | Y | Y | Y | - |
IX | Y | Y | - | - |
TS | Y | - | Y | - |
TX | - | - | - | - |
Table 1: table-level lock
Record level
Table 2: record-level lock
Locks and latches
The object of a lock is the data object of a table (table, record). There are two types of locks on records: row locks and latch locks. The row lock is used for read/write control of rows in non-MVCC mode, and the latch lock is used for read/write control of records in MVCC mode.
Lock design
Each lock has a transaction to which it belongs, or the applicant of the lock is the transaction. Each transaction has its own, pending request for a chain table of table lock, row lock.
table lock
Each table has a table lock information, including: table lock chain table, the number of locks already added, and the number of locks (including pending locks). Before each transaction can access the table, it needs to apply for a lock on the table. All applications are queued in order of priority.
Lock verification
Locking a table is not allowed if the table has been deleted, or the table has been renamed when a transaction applies for a lock.
Table lock request
Before access a table, a suitable table lock for the table is requested first.
l IS
select;
l IX
insert,update,delete; select ... for update;
l TS
lock tables ... read;
l TX
lock tables ... write; alter table,drop table,truncate,rename;
Table lock escalation
A transaction, for a table should hold only one type of table lock. Here it needs to be explained from several aspects, firstly, from the lock compatibility, for incompatible locks, they are not allowed to be held at the same time. This type of situation must be selected out of one. Secondly, in terms of lock binding, exclusive locks are more binding than shared locks (TX>TS, IX>IS), and table locks are more binding than table intentional locks. Thirdly, repeated requests for the same type of lock are not necessary. Combined with the previous lock compatibility, as shown in Table 3: lock compatibility: '-' indicates incompatible ones, and 'X' indicates the same level without repeated requests. The only four cases left are A,B,C,D. These four cases are viewed through the binding force of the lock, all of which have different binding forces, and ultimately the higher binding lock prevails.
Type | IS | IX | TS | TX |
IS | X | A | B | - |
IX | C | X | - | - |
TS | D | - | X | - |
TX | - | - | - | - |
Table 3: lock compatibility
In summary, only one type of table lock is required for a transaction. For transactions already holding a certain type of table lock, applying for a lock again is processed as lock escalation. Lock escalation includes two types: 1) escalation for applied locks, i.e., change the lock already held, and the locking is successful only when the change is successful; 2) escalation for newly applied locks, without changing the lock already held, and the locking is successful immediately.
Type | IS | IX | TS | TX |
IS | - | - | - | - |
IX | IX | - | TX | - |
TS | TS | TX | - | - |
TX | TX | TX | TX | - |
Table 4: rule of table lock escalation
Horizontal axis: lock already held; vertical axis: lock pending application
-: no upgrade required; other: upgrade target
Locking Waiting
Whether it is a lock applied for the first time or an upgrade of an existing lock, it is necessary to determine whether the table lock already applied for by other transactions is compatible with the currently applied table lock.
l No locks on the current table
Lock is directly added successfully.
l Compatible
If all other locks held by transactions on the table are compatible with the currently requested lock, then the locking is successful.
l Not compatible
If all other locks held by transactions on the table are not compatible with the currently requested lock, the current transaction needs to block and wait for the transaction holding the incompatible lock to unlock the table.
Note: if the table is deleted or renamed during the locking waiting process, the locking is still considered to have failed.
Waiting timeout
Each table lock type will have a corresponding lock count. Blocking wait must wait until all incompatible locks are unlocked. The wait timeout is configurable, the default is 50ms, the maximum is 600ms, and the locking fails if the timeout is exceeded. For DDL operations, the timeout is longer, default is 1800ms, configurable, maximum 7200ms.
Impact between different transactions
There is concurrency between different transactions, and there are two factors that affect each other from the perspective of table locks: 1) lock incompatibility, waiting for the release of locks holding locks; 2) concurrent processing, waiting for critical resources of table locks; where factor 2 is a secondary cause, which has little impact and can be ignored. The main factor that causes lock waiting is lock incompatibility. If it is incompatible with a lock already on the table, the transaction must wait for the other transaction to release the incompatible lock held on it. If it is compatible, it does not need to wait. The blocking of a transaction is only related to lock incompatibility, not the order in which transactions are created or the order in which locks are requested.
Lock assignment order
At the same moment, there may be multiple blocked table lock requests on a table, and these blocked table locks may or may not be of the same type. All table locks on a table, whether they hold a lock or not, are queued in the order in which they are requested. But that order does not affect whether a lock is blocked or not. The success of locking is only related to the compatibility with locks already held, not to the order of application.
Row lock
There are only two types of row locks 1) shared lock RS; 2) mutually exclusive lock RX. Multiple shared locks can be added to a row record, but there can only be one mutually exclusive lock.
Transactions and row locks
Each transaction has its own lock chain table that records the row locks held by that transaction. The owner of a row lock is the transaction.
Row Lock Chain Table
Each row of a table record has its own chain table of locks, recording the lock objects that are already held or are waiting. The chain table is queued in the order of locking requests.
Grant conditions
For the granting of a lock, the general principles:
l The principle of mutual exclusivity, where only one RX lock can be granted and no other RS locks are available;
l The sharing principle, where multiple RS locks can be granted and there are no other RX locks;
l The sequential principle, in which locks are granted from front to back along the chain table of row lock; no disorderly order, otherwise data inconsistency will occur due to system implementation.
Based on the above principles, the lock granting conditions are as follows, and those that do not meet the conditions must wait.
l The only current lock object.
The only current lock object, which meets all the principles, can be granted immediately.
l To apply for an RX lock, it must be located at the head of the chain table and there is no other lock.
RX is mutually exclusive, and no other lock can exist except for the RX lock to be applied. If it is not located at the head of the chain table, it will violate the sequential principle. The sequential principle is also violated if there are other locks.
l To request a RS lock, the previous lock objects of the chain table of the lock must have all been locked and there is no RX lock.
The previous lock objects have been locked, and on the premise of the sequential principle, the specific implementation is that after the RS locks is obtained, notify the later lock objects of the lock chain table of obtaining the lock. If there is no RX lock, ensure the principle of mutual exclusivity of RX.
Create a new row lock
When a transaction first locks a row of records, or when a lock is requested again after it is unlocked, a new row lock object needs to be created for that transaction and added to the end of the lock chain table of row locks. It is also added to the transaction's row lock list. The newly created row lock object is immediately judged whether it is ready for being granted locks, and those that do not meet the conditions must wait until the conditions are met.
Row lock escalation
Since RX locks are exclusive, they are considered to be of higher rank than RS locks, i.e., more binding. A transaction is allowed to hold only one lock type for a row to avoid an excessive number of locks, and it is not necessary because RX locks also have the read property. When a transaction already holds a shared lock for a row, whether the lock needs to be upgraded is determined based on the lock holding and requested lock levels. There are three cases: 1) no upgrade at the same level; 2) upgrade for the requested lock; and 3) upgrade for an existing lock;
l No upgrade at the same level
When the requested lock level is the same as the lock already held, either applying for RS or RX locks, there is no need to reapply for a new lock, just use the lock already held. In this case, there is no need to wait and the lock is immediately added successfully.
l Upgrade for the requested lock
When the requested lock level is lower than the lock already held, i.e., when the RX lock is already held and the RS lock is requested, there is no need to apply for a new lock, just use the lock already held, i.e., the requested lock is upgraded to the lock already held that can be directly used. In this case, there is no need to wait and the lock is immediately added successfully.
l Upgrade for an existing lock
When the requested lock level is higher than the lock already held, i.e., when the RS lock is already held and the RX lock is requested, the existing RS lock needs to be upgraded. The upgrade process is as follows:
1. Request a lock object
Request an RX lock object for the row;
2. Determine whether the immediate upgrade can be done
Immediate upgrade conditions: 1) the lock already held is at the head of the chain table of the row lock; 2) no other transactions hold RS locks. If the conditions are met, the lock type of the lock already held is directly adjusted to RX lock. Also release the RX lock object just requested.
3. Add to the chain table of row lock
If it cannot be upgraded immediately, add the RX lock object to the chain table of row lock. The adding method is different from adding a new row lock to the end of the chain table of row lock. The upgraded lock is added to the back of the last lock in the chain table of row lock that already holds a lock. The purpose of this is to upgrade locks without having to wait for other lock objects that have not been granted locks. This violates the order principle, especially if there are RXs in these ungranted locks. Row locks are added to the transaction's row lock chain table at the same time.
4. Waiting to be granted
After the upgraded lock is added to the chain table, it waits for the release of all the previous locks. Once notified, this means that all previous locks have been released. At this point, the locks already held are removed from the chain table of row lock and the locks are granted to the newly added RX lock object. At the same time, the row lock for that row of the transaction will be updated to that row lock.
Critical resources
The critical resources of row locks are necessary to ensure the exclusivity of row locks and the management of the chain table of row lock. Each row record must have a corresponding critical resource. When AntDB-M creates a new data block space for a table, it allocates critical resources for each row of records, even though the rows of records in the data space are not actually used yet.
1. Space issue
That has some suspicion of resource waste, but the critical resource overhead is very small, 10 million rows of records is only 40M memory overhead, and many blocks of the table data block space are not opened at once. Only when there is no table space to accommodate data will it be newly opened. So it is acceptable from the perspective of space resources.
2. Competition issue
If a single critical resource is shared for multiple rows, it also introduces a new competition issue, where the same row is not locked, but a competition is created. There may be more competition for hot data, and CPU resources are more valuable compared to memory resources.
3. Invalid wakeup
For example, if two rows R1,R2 share a critical resource. Under normal circumstances, on a chain table of row lock, the former releases and wakes up the latter. But when two rows share a critical resource, and after R1 releases the lock, its chain table of row lock may already be empty, which is meaningless to bring a problem of invalid wake-up of the chain table of row lock. It must be handled with care.
4. False wakeup
The critical resource must handle the false wakeup problem, and after being woken up, it must detect whether the current lock grant condition is satisfied.
Latches
To improve the performance of MVCC for concurrent reads and writes of data, no shared lock RS, or exclusive lock RX is added for reading and writing of records. Instead, it is controlled by latching. A latch is essentially a read/write lock.
Difference between locks and latches
l No upgrade
Latches do not need lock upgrades.
l No need for chain tables
Since latches do not involve upgrades, they only need to be counted and do not need to distinguish between different latch objects.
l Shorter locking time
The locking time is shorter for mutually exclusive locks compared to row locks.
Conclusion
This article focuses on the types, granularity, and compatibility of AntDB-M's data locks. The specific design of locks is further divided into table locks, row locks, and latch locks, as well as the application, release and upgrade process of different locks, and the difference between different locks. Due to the limitation of space, deadlock detection, phantom reads and dirty reads related to locks are not covered.