AntDB was established in 2008. In the core systems of telecom operators, AntDB provides online services for more than 1 billion users in 24 provinces across the country. With features such as high performance, elastic scaling, and high reliability, AntDB can process one million core communications transactions per second at peak, has kept these systems running continuously and stably for nearly ten years, and has been deployed commercially in communications, finance, transportation, energy, the Internet of Things, and other industries. The performance of AntDB-M (the AntDB in-memory engine) owes much to its memory structure design, which is the focus of this paper.
Overview
AntDB-M is an OLTP database that runs entirely in memory. Its data is managed per table through tablespaces. Storage takes two forms: 1) the in-memory state; 2) the file state. The file state is a serialized export of the in-memory state. In the rest of this paper, "tablespace" refers to the in-memory state.
Tablespace
A tablespace is the memory space where a table's data is stored. Each table has its own separate tablespace, which is created when the table is created or when the service loads the table at startup. A tablespace has a three-level structure made up of two parts: 1) tablespace metadata; 2) tablespace data blocks.
Three-level structure
The memory structure of a tablespace has three levels: 1) primary addresses, 2) secondary addresses, and 3) data blocks. With this structure, each tablespace can hold about 2 trillion records.
Number of records = number of primary addresses * number of secondary addresses * number of records per data block = 32K * 32K * 2K = 2^41, or about 2.2 trillion records.
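The capacity figure can be checked with a few lines of arithmetic. The sketch below assumes that "K" means 1024; the constant and function names are illustrative only and do not come from AntDB-M.

```python
# Assumption: "K" means 1024, as is usual for address-table sizes.
K = 1024

PRIMARY_ENTRIES = 32 * K      # first-level address slots
SECONDARY_ENTRIES = 32 * K    # second-level address slots per primary entry
RECORDS_PER_BLOCK = 2 * K     # records per data block, as used in the formula


def max_records(primary=PRIMARY_ENTRIES,
                secondary=SECONDARY_ENTRIES,
                per_block=RECORDS_PER_BLOCK):
    """Upper bound on the records addressable by the three-level structure."""
    return primary * secondary * per_block


capacity = max_records()
print(capacity)             # 2199023255552
print(capacity == 2 ** 41)  # True
```

Under that assumption the product is 2^15 * 2^15 * 2^11 = 2^41, a little under 2.2 trillion records.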
Figure 1: Tablespace three-level structure
Tablespace metadata
The tablespace metadata describes the size of the current space and holds the two levels of address information that lead to the table data stored in the data blocks. It is used to manage the tablespace data and to access the table data efficiently. In the three-level structure of a tablespace, the first two levels belong to the metadata. Their structure is relatively simple: each level stores the addresses of the next level.
Tablespace data blocks
A tablespace data block stores table data. Data blocks are organized as a doubly linked list. Each data block is divided into two parts: 1) metadata and 2) data space.
Metadata
The metadata is the management information of the current data block. It mainly includes the current data block size, the number of assignable data records, and the range of allocated record object IDs. Note that the size of the data block and the number of assignable data records are not fixed; they are calculated when the table is created, based on the actual size of the table records plus some additional data.
Data space
The data space is the actual address space allocated for storing table data, and each data space is a contiguous address space. The data space size falls into one of 9 classes according to the row size: 256K, 512K, 1M, 2M, 4M, 8M, 16M, 32M, 64M. The class is chosen so that each data space can hold 2K~4K records. When the record length is less than 128B, the number of records may exceed 4K.
Data space size = block metadata size + number of records held by a data block * size of a table record row (with extension information)
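The text gives the size classes and the target record count but not the exact selection rule. The following is a minimal sketch under stated assumptions: the rule (take the smallest class that fits at least 2K rows), the 64-byte metadata size, and all names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# The nine data space size classes listed in the text.
SIZE_CLASSES = [256 * KIB, 512 * KIB, 1 * MIB, 2 * MIB, 4 * MIB,
                8 * MIB, 16 * MIB, 32 * MIB, 64 * MIB]

MIN_RECORDS = 2 * KIB    # target lower bound of records per data block
BLOCK_META_SIZE = 64     # hypothetical block metadata size in bytes


def choose_data_space(row_size, meta_size=BLOCK_META_SIZE):
    """Return (class_size, records_per_block) for a given row size.

    Hypothetical rule: take the smallest class that fits the metadata plus
    at least MIN_RECORDS rows; fall back to the largest class otherwise.
    """
    if row_size <= 0:
        raise ValueError("row_size must be positive")
    for size in SIZE_CLASSES:
        records = (size - meta_size) // row_size
        if records >= MIN_RECORDS:
            return size, records
    size = SIZE_CLASSES[-1]
    return size, (size - meta_size) // row_size


for row_size in (32, 100, 128):
    print(row_size, choose_data_space(row_size))
```

With this rule, very short rows land in the smallest class and the record count per block rises well above 4K, which matches the remark about records shorter than 128B.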
Figure 2: Row record format
Record number
Each record has its own unique number, which is assigned when the data is inserted. Data is not simply appended; instead, it is inserted into a free slot found in the data space. Queries, updates, and deletes also use the record number to locate the data quickly. Each free slot has a unique number as well. Each data block tracks its current free slots, and doing so requires no additional space: until a slot in the block's data space is allocated to a specific record, the space that would otherwise hold the record stores the free-slot information. This information forms a simple doubly linked list that chains all free slots together. Every row in the tablespace therefore has a unique number, from which both its data block and its address within the block can be located quickly.
Figure 3: Data block structure diagram
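The idea of keeping the free list inside the free slots themselves can be illustrated with a toy model. This is a sketch only: the class, its field layout, and the choice to reuse the most recently freed slot first are assumptions, not the AntDB-M implementation.

```python
class DataBlock:
    """Toy data block whose free slots double as free-list nodes."""

    NIL = -1  # marks the end of the free list

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A free slot stores ("free", prev_index, next_index); a used slot
        # stores ("used", row). No separate free-list storage is kept.
        self.slots = [("free", i - 1, i + 1) for i in range(capacity)]
        self.slots[-1] = ("free", capacity - 2, self.NIL)
        self.free_head = 0

    def insert(self, row):
        """Place row in the first free slot and return its slot number."""
        if self.free_head == self.NIL:
            raise MemoryError("data block is full")
        slot = self.free_head
        _, _, nxt = self.slots[slot]
        if nxt != self.NIL:
            # The next free slot becomes the new head, so it has no predecessor.
            _, _, nxt_next = self.slots[nxt]
            self.slots[nxt] = ("free", self.NIL, nxt_next)
        self.free_head = nxt
        self.slots[slot] = ("used", row)
        return slot

    def delete(self, slot):
        """Release a slot by pushing it onto the head of the free list."""
        if self.slots[slot][0] != "used":
            raise KeyError(slot)
        old_head = self.free_head
        if old_head != self.NIL:
            _, _, old_next = self.slots[old_head]
            self.slots[old_head] = ("free", slot, old_next)
        self.slots[slot] = ("free", self.NIL, old_head)
        self.free_head = slot


block = DataBlock(4)
first = block.insert("row a")
second = block.insert("row b")
block.delete(first)
reused = block.insert("row c")   # lands in the slot that was just freed
```

Because a slot is either a record or a list node, never both, the bookkeeping costs no memory beyond the head pointer kept in the block metadata.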
Relationship between record number and record address
The actual record address can be obtained from the record number by a simple calculation.
Figure 4: Relationship between record number and record address
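The source does not spell out the calculation, so the sketch below shows one plausible form of it. It assumes record numbers are assigned densely block by block, 32K secondary entries per primary entry, and a row area that starts right after the block metadata. All names and the layout are hypothetical.

```python
SECONDARY_ENTRIES = 32 * 1024   # secondary addresses per primary address


def locate(record_no, records_per_block, secondary_entries=SECONDARY_ENTRIES):
    """Split a record number into (primary index, secondary index, slot)."""
    block_no, slot = divmod(record_no, records_per_block)
    primary, secondary = divmod(block_no, secondary_entries)
    return primary, secondary, slot


def record_offset(slot, row_size, meta_size):
    """Byte offset of a slot inside its block (hypothetical layout)."""
    return meta_size + slot * row_size


# Record 7 of block 5 under primary entry 1, with 2048 records per block.
record_no = (1 * SECONDARY_ENTRIES + 5) * 2048 + 7
print(locate(record_no, 2048))      # (1, 5, 7)
print(record_offset(7, 100, 64))    # 764
```

Two integer divisions and one multiplication are enough, so a lookup by record number needs no search.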
Creation of data blocks
Data blocks are created on demand when data is inserted or updated (to hold redo records), and when data is loaded into memory at service startup. When a data change requires a new data block, only one block is requested at a time. When data is loaded, the blocks are requested in bulk (all at once), based on the number of data blocks currently required.
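The two creation paths can be contrasted in a small model. This sketch reads the load path as a single bulk request for every block the data needs; that reading, the class, and its method names are assumptions made for illustration.

```python
class BlockAllocator:
    """Toy model of the two block-creation paths."""

    def __init__(self, records_per_block):
        self.records_per_block = records_per_block
        self.blocks = []      # each entry stands in for one data block
        self.requests = 0     # number of allocation requests issued

    def grow_for_change(self):
        """Insert/update path: request exactly one new block."""
        self.requests += 1
        self.blocks.append(object())

    def reserve_for_load(self, record_count):
        """Load path: request all blocks needed for record_count at once."""
        # Integer ceiling division, exact for arbitrarily large counts.
        needed = -(-record_count // self.records_per_block)
        missing = max(0, needed - len(self.blocks))
        if missing:
            self.requests += 1
            self.blocks.extend(object() for _ in range(missing))
        return missing


allocator = BlockAllocator(records_per_block=2048)
created = allocator.reserve_for_load(5000)   # 5000 records need 3 blocks
allocator.grow_for_change()                  # a later insert adds one more
```

Loading issues one request regardless of table size, while steady-state changes grow the tablespace one block at a time.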
Release of data blocks
After a tablespace is created, the tablespace (including metadata and data blocks) is freed only when the table is dropped or truncated. When a table is renamed, only the metadata related to the table is modified and the tablespace remains unchanged.
Memory management
Memory management of tablespaces distinguishes two kinds of memory: 1) data blocks, and 2) non-data blocks. Managing the two kinds separately allows each to be handled efficiently according to its size.
Memory allocation is divided into two types: 1) memory, and 2) memory mapping.
- Memory, i.e., allocating memory directly from RAM.
- Memory mapping, i.e., mapping to a file with mmap. This is used only for the data blocks of very large tables (the very large table option is set when the table is created); everything else uses the "memory" method.
The "memory" allocation is further divided into two strategies: 1) RAM; 2) PMEM. The "PMEM" strategy applies only to hardware environments with Intel Optane persistent memory devices installed. The tablespace allocation policy is selected when the table is created, and the default policy is "RAM".
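The difference between allocating from RAM and mapping to a file can be shown with Python's standard mmap module. This is only an illustration of the two allocation types, not AntDB-M code: the function names are hypothetical, and the PMEM strategy is not covered.

```python
import mmap
import os
import tempfile


def alloc_anonymous(size):
    """RAM-style allocation: an anonymous mapping with no file behind it."""
    return mmap.mmap(-1, size)


def alloc_file_backed(path, size):
    """Memory-mapping allocation: map `size` bytes of the file at `path`."""
    with open(path, "w+b") as f:
        f.truncate(size)              # the file must be as large as the mapping
        return mmap.mmap(f.fileno(), size)


def demo():
    """Write the same bytes through both kinds of mapping."""
    with tempfile.TemporaryDirectory() as tmp:
        ram_block = alloc_anonymous(4096)
        file_block = alloc_file_backed(os.path.join(tmp, "block.dat"), 4096)
        ram_block[:5] = b"hello"
        file_block[:5] = b"hello"
        same = ram_block[:5] == file_block[:5]
        file_block.close()
        ram_block.close()
        return same


print(demo())
```

Both mappings are used the same way once created. The file-backed one lets the operating system page data out to the file, which is why it suits data blocks of tables too large to keep fully resident.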
Overflow column
The records in the data space are fixed-length records. For variable-length columns, the data itself is not stored in the record; the record holds only the data length and the actual location of the data. Overflow column data is stored and managed in a separate memory space, which is not covered in detail here.
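A small sketch shows how a fixed-length record can refer to variable-length data held elsewhere. The row layout (an 8-byte id plus a length and a location), the append-only store, and every name below are assumptions for illustration; the real overflow space is not described in the text.

```python
import struct

# Hypothetical fixed-length row: an 8-byte id, then the overflow column
# represented only by its length (4 bytes) and location (8 bytes).
ROW = struct.Struct("<qIQ")


class OverflowStore:
    """Toy append-only space for variable-length column data."""

    def __init__(self):
        self.buf = bytearray()

    def put(self, data):
        """Append data and return (length, location)."""
        location = len(self.buf)
        self.buf += data
        return len(data), location

    def get(self, length, location):
        return bytes(self.buf[location:location + length])


def encode_row(row_id, text, store):
    """Build the fixed-length row; the text itself goes to the store."""
    length, location = store.put(text.encode("utf-8"))
    return ROW.pack(row_id, length, location)


def decode_row(raw, store):
    row_id, length, location = ROW.unpack(raw)
    return row_id, store.get(length, location).decode("utf-8")


store = OverflowStore()
short_row = encode_row(1, "hi", store)
long_row = encode_row(2, "a much longer value " * 10, store)
print(len(short_row), len(long_row))   # both equal ROW.size
```

Keeping rows at a fixed size is what allows a record's address to be computed directly from its number.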
Index
Table indexes are classified as hash indexes and btree indexes. Each type of index has its own separate memory space. They are not described further here.
Conclusion
AntDB-M lays a solid foundation for overall high performance through its simple and efficient memory structure design. It stores more data records in a small amount of memory, which lets users support more business at lower cost.