The Concept of Stream Processing
The attacks on the World Trade Center in the United States on 11 September 2001 were the first time that the United States Department of Defense incorporated “active forewarning” into its macro-strategic planning for national defence. And IBM, as the world's largest IT company at that time, undertook a significant amount of basic tasks supporting software research and development. In which, the IBM InfoSphere Streams officially released in 2009, is one of the world's earliest real sense of commercial stream data processing engines.
Typical stream processing frameworks, such as Apache Storm, Spark Streaming, Flink, etc. are also based on IBM's design concepts, using the “request to send + result return” model for research and development, and are widely used in real-time Internet type businesses to conduct real-time pre-processing of massive events generated in front.
Gartner, in its China Database Management Systems Market Guide 2022, defined stream processing as: Involving the observation and triggering of “events”, usually captured “at the edge”, including the transfer of results to other business stages. It will gain more attention in the next five years.
Figure: Gartner's Definition of Stream/Event Processing
Pain Points of Traditional Deployment Architecture
However, the design of stream processing frameworks such as Apache Storm, Spark Streaming, and Flink all focus on the “processing” itself. Since it does not have its own database capabilities, complex data extraction is required when interactions with other data such as correlation and temporary storage are required. This requires many developers to write complex Java/C++/Scala code, the most traditional way of pre-processing records one by one, and also need to constantly retrieve additional data for manual correlation from other external caches/databases in real time, which results extreme burden of the development and operation and maintenance.
In more than ten years of development of AsiaInfo's AntDB, we have seen too many business scenarios of service providers on core data processing. Some of these requirements can be easily met with traditional technologies, but there are still some that require real-time processing capabilities such as stream computing to support.
Organic Integration of Database and Stream Processing
The stream data processing model is very different from the kernel design of traditional databases. The core essence lies in the fact that in the traditional database architecture design, the relationship between the application and the database is a “request-response” model, which means that when a business initiates an SQL request, the database executes the request and returns the result accordingly.
On the other hand, the stream processing kernel adopts a “subscribe-push” model. Through predefined data processing models, it processes the business “events” carried by the data and then pushes the processed results to downstream application for display or storage.
Figure: The Fundamental Architecture of AntDB's Stream Processing Engine
As a result, in the field of real-time stream data processing, AsiaInfo has conducted extensive and innovative researches and explorations from scratch with AntDB. At the end of 2022, the company launched the AntDB-S stream processing database engine, which completely integrated stream computing with traditional transactional and analytical data storage, allowing users to freely define data structures and real-time processing logic within the database engine using standard SQL.
In addition, during the process of free data flow between stream objects and table objects within the database, users can conduct performance optimization, data process, cluster monitor, and business logic customization at any time by establishing indexes, associating stream tables, using triggers, creating materialized views, and other means.
Functional Advantages
Technology Stack Simplification: In the processing of real-time stream events, AntDB's stream processing integrated engine can make a significant amount of real-time data processing happened within the data warehouse, further aligning itself with general-purpose transactions.
Standard SQL Definition: Traditional stream processing methods have limited capabilities in handling SQL and often require writing extensive business codes. However, AntDB-S facilitates convenient use of streams by processing them through unified SQL statements.
Unified Data Interface: Supporting the conversion between stream and batch modes. AntDB's unified hyper-converged architecture achieves a unified external interface, eliminating the need to separate data collection and processing, as both stream and batch processing can be fully handled with SQL.
Support for Complete Transaction Processing: Traditional stream processing methods do not support data modification, but AntDB-S supports data modification and transactional operations during the stream processing.
More Accurate Real-Time Results: With the ACID properties of distributed transactions, AntDB-S can resolve issues related to data disaster tolerance and consistency in real-time stream data processing, which enables precise identification of data failure points and facilitates the correction and recalculation of stream events.
About AsiaInfo Anhui AntDB
Founded in 2008, AntDB serves more than 1 billion users in 24 cities, provinces and autonomous regions across the country on the core systems of service providers. It has product features such as high performance, elastic scalability, and high reliability with a capability of handling millions of core communication transactions per second at peak values, ensuring the stable and continuous operation of the system for over a decade, and has been successfully commercialized in industries such as communications, finance, transportation, energy sources, and the IoT.