Incremental Stream Query Merging
Ankit Chaudhary, Jeyhun Karimov, Steffen Zeuch, Volker Markl
Proceedings of the 26th International Conference on Extending Database Technology (EDBT 2023) | March 2023


Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They mainly focus on achieving high throughput and low latency for a single query. This focus neglects opportunities to share data and computation among multiple long-running queries. Batch-oriented systems commonly rely on simple and fast query merging algorithms based on syntactic similarities, as the overhead of more extensive approaches would not amortize over the short query runtime. In contrast, streaming queries are continuous and long-running, so more extensive approaches, such as taking the semantics of queries into account, may pay off. Furthermore, the long-running nature of streaming queries requires merging newly arriving queries with those already running, whereas batch queries are merged only within a batch of arriving queries. In this paper, we propose Incremental Stream Query Merging (ISQM), an end-to-end solution to identify and maintain sharing among thousands of stream queries. ISQM captures the semantic information of stream queries to enable merging even in the presence of syntactic differences. Our evaluation shows that ISQM exploits up to 65x more sharing opportunities than the naive baseline using hash-based signatures, scales linearly for thousands of queries, and saves a significant amount of resources compared to state-of-the-art approaches.
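
The contrast between syntactic and semantic merging, and the incremental handling of newly arriving queries, can be illustrated with a toy sketch. It is not ISQM's algorithm, which the abstract does not detail. It assumes a hypothetical query model (a source name plus a conjunction of `attribute op constant` predicates) and stands in for semantic analysis with a simple normalization step. All names (`syntactic_signature`, `normalized_signature`, `IncrementalMerger`) are invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Assumption: predicates are strings of the form "lhs op rhs", separated by spaces.
_FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "==": "==", "!=": "!="}


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def syntactic_signature(source: str, predicates: list[str]) -> str:
    """Hash the query as written: predicate order and spelling both matter."""
    return _digest(source + "|" + " AND ".join(predicates))


def _canonical(predicate: str) -> str:
    """Rewrite a predicate so the attribute is on the left ("5 < a" -> "a > 5")."""
    lhs, op, rhs = predicate.split()
    if lhs.lstrip("-").replace(".", "", 1).isdigit():  # constant on the left side
        lhs, op, rhs = rhs, _FLIP[op], lhs
    return f"{lhs} {op} {rhs}"


def normalized_signature(source: str, predicates: list[str]) -> str:
    """Hash a canonical form: conjunct order and operand order no longer matter."""
    canonical = sorted({_canonical(p) for p in predicates})
    return _digest(source + "|" + " AND ".join(canonical))


@dataclass
class IncrementalMerger:
    """Keeps one shared plan per signature and merges queries as they arrive."""
    signature_fn: Callable[[str, list[str]], str]
    shared_plans: dict[str, list[str]] = field(default_factory=dict)

    def add(self, query_id: str, source: str, predicates: list[str]) -> bool:
        """Register a query; return True if it joined an already running plan."""
        signature = self.signature_fn(source, predicates)
        group = self.shared_plans.setdefault(signature, [])
        group.append(query_id)
        return len(group) > 1


# Hypothetical workload: q1 and q2 are equivalent but written differently.
queries = [
    ("q1", "cars", ["speed > 100", "lane == 2"]),
    ("q2", "cars", ["lane == 2", "100 < speed"]),
    ("q3", "cars", ["speed > 120"]),
]

syntactic = IncrementalMerger(syntactic_signature)
normalized = IncrementalMerger(normalized_signature)
for query_id, source, predicates in queries:
    syntactic.add(query_id, source, predicates)
    normalized.add(query_id, source, predicates)

print(len(syntactic.shared_plans), len(normalized.shared_plans))  # prints "3 2"
```

In this sketch the purely syntactic signature keeps three separate plans, while the normalized signature lets q2 join the plan that q1 already started. Normalization of this kind only covers reordered and mirrored predicates. Detecting containment or other semantic relationships between queries would require a stronger equivalence check than shown here.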