NebulaStream - Data Management for the Internet of Things

NebulaStream is a general-purpose, end-to-end data-management system for cloud-edge-sensor environments built around three core goals:

Ease of Use

Out-of-the-box functionality for multi-modal, multi-frequency streams (e.g., alignment, inference). Enables users to focus on business logic with well-known abstractions and concepts.

Extensibility

Empower users to easily integrate custom data connectors, formats, operators, and optimizations into the system.

Efficiency

Utilize distributed heterogeneous computing devices with hardware-tailored code, adaptive execution, and the interleaved processing of data sources to handle large workloads efficiently.

NebulaStream is a joint research project at BIFOLD, with initial contributors from the DIMA Group at TU Berlin and the DFKI IAM Group.

NebulaStream Vision

Our goal is to process thousands of queries over millions of heterogeneous sources in a massively distributed environment. We achieve this through five core technologies:

Heterogeneous Hardware Support: Supports a wide range of devices including different architectures (e.g., ARM, x86) and accelerators (e.g., GPUs, TPUs).

Code Generation: Compiles every query to efficient, low-energy native code.

In-Network Processing: Pushes operators as close as possible to the data source to reduce network traffic.

On-Demand Gathering: Uses all available processing capabilities between the source and the sink to apply processing as early as possible, thereby reducing network traffic.

Adaptive Resource Management: Reacts to topology or workload changes without interrupting queries.
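
To make the in-network processing idea above concrete, here is a minimal sketch that counts how many tuples cross the network when a filter runs in the cloud versus at the source. It is not NebulaStream code: `Reading`, `is_hot`, the 30-degree threshold, and both placement functions are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor tuple (not a NebulaStream type)."""
    sensor_id: int
    temperature: float


def is_hot(reading: Reading) -> bool:
    """Hypothetical filter predicate: keep readings above 30 degrees."""
    return reading.temperature > 30.0


def filter_in_cloud(readings):
    """Ship every reading over the network, then filter centrally."""
    sent = list(readings)                      # all tuples cross the network
    result = [r for r in sent if is_hot(r)]    # filter runs after transfer
    return len(sent), result


def filter_at_source(readings):
    """Filter on the edge device, then ship only the matching readings."""
    sent = [r for r in readings if is_hot(r)]  # filter runs before transfer
    return len(sent), sent


# Synthetic input: 20 readings with temperatures 20.0, 21.0, ..., 39.0.
readings = [Reading(sensor_id=i % 4, temperature=20.0 + i) for i in range(20)]

cloud_sent, cloud_result = filter_in_cloud(readings)
edge_sent, edge_result = filter_at_source(readings)

print(f"filter in cloud:  {cloud_sent} tuples sent")
print(f"filter at source: {edge_sent} tuples sent")
```

Both placements produce the same query result; only the number of transferred tuples differs. In this toy setup the saving depends entirely on the selectivity of the made-up predicate.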

NebulaStream Architecture

A modular pipeline that stretches from sensor to cloud, optimizing every hop along the way.

1 Sources & Sinks: Users can send their data using different source connectors and input formats. Commonly used source connectors include JDBC, MQTT, and TCP, and common input formats include CSV and JSON, which we provide to the user out-of-the-box. In addition, users can add custom connectors or formats. Similarly, users can customize connectors and formatters in the Sink Manager.
2 I/O Handling: Unlike other SPEs that handle sources individually and synchronously by assigning one thread per source, NebulaStream interleaves source processing via thread sharing within its own I/O thread pool and applies asynchronous callbacks to reduce waiting time.
3 Query Submission: Users can submit queries in our SQL-like query language. NebulaStream provides many built-in operations, like re-sampling and inference. Moreover, it allows users to specify their own operators.
4 Query Optimization: After submission, a query plan is created and optimized before hardware-tailored code is generated. The user can modify the optimizations by providing their own rules to the rule engine.
5 Adaptive Runtime: During runtime, the query engine schedules query processing in a highly dynamic manner using task abstractions.
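
The I/O handling step above (many sources sharing a small I/O thread pool, driven by asynchronous callbacks) can be sketched roughly as follows. This is an illustrative model under stated assumptions, not NebulaStream's implementation or API: `FakeSource`, `run_sources`, and the buffer naming are hypothetical, and Python's `ThreadPoolExecutor` stands in for the system's own pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSource:
    """Hypothetical in-memory source that yields a fixed number of buffers."""

    def __init__(self, name, num_buffers):
        self.name = name
        self.remaining = num_buffers

    def read_buffer(self):
        # Stand-in for one short read; returns None once the source is exhausted.
        if self.remaining == 0:
            return None
        self.remaining -= 1
        return f"{self.name}-buf{self.remaining}"


def run_sources(sources, num_io_threads=2):
    """Serve all sources with a shared pool instead of one thread per source."""
    received = []
    threads_used = set()
    if not sources:
        return received, 0

    lock = threading.Lock()
    all_done = threading.Event()
    pending = len(sources)
    pool = ThreadPoolExecutor(max_workers=num_io_threads)

    def read_once(source):
        # One unit of I/O work; the thread is free for other sources afterwards.
        with lock:
            threads_used.add(threading.get_ident())
        return source, source.read_buffer()

    def on_read(future):
        # Completion callback: record the buffer and schedule the next read.
        nonlocal pending
        source, buf = future.result()
        with lock:
            if buf is None:
                pending -= 1
                if pending == 0:
                    all_done.set()
                return
            received.append(buf)
        pool.submit(read_once, source).add_done_callback(on_read)

    for source in sources:
        pool.submit(read_once, source).add_done_callback(on_read)

    all_done.wait()
    pool.shutdown()
    return received, len(threads_used)


if __name__ == "__main__":
    demo_sources = [FakeSource(f"s{i}", 3) for i in range(6)]
    buffers, n_threads = run_sources(demo_sources, num_io_threads=2)
    print(f"{len(buffers)} buffers from 6 sources on {n_threads} I/O thread(s)")
```

Each source has at most one read in flight, so reads from different sources interleave on the shared threads and no thread is parked on a single idle source. The order in which buffers arrive across sources is not deterministic in this sketch.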

Publications

Project Overview

NebulaStream: An Extensible, High-Performance Streaming Engine for Multi-Modal Edge Applications

SIGMOD 2025 | Adrian Michalke, Aljoscha Lepping, Volker Markl, Ricardo Martinez, Nils Schubert, Lukas Schwerdtfeger, Taha Tekdogan, Steffen Zeuch, Ariane Ziehn, Christoph Falkensteiner, Kyle Krüger, Alexander Meyer, Tobias Röschl, Svea Wilkending

Using and Enhancing NebulaStream – A Tutorial

DEBS 2024 | Steffen Zeuch, Ankit Chaudhary, Viktor Rosenfeld, Taha Tekdogan, Adrian Michalke, Matthis Gördel, Ariane Ziehn, Volker Markl

Showcasing Data Management Challenges for Future IoT Applications with NebulaStream

VLDB 2023 | Aljoscha Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Ankit Chaudhary, Philipp M. Grulich, Steffen Zeuch, Volker Markl

NebulaStream: Complex Analytics Beyond the Cloud

VLIoT 2020 | Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M. Grulich, Ariane Ziehn, Volker Markl

The NebulaStream Platform: Data and Application Management for the Internet of Things

CIDR 2020 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Bress, Jonas Traub, Volker Markl

System Publications

Incremental Stream Query Placement in Massively Distributed and Volatile Infrastructures

ICDE 2025 | Ankit Chaudhary, Kaustubh Beedkar, Jeyhun Karimov, Felix Lang, Steffen Zeuch, Volker Markl

Fault Tolerance Placement in the Internet of Things

SIGMOD 2024 | Anastasiia Kozar, Bonaventura Del Monte, Steffen Zeuch, Volker Markl

Query Compilation Without Regrets

SIGMOD 2024 | Philipp M. Grulich, Aljoscha P. Lepping, Dwi P. A. Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies

VLDB 2024 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Alphan Eracar, Steffen Zeuch, Volker Markl

Incremental Stream Query Merging

EDBT 2023 | Ankit Chaudhary, Jeyhun Karimov, Steffen Zeuch, Volker Markl

Rethinking Stateful Stream Processing with RDMA

SIGMOD 2022 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl

Babelfish: Efficient Execution of Polyglot Queries

VLDB 2022 | Philipp M. Grulich, Steffen Zeuch, Volker Markl

An Energy-Efficient Stream Join for the Internet of Things

DAMON 2021 | Adrian Michalke, Philipp M. Grulich, Clemens Lutz, Steffen Zeuch, Volker Markl

Streaming Data through the IoT via Actor-Based Semantic Routing Trees

VLIoT 2021 | Dimitrios Giouroukis, Johannes Jestram, Steffen Zeuch, Volker Markl

Monitoring of Stream Processing Engines Beyond the Cloud: an Overview

VLIoT 2021 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Steffen Zeuch, Volker Markl

ExDRa: Exploratory Data Science on Federated Raw Data

SIGMOD 2021 | Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch

Parallelizing Intra-Window Join on Multicores: An Experimental Study

SIGMOD 2021 | Shuhao Zhang, Yancan Mao, Jiong He, Philipp M Grulich, Steffen Zeuch, Bingsheng He, Richard TB Ma, Volker Markl

Towards Resilient Data Management for the Internet of Moving Things

BTW 2021 | Elena Beatriz Ouro Paz, Eleni Tzirita Zacharatou, Volker Markl

Automatic Tuning of Read-Time Tolerances for Optimized On-Demand Data-Streaming from Sensor Nodes

EDBT 2021 | Julius Hülsmann, Chiao-Yun Li, Jonas Traub, Volker Markl

Demand-based Sensor Data Gathering with Multi-Query Optimization

VLDB 2020 | Julius Hülsmann, Jonas Traub, Volker Markl

Complex Event Processing for the Internet of Things

VLDB 2020 PhD Workshop | Ariane Ziehn

A Survey of Adaptive Sampling and Filtering Algorithms for the Internet of Things

DEBS 2020 | Dimitrios Giouroukis, Alexander Dadian, Jonas Traub, Steffen Zeuch, Volker Markl

Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

SIGMOD 2020 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl

Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

SIGMOD 2020 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl

Scaling a Public Transport Monitoring System to Internet of Things Infrastructures

EDBT 2020 | Haralampos Gavriilidis, Adrian Michalke, Laura Mons, Steffen Zeuch, Volker Markl

Governor: Operator Placement for a Unified Fog-Cloud Environment

EDBT 2020 | Ankit Chaudhary, Steffen Zeuch, Volker Markl

Disco: Efficient Distributed Window Aggregation

EDBT 2020 | Lawrence Benson, Philipp M. Grulich, Steffen Zeuch, Volker Markl, Tilmann Rabl

SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence

arXiv Preprint 2019 | Jonas Traub, Julius Hülsmann, Sebastian Breß, Tilmann Rabl, Volker Markl

Analyzing Efficient Stream Processing on Modern Hardware

VLDB 2019 | Steffen Zeuch, Sebastian Breß, Tilmann Rabl, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Volker Markl

Efficient Window Aggregation with General Stream Slicing

EDBT 2019 | Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Resense: Transparent Record and Replay of Sensor Data in the Internet of Things

EDBT 2019 | Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl