NebulaStream - Data Management for the IoT

NebulaStream is a general purpose, end-to-end data management system for cloud-edge-sensor environments that is being built with three design goals in mind:

Ease-of-Use: NebulaStream provides out-of-the-box functionality for common tasks required by multi-modal, multi-frequency streams (e.g., alignment, inference). This enables users to focus on business logic with well-known abstractions and concepts.

Extensibility: NebulaStream empowers users to easily integrate custom data connectors, formats, operators, and optimizations into the system. This helps users tailor NebulaStream to meet their needs.

Efficiency: NebulaStream utilizes distributed heterogeneous computing devices with hardware-tailored code, adaptive execution, and the interleaved processing of data sources to handle large workloads efficiently.

NebulaStream is a joint research project being undertaken at the Berlin Institute for the Foundations of Learning and Data (BIFOLD). The first contributors to NebulaStream are researchers from the DIMA Group at TU Berlin and the DFKI IAM Group.




NebulaStream at a Glance



The goal of NebulaStream is to enable the processing of thousands of queries over millions of heterogeneous sources in a massively distributed environment. To this end, NebulaStream provides the following core technologies:
  1. Heterogeneous Hardware Support: NebulaStream supports a wide range of devices including different architectures (e.g., ARM, x86) and accelerators (e.g., GPUs, TPUs).
  2. Code Generation: NebulaStream compiles queries into highly efficient code, which increases hardware utilization and reduces energy consumption significantly.
  3. In-Network Processing: NebulaStream uses all available processing capabilities between the source and the sink to apply processing as early as possible, which minimizes network traffic.
  4. On-Demand Gathering: NebulaStream only acquires data that is required by the current set of active queries and thus does not transfer what is not needed.
  5. Adaptive Resource Management: NebulaStream detects and reacts to changes in the system without impacting query processing.
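
As a rough illustration of in-network processing (item 3), the following Python sketch compares filtering at the sink with filtering at the source. It is not NebulaStream code: the `Reading` type, the two functions, and the sample values are hypothetical and only show why applying an operator early reduces the number of tuples that cross the network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor tuple used only for this illustration."""
    sensor_id: str
    temperature: float


Predicate = Callable[[Reading], bool]


def ship_then_filter(readings: List[Reading], keep: Predicate) -> Tuple[List[Reading], int]:
    """Cloud-centric plan: every reading is transferred, then filtered at the sink."""
    transferred = list(readings)              # all tuples cross the network
    result = [r for r in transferred if keep(r)]
    return result, len(transferred)


def filter_then_ship(readings: List[Reading], keep: Predicate) -> Tuple[List[Reading], int]:
    """In-network plan: the filter runs next to the source, so only matches are transferred."""
    transferred = [r for r in readings if keep(r)]
    return transferred, len(transferred)


def is_hot(reading: Reading) -> bool:
    return reading.temperature > 30.0


readings = [Reading("s1", t) for t in (18.0, 21.5, 35.2, 19.9, 40.1)]

late_result, late_cost = ship_then_filter(readings, is_hot)
early_result, early_cost = filter_then_ship(readings, is_hot)
```

Both plans produce the same query result, but in this toy input the early filter transfers two tuples instead of five.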



NebulaStream at a Glance.

Architecture



  1. Sources/Sinks: Users can send their data using different source connectors and input formats. NebulaStream provides commonly used source connectors (e.g., JDBC, MQTT, and TCP) and input formats (e.g., CSV and JSON) out of the box. In addition, users can add custom connectors or formats. Similarly, users can customize connectors and formatters in the Sink Manager.
  2. I/O Handling: Unlike other SPEs that handle sources individually and synchronously by assigning one thread per source, NebulaStream interleaves source processing via thread sharing within its own I/O thread pool and applies asynchronous callbacks to reduce waiting time.
  3. Query Submission: Users can submit queries in either our SQL-like query language with streaming extensions or our pattern specification language (PSL). In addition, NebulaStream provides many built-in operations, like re-sampling and inference. Moreover, it allows users to specify their own operators.
  4. Query Optimization: After submission, a query plan is created and optimized before hardware-tailored code is generated. The user can modify the optimizations by providing their own rules to the rule engine.
  5. Query Execution: During runtime, the Query Engine schedules query processing in a highly dynamic manner using task abstractions.
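
The difference between one thread per source and interleaved I/O handling (item 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not NebulaStream's implementation: it uses Python's `asyncio` event loop on a single thread as a stand-in for a shared I/O thread pool, and the source names, delays, and the `on_data` callback are hypothetical.

```python
import asyncio
from typing import Callable, List, Tuple


async def read_source(name: str, delay: float, count: int,
                      on_data: Callable[[str, int], None]) -> None:
    """Hypothetical source that yields `count` tuples, waiting `delay` seconds between them."""
    for i in range(count):
        # Stands in for waiting on a socket or broker; the thread is free
        # to serve other sources while this one is idle.
        await asyncio.sleep(delay)
        on_data(name, i)  # callback fires as soon as data is available


async def run_sources(sources: List[Tuple[str, float, int]]) -> List[Tuple[str, int]]:
    received: List[Tuple[str, int]] = []

    def on_data(name: str, value: int) -> None:
        received.append((name, value))

    # All sources share one thread; their waits overlap instead of adding up.
    await asyncio.gather(*(read_source(n, d, c, on_data) for n, d, c in sources))
    return received


received = asyncio.run(run_sources([
    ("mqtt", 0.010, 3),
    ("tcp", 0.015, 2),
    ("csv", 0.005, 4),
]))
```

Because no source blocks the thread while it waits, tuples from different sources arrive interleaved, while the order within each source is preserved.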



NebulaStream Architecture.

Publications


Project Overview

The NebulaStream Platform: Data and Application Management for the Internet of Things
CIDR 2020 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Breß, Jonas Traub, Volker Markl
NebulaStream: Complex Analytics Beyond the Cloud
VLIoT 2020 | Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M. Grulich, Ariane Ziehn, Volker Markl
Showcasing Data Management Challenges for Future IoT Applications with NebulaStream
VLDB 2023 | Aljoscha Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Ankit Chaudhary, Philipp M. Grulich, Steffen Zeuch, Volker Markl
Using and Enhancing NebulaStream – A Tutorial
DEBS 2024 | Steffen Zeuch, Ankit Chaudhary, Viktor Rosenfeld, Taha Tekdogan, Adrian Michalke, Matthis Gördel, Ariane Ziehn, Volker Markl
NebulaStream: An Extensible, High-Performance Streaming Engine for Multi-Modal Edge Applications
SIGMOD 2025 | Adrian Michalke, Aljoscha Lepping, Volker Markl, Ricardo Martinez, Nils Schubert, Lukas Schwerdtfeger, Taha Tekdogan, Steffen Zeuch, Ariane Ziehn, Christoph Falkensteiner, Kyle Krüger, Alexander Meyer, Tobias Röschl, Svea Wilkending

System Publications

ICDE 2025 | Ankit Chaudhary, Kaustubh Beedkar, Jeyhun Karimov, Felix Lang, Steffen Zeuch, Volker Markl
Fault Tolerance Placement in the Internet of Things
SIGMOD 2024 | Anastasiia Kozar, Bonaventura Del Monte, Steffen Zeuch, Volker Markl
Query Compilation Without Regrets
SIGMOD 2024 | Philipp M. Grulich, Aljoscha P. Lepping, Dwi P. A. Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl
Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies
VLDB 2024 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Alphan Eracar, Steffen Zeuch, Volker Markl
Incremental Stream Query Merging
EDBT 2023 | Ankit Chaudhary, Jeyhun Karimov, Steffen Zeuch, Volker Markl
Rethinking Stateful Stream Processing with RDMA
SIGMOD 2022 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl
Babelfish: Efficient Execution of Polyglot Queries
VLDB 2022 | Philipp M. Grulich, Steffen Zeuch, Volker Markl
An Energy-Efficient Stream Join for the Internet of Things
DAMON 2021 | Adrian Michalke, Philipp M. Grulich, Clemens Lutz, Steffen Zeuch, Volker Markl
Streaming Data through the IoT via Actor-Based Semantic Routing Trees
VLIoT 2021 | Dimitrios Giouroukis, Johannes Jestram, Steffen Zeuch, Volker Markl
Monitoring of Stream Processing Engines Beyond the Cloud: an Overview
VLIoT 2021 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Steffen Zeuch, Volker Markl
ExDRa: Exploratory Data Science on Federated Raw Data
SIGMOD 2021 | Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch
Parallelizing Intra-Window Join on Multicores: An Experimental Study
SIGMOD 2021 | Shuhao Zhang, Yancan Mao, Jiong He, Philipp M Grulich, Steffen Zeuch, Bingsheng He, Richard TB Ma, Volker Markl
Towards Resilient Data Management for the Internet of Moving Things
BTW 2021 | Elena Beatriz Ouro Paz, Eleni Tzirita Zacharatou, Volker Markl
Automatic Tuning of Read-Time Tolerances for Optimized On-Demand Data-Streaming from Sensor Nodes
EDBT 2021 | Julius Hülsmann, Chiao-Yun Li, Jonas Traub, Volker Markl
Demand-based Sensor Data Gathering with Multi-Query Optimization
VLDB 2020 | Julius Hülsmann, Jonas Traub, Volker Markl
Complex Event Processing for the Internet of Things
VLDB 2020 PhD Workshop | Ariane Ziehn
A Survey of Adaptive Sampling and Filtering Algorithms for the Internet of Things
DEBS 2020 | Dimitrios Giouroukis, Alexander Dadian, Jonas Traub, Steffen Zeuch, Volker Markl
Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines
SIGMOD 2020 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl
Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
SIGMOD 2020 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl
Scaling a Public Transport Monitoring System to Internet of Things Infrastructures
EDBT 2020 | Haralampos Gavriilidis, Adrian Michalke, Laura Mons, Steffen Zeuch, Volker Markl
Governor: Operator Placement for a Unified Fog-Cloud Environment
EDBT 2020 | Ankit Chaudhary, Steffen Zeuch, Volker Markl
Disco: Efficient Distributed Window Aggregation
EDBT 2020 | Lawrence Benson, Philipp M. Grulich, Steffen Zeuch, Volker Markl, Tilmann Rabl
SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence
arXiv Preprint 2019 | Jonas Traub, Julius Hülsmann, Sebastian Breß, Tilmann Rabl, Volker Markl
Analyzing Efficient Stream Processing on Modern Hardware
VLDB 2019 | Steffen Zeuch, Sebastian Breß, Tilmann Rabl, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Volker Markl
Efficient Window Aggregation with General Stream Slicing
EDBT 2019 | Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Resense: Transparent Record and Replay of Sensor Data in the Internet of Things
EDBT 2019 | Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl
Generating Reproducible Out-Of-Order Data Streams
DEBS 2019 | Philipp M. Grulich, Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Optimized On-Demand Data Streaming from Sensor Nodes
SoCC 2017 | Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl

Project Leads


Current Researchers


Collaborate with us!

Opportunities

Feel free to reach out to us to learn more about research opportunities as a Postdoc, PhD student, or student assistant. Furthermore, motivated students can also inquire about the possibility of pursuing a Bachelor’s or Master’s thesis with us. Our research topics span all aspects of the IoT: query compilation, query optimization, query processing, query languages, distributed data processing, complex-event processing, machine learning, signal processing, sensor networks, fog computing, temporal-spatial query processing, transactional data processing, and modern hardware, among others.

Contact

Database Systems and Information Management (DIMA) Group
Technische Universität Berlin
Sekr. E-N 7, Room E-N 728
Einsteinufer 17
10587 Berlin
Germany
+49 30 314 23555
nebulastream(at)dima.tu-berlin.de