Data Management
for the Internet of Things

NebulaStream is a general-purpose, end-to-end data-management system for cloud-edge-sensor environments built around three core goals:

Ease of Use

Out-of-the-box functionality for multi-modal, multi-frequency streams (e.g., alignment, inference). Enables users to focus on business logic with well-known abstractions and concepts.

Extensibility

Empower users to easily integrate custom data connectors, formats, operators, and optimizations into the system.

Efficiency

Utilize distributed heterogeneous computing devices with hardware-tailored code, adaptive execution, and the interleaved processing of data sources to handle large workloads efficiently.

NebulaStream is a joint research project at BIFOLD, with first contributors from the DIMA Group at TU Berlin and the DFKI IAM Group.

NebulaStream Vision

Our goal is to process thousands of queries over millions of heterogeneous sources in a massively distributed environment. We achieve this through five core technologies:

  1. Heterogeneous Hardware Support: Supports a wide range of devices including different architectures (e.g., ARM, x86) and accelerators (e.g., GPUs, TPUs).
  2. Code Generation: Compiles every query to efficient, low-energy native code.
  3. In-Network Processing: Pushes operators as close as possible to the data source to reduce network traffic.
  4. On-Demand Gathering: Utilizes all available processing capabilities from the source to the sink to apply processing as early as possible, thereby reducing network traffic as much as possible.
  5. Adaptive Resource Management: Reacts to topology or workload changes without interrupting queries.

[Figure: NebulaStream core-technology diagram]
[Figure: NebulaStream architecture diagram]
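
To make the in-network processing idea from the list of core technologies concrete, the following is a minimal Python sketch, not NebulaStream code. It compares how many records cross the network when a filter runs in the cloud versus at the edge. The names Reading, is_hot, cloud_only, and in_network are hypothetical and exist only for this illustration, as does the 30-degree threshold.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical sensor record used only for this illustration."""
    sensor_id: int
    temperature: float


def is_hot(reading: Reading) -> bool:
    # Example predicate; the threshold is an arbitrary assumption.
    return reading.temperature > 30.0


def cloud_only(readings):
    """Ship every reading to the cloud, then filter there."""
    transmitted = list(readings)  # all records cross the network
    result = [r for r in transmitted if is_hot(r)]
    return result, len(transmitted)


def in_network(readings):
    """Filter next to the source and ship only matching readings."""
    transmitted = [r for r in readings if is_hot(r)]
    return transmitted, len(transmitted)


# Twenty synthetic readings with temperatures 20.0, 21.0, ..., 39.0.
readings = [Reading(i, 20.0 + i) for i in range(20)]

cloud_result, cloud_sent = cloud_only(readings)
edge_result, edge_sent = in_network(readings)

print(f"cloud-only: {cloud_sent} records sent")
print(f"in-network: {edge_sent} records sent")
```

Both variants produce the same query result; only the number of transmitted records differs, which is the effect that pushing operators toward the source aims for.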

NebulaStream Architecture

A modular pipeline that stretches from sensor to cloud, optimizing every hop along the way.

  • 1 Sources & Sinks: Users can send their data using different source connectors and input formats. Commonly used source connectors include JDBC, MQTT, and TCP, and common input formats include CSV and JSON, which we provide to the user out-of-the-box. In addition, users can add custom connectors or formats. Similarly, users can customize connectors and formatters in the Sink Manager.
  • 2 I/O Handling: Unlike other SPEs that handle sources individually and synchronously by assigning one thread per source, NebulaStream interleaves source processing via thread sharing within its own I/O thread pool and applies asynchronous callbacks to reduce waiting time.
  • 3 Query Submission: Users can submit queries in our SQL-like query language. NebulaStream provides many built-in operations, like re-sampling and inference. Moreover, it allows users to specify their own operators.
  • 4 Query Optimization: After submission, a query plan is created and optimized before hardware-tailored code is generated. The user can modify the optimizations by providing their own rules to the rule engine.
  • 5 Adaptive Runtime: During runtime, the query engine schedules query processing in a highly dynamic manner using task abstractions.
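
The interleaved source handling described under I/O Handling can be sketched roughly as follows. This is an illustration under stated assumptions, not NebulaStream's implementation: Python's asyncio event loop stands in for the shared I/O thread pool, asyncio.sleep stands in for waiting on a socket, and the names read_source, run_sources, and on_data are hypothetical.

```python
import asyncio


async def read_source(name, delay, count, on_data):
    """Simulated source: waits for I/O, then passes each record to a callback."""
    for seq in range(count):
        # While this source waits, the event loop is free to serve other sources.
        await asyncio.sleep(delay)
        on_data(name, seq)


async def run_sources():
    received = []

    def on_data(name, seq):
        # Callback invoked whenever any source has a record ready.
        received.append((name, seq))

    # Both sources share one event loop instead of blocking one thread each.
    await asyncio.gather(
        read_source("mqtt", 0.01, 3, on_data),
        read_source("tcp", 0.015, 3, on_data),
    )
    return received


received = asyncio.run(run_sources())
for name, seq in received:
    print(name, seq)
```

The point of the sketch is that no worker sits idle waiting for a single slow source; records from both sources arrive through the same callback in whatever order they become ready.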

Publications


Project Overview

NebulaStream: An Extensible, High-Performance Streaming Engine for Multi-Modal Edge Applications
SIGMOD 2025 | Adrian Michalke, Aljoscha Lepping, Volker Markl, Ricardo Martinez, Nils Schubert, Lukas Schwerdtfeger, Taha Tekdogan, Steffen Zeuch, Ariane Ziehn, Christoph Falkensteiner, Kyle Krüger, Alexander Meyer, Tobias Röschl, Svea Wilkending
Using and Enhancing NebulaStream – A Tutorial
DEBS 2024 | Steffen Zeuch, Ankit Chaudhary, Viktor Rosenfeld, Taha Tekdogan, Adrian Michalke, Matthis Gördel, Ariane Ziehn, Volker Markl
Showcasing Data Management Challenges for Future IoT Applications with NebulaStream
VLDB 2023 | Aljoscha Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Ankit Chaudhary, Philipp M. Grulich, Steffen Zeuch, Volker Markl
NebulaStream: Complex Analytics Beyond the Cloud
VLIoT 2020 | Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M. Grulich, Ariane Ziehn, Volker Markl
The NebulaStream Platform: Data and Application Management for the Internet of Things
CIDR 2020 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Bress, Jonas Traub, Volker Markl

System Publications

ICDE 2025 | Ankit Chaudhary, Kaustubh Beedkar, Jeyhun Karimov, Felix Lang, Steffen Zeuch, Volker Markl
Fault Tolerance Placement in the Internet of Things
SIGMOD 2024 | Anastasiia Kozar, Bonaventura Del Monte, Steffen Zeuch, Volker Markl
Query Compilation Without Regrets
SIGMOD 2024 | Philipp M. Grulich, Aljoscha P. Lepping, Dwi P. A. Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl
Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies
VLDB 2024 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Alphan Eracar, Steffen Zeuch, Volker Markl
Incremental Stream Query Merging
EDBT 2023 | Ankit Chaudhary, Jeyhun Karimov, Steffen Zeuch, Volker Markl
Rethinking Stateful Stream Processing with RDMA
SIGMOD 2022 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl
Babelfish: Efficient Execution of Polyglot Queries
VLDB 2022 | Philipp M. Grulich, Steffen Zeuch, Volker Markl
An Energy-Efficient Stream Join for the Internet of Things
DAMON 2021 | Adrian Michalke, Philipp M. Grulich, Clemens Lutz, Steffen Zeuch, Volker Markl
Streaming Data through the IoT via Actor-Based Semantic Routing Trees
VLIoT 2021 | Dimitrios Giouroukis, Johannes Jestram, Steffen Zeuch, Volker Markl
Monitoring of Stream Processing Engines Beyond the Cloud: an Overview
VLIoT 2021 | Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Steffen Zeuch, Volker Markl
ExDRa: Exploratory Data Science on Federated Raw Data
SIGMOD 2021 | Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch
Parallelizing Intra-Window Join on Multicores: An Experimental Study
SIGMOD 2021 | Shuhao Zhang, Yancan Mao, Jiong He, Philipp M Grulich, Steffen Zeuch, Bingsheng He, Richard TB Ma, Volker Markl
Towards Resilient Data Management for the Internet of Moving Things
BTW 2021 | Elena Beatriz Ouro Paz, Eleni Tzirita Zacharatou, Volker Markl
Automatic Tuning of Read-Time Tolerances for Optimized On-Demand Data-Streaming from Sensor Nodes
EDBT 2021 | Julius Hülsmann, Chiao-Yun Li, Jonas Traub, Volker Markl
Demand-based Sensor Data Gathering with Multi-Query Optimization
VLDB 2020 | Julius Hülsmann, Jonas Traub, Volker Markl
Complex Event Processing for the Internet of Things
VLDB 2020 PhD Workshop | Ariane Ziehn
A Survey of Adaptive Sampling and Filtering Algorithms for the Internet of Things
DEBS 2020 | Dimitrios Giouroukis, Alexander Dadian, Jonas Traub, Steffen Zeuch, Volker Markl
Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines
SIGMOD 2020 | Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl
Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
SIGMOD 2020 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl
Scaling a Public Transport Monitoring System to Internet of Things Infrastructures
EDBT 2020 | Haralampos Gavriilidis, Adrian Michalke, Laura Mons, Steffen Zeuch, Volker Markl
Governor: Operator Placement for a Unified Fog-Cloud Environment
EDBT 2020 | Ankit Chaudhary, Steffen Zeuch, Volker Markl
Disco: Efficient Distributed Window Aggregation
EDBT 2020 | Lawrence Benson, Philipp M. Grulich, Steffen Zeuch, Volker Markl, Tilmann Rabl
SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence
arXiv Preprint 2019 | Jonas Traub, Julius Hülsmann, Sebastian Breß, Tilmann Rabl, Volker Markl
Analyzing Efficient Stream Processing on Modern Hardware
VLDB 2019 | Steffen Zeuch, Sebastian Breß, Tilmann Rabl, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Volker Markl
Efficient Window Aggregation with General Stream Slicing
EDBT 2019 | Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Resense: Transparent Record and Replay of Sensor Data in the Internet of Things
EDBT 2019 | Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl
Generating Reproducible Out-Of-Order Data Streams
DEBS 2019 | Philipp M. Grulich, Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Optimized On-Demand Data Streaming from Sensor Nodes
SoCC 2017 | Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl

Project Leads


Current Researchers


Collaborate with us!

Opportunities

Feel free to reach out to us to learn more about research opportunities as a Postdoc, PhD student, or student assistant. Motivated students can also inquire about the possibility of pursuing a Bachelor’s or Master’s thesis with us. Our research topics span all aspects of the IoT: query compilation, query optimization, query processing, query languages, distributed data processing, complex-event processing, machine learning, signal processing, sensor networks, fog computing, temporal-spatial query processing, transactional data processing, and modern hardware, among others.

Contact

Database Systems and Information Management (DIMA) Group
Technische Universität Berlin
Sekr. E-N 7, Room E-N 728
Einsteinufer 17
10587 Berlin
Germany
+49 30 314 23555
nebulastream(at)dima.tu-berlin.de