NebulaStream: Data Management for the Internet of Things
Steffen Zeuch, Ankit Chaudhary, Viktor Rosenfeld, Taha Tekdogan, Adrian Michalke, Matthis Gördel, Ariane Ziehn, and Volker Markl
Proceedings of the ACM/IFIP International Conference on Distributed and Event-based Systems (DEBS 2024) | June 2024

General Tutorial Information

The NebulaStream tutorial will take place on Friday, 28 June 2024, and consists of two 90-minute sessions.

Session 1: Introduction to NebulaStream

10:30 AM - 12:00 PM
This session is dedicated to getting to know the sensor-fog-cloud environment, NebulaStream, and its internals. The target audience is computer scientists, researchers, and software developers with a background in system engineering and an interest in the challenges of the IoT.

Part I: Data Management for the IoT (approximately 45 min)
Presenter: Steffen Zeuch
In the first part of Session 1, we will introduce the sensor-fog-cloud environment as a new processing environment for the future and outline how it differs from today's dominant cloud environment. We then present the challenges that arise in this new environment and early approaches to addressing them.

Part II: NebulaStream Internals (approximately 45 min)
Presenter: Ankit Chaudhary
In the second part of Session 1, we will introduce NebulaStream and its architecture, as well as its unique features. We will highlight the overall system architecture and our solutions to address current research challenges.

Session 2: Hands-on NebulaStream

1:30 PM - 3:00 PM
This session is dedicated to giving attendees hands-on experience with NebulaStream. We split this session into two independent parts. In the first part, we will present NebulaStream from an end-user perspective and show how to configure it and run applications on it. The second part is aimed at system developers and researchers with a background in system engineering and basic knowledge of C++, who will learn how to implement their own ideas in NebulaStream.

Part I: End User Tutorial (approximately 45 min)
In the first part of Session 2, we will first show how to download a binary NebulaStream image, configure sensor sources and the network, and run NebulaStream inside a Docker container. Then, we will provide a set of real-world workloads to showcase how to run NebulaStream queries. We will use these workloads as application-based scenarios to explore the capabilities of NebulaStream as a stream processing system. These applications cover a wide range of workloads, such as ETL, joins, and aggregations. Furthermore, we show the capabilities of NebulaStream's graphical user interface and how it can be used to submit queries, display the topology, and visualize the results using plots. Finally, we outline how existing workloads on other stream processing systems, such as Apache Flink, can be migrated to NebulaStream and how they benefit from its unique features.

Part II: Developer Tutorial (approximately 45 min)
Presenters: Adrian Michalke and Matthis Gördel
To prepare for the tutorial and start implementing your ideas right away, you can send an email with the subject "DEBS Tutorial Session 2 Part 2" containing your GitHub name to debs-tutorial@nebula.stream and join our Slack workspace.
In the second part of Session 2, we will show the attendees how to implement and run benchmarks in NebulaStream in order to compare its performance with other systems. Then, we will show how to modify operators to perform customized processing easily. In particular, we will highlight the usage of our Nautilus Compiler, which allows us to write and debug interpreted code while achieving the same performance as compiled code. In addition, we have created detailed implementation guides that attendees can follow independently after this session.

Outcome for Attendees

The tutorial offers outcomes that are relevant to attendees in multiple aspects. First, in terms of *conceptual* understanding, attendees will gain insight into challenges in data management in the sensor-fog-cloud environment and how NebulaStream addresses these challenges. This understanding will help attendees to discover new data processing workflows or analyses from their own domain and use cases that are now possible due to the approaches we apply in NebulaStream. Second, from the *practical* aspect, the tutorial provides attendees with concrete, hands-on skills on how to map real-world problems to NebulaStream. These hands-on skills serve as a foundation for mapping attendees' own use cases to the out-of-the-box features of NebulaStream, allowing them to focus on their data processing workflow. Third, from the *research and collaboration* aspect, the tutorial provides attendees with starting points to integrate and evaluate their own research as an extension to NebulaStream. This enables attendees to focus on the specific method, technique, or optimization that they propose in the context of fog-cloud data management instead of investing much effort in setting up a system or environment. In sum, our tutorial offers valuable outcomes for a wide range of attendees with different backgrounds and goals.

Abstract

Soon, more data will be produced outside the cloud than inside. This constantly increasing amount of data requires new systems that can holistically optimize the processing for so-called sensor-fog-cloud environments. In this tutorial, we present NebulaStream, the first system designed specifically for this unified environment. We highlight research challenges inherent in managing data across a unified sensor-fog-cloud environment. After setting the scene, we dive into NebulaStream’s architecture and system internals, which integrate cloud, fog, and sensor networks to meet these demands effectively. Finally, in a hands-on session, attendees can experience NebulaStream in action, learn how to configure and run IoT applications on the platform, and extend NebulaStream with custom functionalities. By the end of the tutorial, attendees will have a deeper understanding of NebulaStream and will be equipped with helpful guides to use it as a framework to try out their own ideas and benefit from its unique features.
