Using and Enhancing NebulaStream – A Tutorial
Steffen Zeuch, Ankit Chaudhary, Viktor Rosenfeld, Taha Tekdogan, Adrian Michalke, Matthis Gördel, Ariane Ziehn, and Volker Markl
Proceedings of the ACM/IFIP International Conference on Distributed and Event-based Systems (DEBS 2024) | June 2024

General Tutorial Information

The NebulaStream tutorial took place on 28 June 2024 and was divided into two 90-minute sessions.

Session 1: Introduction to NebulaStream

This session is dedicated to getting to know the sensor-fog-cloud environment, NebulaStream, and its internals. The target audience is computer scientists, researchers, and software developers with a background in system engineering and an interest in the challenges of the IoT.

Part I: Data Management for the IoT (approximately 45 min)
Presenter: Steffen Zeuch
In the first part of Session 1, we introduce the sensor-fog-cloud environment as a new processing environment for the future and outline how it differs from today's dominant cloud environment. We then present the challenges that arise in this new environment and early approaches to addressing them.

Part II: NebulaStream Internals (approximately 45 min)
Presenter: Ankit Chaudhary
In the second part of Session 1, we introduce NebulaStream and its architecture, as well as its unique features. We highlight the overall system architecture and our solutions to address current research challenges.

Session 2: Hands-on NebulaStream

This session is dedicated to giving attendees hands-on experience with NebulaStream. We split the session into two independent parts. In the first part, we present NebulaStream from an end-user perspective and show how to configure it and run applications on it. In the second part, attendees learn how to implement their own ideas in NebulaStream. This part targets system developers and researchers with a background in system engineering and a basic knowledge of C++.

Part I: End User Tutorial (approximately 45 min)
In the first part of Session 2, we first show how to download a binary NebulaStream image, configure sensor sources and the network, and run NebulaStream inside a Docker container. Then, we provide a set of real-world workloads to showcase how to run NebulaStream queries. We use these workloads as application-based scenarios to explore the capabilities of NebulaStream as a stream processing system. These applications cover a wide range of workloads, such as ETL, joins, and aggregations. Furthermore, we show the capabilities of NebulaStream's graphical user interface and how it can be used to submit queries, display the topology, and visualize the results using plots. Finally, we outline how existing workloads on other stream processing systems, such as Apache Flink, can be migrated to NebulaStream and how they benefit from its unique features.

Part II: Developer Tutorial (approximately 45 min)
Presenters: Adrian Michalke and Matthis Gördel
In the second part of Session 2, we invite you to contribute to NebulaStream. Request access to our GitHub repository and join our Slack workspace to implement your own ideas right away. In particular, we show you how to implement and run benchmarks in NebulaStream in order to compare its performance with other systems. Then, we show how to modify operators to perform customized processing easily. We highlight the usage of our Nautilus Compiler, which allows us to write and debug interpreted code while achieving the same performance as compiled code. In addition, we have created detailed implementation guides that cover various enhancements with step-by-step explanations.

Outcome for Attendees

The tutorial offers outcomes that are relevant to attendees in multiple respects. First, in terms of conceptual understanding, attendees gain insight into the challenges of data management in the sensor-fog-cloud environment and how NebulaStream addresses these challenges. This understanding helps attendees discover new data processing workflows or analyses from their own domains and use cases that are now possible due to the approaches we apply in NebulaStream. Second, from the practical aspect, the tutorial provides attendees with concrete, hands-on skills for mapping real-world problems to NebulaStream. These skills serve as a foundation for mapping attendees' own use cases to the out-of-the-box features of NebulaStream, allowing them to focus on their data processing workflow. Third, from the research and collaboration aspect, the tutorial provides attendees with starting points to integrate and evaluate their own research as an extension to NebulaStream. This enables attendees to focus on the specific method, technique, or optimization that they propose in the context of fog-cloud data management instead of investing much effort in setting up a system or environment. In sum, our tutorial offers valuable outcomes for a wide range of attendees with different backgrounds and goals.

Abstract

Soon, more data will be produced outside the cloud than inside. This constantly increasing amount of data requires new systems that can holistically optimize the processing for so-called sensor-fog-cloud environments. In this tutorial, we present NebulaStream, the first system designed specifically for this unified environment. We highlight research challenges inherent in managing data across a unified sensor-fog-cloud environment. After setting the scene, we dive into NebulaStream’s architecture and system internals, which integrate cloud, fog, and sensor networks to meet these demands effectively. Finally, in a hands-on session, attendees can experience NebulaStream in action, learn how to configure and run IoT applications on the platform, and extend NebulaStream with custom functionalities. By the end of the tutorial, attendees will have a deeper understanding of NebulaStream and will be equipped with helpful guides to use it as a framework to try out their own ideas and benefit from its unique features.
