Datastream Systems

Data streams arise in a growing number of scenarios (e.g., in sensor networks), and relevant events must be extracted from them. Data-stream management systems (DSMS) address this by adopting techniques from database management systems (DBMS), most prominently the declarative programming style inherent to queries.
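To illustrate what a declarative continuous query computes, here is a minimal Python sketch that hand-evaluates a sliding-window average over a stream of (timestamp, value) readings and emits an event whenever the average exceeds a threshold. The function name, the count-based window, and the CQL-like query text in the docstring are assumptions chosen for illustration; this is not the API or query language of any particular DSMS.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def windowed_avg_alerts(readings: Iterable[Tuple[int, float]],
                        window: int,
                        threshold: float) -> Iterator[Tuple[int, float]]:
    """Hypothetical hand-written evaluation of a continuous query, roughly
    SELECT ts, AVG(value) FROM readings [ROWS window] HAVING AVG(value) > threshold
    (CQL-like notation, for illustration only).

    Emits (timestamp, average) each time the last `window` values average
    above `threshold`.
    """
    buf: deque = deque(maxlen=window)  # appending to a full deque evicts the oldest value
    for ts, value in readings:
        buf.append(value)
        if len(buf) == window:          # only evaluate once the window is full
            avg = sum(buf) / window
            if avg > threshold:
                yield ts, avg           # an "event" extracted from the stream
```

The point of the declarative approach is that a user writes only the one-line query; a DSMS derives an incremental evaluation like the loop above (and optimizes it) automatically.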

Projects

BATS-TP3: Cross-system Optimization of Data-stream Queries

Animal observation can be improved significantly with the help of modern technology. For bats this has been especially difficult in the past, because they are active only at night and must not carry a transmitter weighing more than 10% of their own body weight. Advances in microelectronics have made it possible to build complete computers that meet these requirements. These devices transmit not just beacons but actual data, which a ground station receives and evaluates. The evaluation ranges from the bat's position, through encounters with other bats (of particular interest: mother and child), to body temperature, pulse, and readings from other biosensors. The data are collected and evaluated as a whole. Unfortunately, they are imprecise and error-prone, so cleaning them is a substantial first step. The long-term goal of the project is to capture only the data actually needed for the given evaluations, thus saving energy and making it possible to observe the bats for longer.
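The page does not say how the bat data are cleaned. As one hedged example of what such a cleaning step can look like, the Python sketch below drops position fixes that would require a physically implausible flight speed from the previously accepted fix. The planar (x, y) coordinates in metres, the `Fix` tuple layout, and the `max_speed_mps` parameter are all hypothetical assumptions, not details of the BATS project.

```python
import math
from typing import List, Tuple

# Hypothetical position fix: (time in s, x in m, y in m) on a local plane.
Fix = Tuple[float, float, float]


def drop_implausible_fixes(fixes: List[Fix], max_speed_mps: float) -> List[Fix]:
    """Keep a fix only if reaching it from the last kept fix needs a speed
    of at most max_speed_mps; otherwise treat it as a measurement error."""
    cleaned: List[Fix] = []
    for t, x, y in sorted(fixes):  # process fixes in time order
        if cleaned:
            t0, x0, y0 = cleaned[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate timestamp: keep only the first fix
            if math.hypot(x - x0, y - y0) / dt > max_speed_mps:
                continue  # implausible jump: discard this fix
        cleaned.append((t, x, y))
    return cleaned
```

This simple filter trusts the first fix unconditionally; a real pipeline would likely combine several checks (and the other sensor channels) rather than rely on a single speed bound.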


DSAM: Data Stream Application Manager

DSAM is a middleware for managing global data-stream queries. These queries are distributed to heterogeneous platforms, including self-contained data-stream management systems and sensor networks. The project's main goal is to automatically distribute and deploy a platform-independent model, i.e., a global query, to heterogeneous and distributed stream-processing components. Queries are defined in a declarative, abstract query language and are partitioned according to cost models and topological constraints. DSAM then generates queries in each target system's query language, each implementing one partial query. For sensor networks, we additionally use source-code generation. Further challenges are monitoring, efficient metadata management, and decentralized query management, especially in the context of wireless sensor networks.
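To make "partitioned according to cost models and topological constraints" concrete, here is a minimal Python sketch for the simplest case: a linear chain of operators whose prefix runs inside the sensor network and whose suffix runs on a DSMS. It tries every split point, skips splits that would place an operator on a platform that cannot execute it, and picks the cheapest. The `Operator` fields, the additive cost model, and the per-tuple transmission cost are hypothetical assumptions; DSAM's actual cost models are not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Operator:
    name: str
    out_rate: float      # hypothetical estimate of tuples/s leaving this operator
    node_cost: float     # hypothetical cost of running it in the sensor network
    dsms_cost: float     # hypothetical cost of running it on the DSMS
    node_capable: bool   # constraint: can the sensor platform execute it?


def best_split(source_rate: float, ops: List[Operator],
               tx_cost_per_tuple: float) -> Tuple[int, float]:
    """Return (k, cost): ops[:k] run in the sensor network, ops[k:] on the DSMS.

    Cost = in-network cost + radio transmission of the stream at the cut
    + DSMS cost (a simple additive model, assumed for illustration)."""
    best: Optional[Tuple[int, float]] = None
    for k in range(len(ops) + 1):
        if not all(op.node_capable for op in ops[:k]):
            break  # every larger k would also contain this incapable operator
        rate_at_cut = ops[k - 1].out_rate if k > 0 else source_rate
        cost = (sum(op.node_cost for op in ops[:k])
                + rate_at_cut * tx_cost_per_tuple
                + sum(op.dsms_cost for op in ops[k:]))
        if best is None or cost < best[1]:
            best = (k, cost)
    assert best is not None  # k = 0 (everything on the DSMS) is always feasible
    return best
```

For example, pushing a selective filter and an aggregate into the network pays off when they shrink the stream enough that the saved transmission cost outweighs their in-network cost, while an operator the sensor platform cannot run (here flagged via `node_capable`) bounds how far the split can move. Real deployments involve operator graphs and network topologies rather than a single chain, so this only illustrates the trade-off.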


SKYSHARK: Benchmarking Data Processing Systems Using Real-Time Flight Data

To test and evaluate a heterogeneous stream-processing system consisting of an FPGA-based system-on-chip and a host, we develop a benchmark called SKYSHARK. It uses publicly available real-world data from air-traffic control. For the benchmark, these data are enhanced without changing their characteristics and are further enriched with aircraft and airport data. We define 14 queries with respect to the particular requirements of our system. They should be useful for other…


Participating Scientists

Publications