Open Source Results and Publication Highlights

Here you will find the main results of the DAPHNE project. You are invited to use the open source components in your own projects and to develop them further. The publications listed here summarise the main research contributions of DAPHNE.
DAPHNE
Description: In this repository, we develop the whole DAPHNE system with all its components including, but not limited to, DaphneDSL, DaphneLib, DaphneIR, the DAPHNE Compiler, and the DAPHNE Run-time. The system will be built up and extended gradually in the course of the project.
Link: DAPHNE on GitHub

DaphneLib
Description: DaphneLib is a simple user-facing Python API that allows calling individual basic and higher-level DAPHNE built-in functions using lazy evaluation.
Link: DaphneLib Overview

WebUI
Description: Web monitoring tool for DAPHNE on HPC
Link: Experimental

Umlaut
Description: A modular suite for benchmarking all stages of Machine Learning pipelines. To find bottlenecks in such pipelines and compare different ML tools, this framework can calculate and visualize several metrics in the data preparation, model training, model validation and inference stages.
Link: Umlaut on GitHub

Docker Container
Description: To avoid installing dependencies and to circumvent conflicts with existing installed libraries, one may use containers.
Link: Building and Running with Containers

Singularity Containers
Description: Initial version of Singularity containers ported from the Docker version
Link: Singularity Containers
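
The DaphneLib entry above mentions lazy evaluation. As a rough illustration of that general idea only, the following plain-Python sketch builds an expression graph first and runs it on request. The names LazyNode, constant, and compute are hypothetical and chosen for this example; they are not DaphneLib's actual API, and no DAPHNE installation is involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LazyNode:
    """Hypothetical node of a deferred computation graph (not a DaphneLib class)."""
    op: Callable[..., float]
    inputs: Tuple["LazyNode", ...] = ()

    def __add__(self, other: "LazyNode") -> "LazyNode":
        # Only records the operation; nothing is calculated here.
        return LazyNode(lambda a, b: a + b, (self, other))

    def __mul__(self, other: "LazyNode") -> "LazyNode":
        return LazyNode(lambda a, b: a * b, (self, other))

    def compute(self) -> float:
        # Evaluation happens only now, bottom-up through the graph.
        args = [node.compute() for node in self.inputs]
        return self.op(*args)


evaluated: List[float] = []  # records when leaf values are actually read


def constant(value: float) -> LazyNode:
    """Hypothetical helper that wraps a value as a leaf node."""
    def read() -> float:
        evaluated.append(value)
        return value
    return LazyNode(read)


expr = (constant(2.0) + constant(3.0)) * constant(4.0)
pending = list(evaluated)  # snapshot taken before anything was computed
result = expr.compute()    # triggers evaluation of the whole graph
```

Deferring execution in this way gives a system the chance to see the complete expression before running it, which is one common motivation for lazy APIs.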

Publication Highlights

Patrick Damme, Matthias Boehm
Enabling Integrated Data Analysis Pipelines on Heterogeneous Hardware through Holistic Extensibility
2nd Workshop on Novel Data Management Ideas on Heterogeneous Hardware Architectures (NoDMC)

In this talk we propose holistic extensibility for IDA pipelines to handle increasing specialization from operators for heterogeneous hardware over the often co-designed data representations to the corresponding optimization and scheduling techniques. We sketch the extensibility design of DAPHNE, which offers users great benefits, while requiring low effort.

 

Chloe Alverti, Vasileios Karakostas, Nikhita Kunati, Georgios Goumas, Michael Swift
DaxVM: Stressing the Limits of Memory as a File Interface
MICRO 2022 – 55th IEEE/ACM International Symposium on Microarchitecture

We analyse the high overhead of the virtual memory operations involved in memory-mapped file I/O and propose DaxVM, an optimized POSIX-relaxed interface that provides byte-addressable high-performance storage.

 

Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, Florina Ciorba, Mark Dokter, Pawel Dowgiallo, Ahmed Eleliemy, Christian Faerber, Georgios Goumas, Dirk Habich, Niclas Hedam, Marlies Hofer, Wenjun Huang, Kevin Innerebner, Vasileios Karakostas, Roman Kern, Tomaž Kosar, Alexander Krause, Daniel Krems, Andreas Laber, Wolfgang Lehner, Eric Mier, Marcus Paradies, Bernhard Peischl, Gabrielle Poerwawinata, Stratos Psomadakis, Tilmann Rabl, Piotr Ratuszniak, Pedro Silva, Nikolai Skuppin, Andreas Starzacher, Benjamin Steinwender, Ilin Tolovski, Pınar Tözün, Wojciech Ulatowski, Aristotelis Vontzalidis, Yuanyuan Wang, Izajasz Wrosz, Aleš Zamuda, Ce Zhang, Xiao Xiang Zhu
DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines.
Conference on Innovative Data Systems Research, CIDR, 2022

We describe the overall architecture and key design decisions of the DAPHNE system infrastructure as an open and extensible system for integrated data analysis pipelines, comprising query processing, ML, and HPC. This is the central publication to be referenced when referring to the DAPHNE project as a whole.

 

Quentin Guilloteau, Jonas H. Müller Korndörfer, Florina M. Ciorba
Seamlessly Scaling Applications with DAPHNE
COMPAS 2024

We present the ongoing work on seamlessly scaling applications with DAPHNE. After exposing the different components of the distributed DAPHNE runtime, we compare a DaphneDSL implementation for the Connected Components algorithm against Python, Julia, and C++ implementations along several dimensions: external dependencies, effort to adapt the code for parallel and distributed executions, and performance.

 

Ahmed Eleliemy, Florina M. Ciorba
DaphneSched: A Scheduler for Integrated Data Analysis Pipelines
ISPDC 2023

DaphneSched provides a wide range of scheduling schemes for task partitioning and assignment, including self-scheduling and work-stealing. We show that the number of workers and the work queues layout have a significant impact on the performance achievable with DaphneSched.
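
To make the notion of self-scheduling more concrete, here is a minimal sketch of guided self-scheduling, a classic scheme in which each idle worker takes a chunk proportional to the remaining work divided by the number of workers. It illustrates the general technique only and is not taken from DaphneSched; the function name guided_chunks and its parameters are assumptions made for this example.

```python
from typing import List


def guided_chunks(total_tasks: int, workers: int, min_chunk: int = 1) -> List[int]:
    """Return the successive chunk sizes handed out by guided self-scheduling.

    Hypothetical helper for illustration; not part of DaphneSched.
    """
    if total_tasks < 0 or workers < 1 or min_chunk < 1:
        raise ValueError("total_tasks must be >= 0, workers and min_chunk >= 1")

    chunks: List[int] = []
    remaining = total_tasks
    while remaining > 0:
        # Ceiling division: the next chunk is remaining / workers, rounded up.
        size = (remaining + workers - 1) // workers
        # Respect the minimum chunk size without exceeding what is left.
        size = min(max(size, min_chunk), remaining)
        chunks.append(size)
        remaining -= size
    return chunks


chunks = guided_chunks(total_tasks=100, workers=4)
```

Chunks start large, which keeps scheduling overhead low, and shrink towards the end, which helps balance the load across workers.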

 

Jonas H. Müller Korndörfer, Ahmed Eleliemy, Osman Seckin Simsek, Thomas Ilsche, Robert Schöne, Florina M. Ciorba
How Do OS and Application Schedulers Interact? An Investigation with Multithreaded Applications
Euro-Par 2023, 29th International European Conference on Parallel and Distributed Computing

This work investigates the interaction between OS-level and application thread-level scheduling to explain and quantify their precise roles in application and system performance.

 

Ahmed Eleliemy, Florina M. Ciorba
A Resourceful Coordination Approach for Multilevel Scheduling
International Conference on High Performance Computing & Simulation, HPCS, 2021

We propose a resourceful coordination approach (RCA) that allows application schedulers to cooperate by involving the batch scheduler. We implement the proposed approach in a two-level simulation using realistic and well-known simulators and evaluate it using the effective system performance (ESP) benchmark.

 

Aleš Zamuda, Mark Dokter
DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking Randomised Optimisation Algorithms
CoBCom 2024

This paper presents a deployment of DAPHNE on EuroHPC Vega that runs randomized optimization algorithm (ROA) tasks with Slurm, and discusses an example ROA benchmarking scenario using the HappyCat function.
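
For readers unfamiliar with the HappyCat function, the sketch below implements its commonly used definition, f(x) = ((||x||^2 - n)^2)^alpha + (0.5 * ||x||^2 + sum(x)) / n + 0.5 with alpha = 1/8, whose global minimum of 0 lies at x = (-1, ..., -1). Whether the paper uses exactly this variant and parameter value is an assumption; the function name happycat is chosen for this example.

```python
from typing import Sequence


def happycat(x: Sequence[float], alpha: float = 0.125) -> float:
    """HappyCat test function in its commonly used form (alpha = 1/8 assumed)."""
    n = len(x)
    if n == 0:
        raise ValueError("x must contain at least one coordinate")

    norm2 = sum(v * v for v in x)  # squared Euclidean norm of x
    # Narrow curved valley around the sphere ||x||^2 = n ...
    valley = ((norm2 - n) ** 2) ** alpha
    # ... plus a gentle slope that leads to the minimum at (-1, ..., -1).
    slope = (0.5 * norm2 + sum(x)) / n + 0.5
    return valley + slope


at_minimum = happycat([-1.0, -1.0, -1.0, -1.0])
at_origin = happycat([0.0] * 16)
```

A randomized optimizer under benchmark would call such a function repeatedly on candidate vectors and track the best value found so far.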

 

Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, Tilmann Rabl
A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks
Technology Conference on Performance Evaluation & Benchmarking, TPC, published by Springer, 2021

We discuss the state of the art of BD, HPC, and ML benchmarks, summarize a representative selection of classic and widely used benchmarks, and classify them in terms of a feature space composed of purpose, stage, metric, and convergence, as well as from the perspective of a proposed integrated data analysis architecture.

 

Dirk Habich, Johannes Pietrzyk
SIMDified Data Processing – Foundations, Abstraction, and Advanced Techniques
SIGMOD Conference Companion 2024

This tutorial provided attendees with an opportunity to gain insights into the evolving topic of SIMDified data processing. It was designed with a database audience in mind, while assuming that most attendees would have a medium level of knowledge about SIMD.

 

Dirk Habich, Alexander Krause, Johannes Pietrzyk, Christian Faerber, Wolfgang Lehner
Simplicity done right for SIMDified query processing on CPU and FPGA
ACM SIGMOD/PODS

We show a combination of SIMD abstraction libraries to seamlessly port standard templated C++ SIMD host code to the FPGA without the necessity of complex FPGA-specific programming.