Integrated Data Analysis Pipelines for Large-Scale Data
Management, HPC, and Machine Learning

The DAPHNE project aims to define and build an open and extensible system infrastructure for integrated data analysis pipelines, including data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. This vision stems from several key observations in this research field:

  1. Systems in these areas share many compilation and runtime techniques.
  2. There is a trend towards complex data analysis pipelines that combine these systems.
  3. The underlying, increasingly heterogeneous hardware infrastructure is converging as well.
  4. Yet the programming paradigms, cluster resource management, and data formats and representations differ substantially.

An example data analysis pipeline illustrates the typical challenges researchers are confronted with when building and executing such pipelines.
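For illustration, consider a minimal Python sketch of such a pipeline as it is typically built today, stitched together from separate systems. The data and the least-squares workload are purely illustrative; the point is the hand-off in step (2), where crossing a system boundary forces a change of data representation.

```python
# Illustrative sketch (not DAPHNE code): an integrated pipeline stitched
# together from separate systems with different data representations.
import numpy as np
import pandas as pd

# (1) Data management: relational-style preprocessing in pandas.
events = pd.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "value":  [1.0, 2.0, 3.0, 4.0],
})
features = events.groupby("sensor")["value"].agg(["mean", "max"])

# (2) Hand-off: crossing the system boundary forces a format conversion
#     (DataFrame -> dense ndarray).
X = features.to_numpy()

# (3) ML training: a least-squares model fit in numpy.
y = np.array([0.0, 1.0])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# (4) Scoring kernel: here just a matrix-vector product.
print(X @ w)
```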

Therefore, this project – with a joint consortium of experts from the data management, ML systems, and HPC communities – aims to systematically investigate the system infrastructure, language abstractions, compilation and runtime techniques, and tools necessary to increase productivity when building such data analysis pipelines, and to eliminate unnecessary performance bottlenecks.

Objectives

DAPHNE has three main objectives (O): establishing a system architecture with APIs and a DSL; developing hierarchical scheduling and task planning strategies; and benchmarking the newly developed framework on high-level, real-life use cases:

System Architecture, APIs and DSL (O1)

Improve productivity in developing integrated data analysis pipelines via appropriate APIs and a domain-specific language, as well as an overall system architecture for seamless integration with existing data processing frameworks, HPC libraries, and ML systems. A major goal is an open, extensible reference implementation of the necessary compiler and runtime infrastructure to simplify the integration of current and future state-of-the-art methods.
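As a thought experiment of what O1 is after, the following hypothetical Python mock-up folds the stages of the earlier sketch behind one abstraction; every name here (Pipeline, aggregate, fit_lm, score) is invented for illustration and is neither the actual DAPHNE API nor DaphneDSL.

```python
# Hypothetical mock-up: one unified pipeline abstraction instead of three
# manually stitched systems. All names are illustrative, not DAPHNE's API.
import numpy as np
import pandas as pd

class Pipeline:
    """Toy unified pipeline: all stages share one runtime, so format
    conversions between systems are hidden from the user."""
    def __init__(self, frame):
        self.frame = frame

    def aggregate(self, key, col, funcs):
        self.frame = self.frame.groupby(key)[col].agg(funcs)
        return self

    def fit_lm(self, y):
        X = self.frame.to_numpy()  # conversion now an internal detail
        self.w, *_ = np.linalg.lstsq(X, np.asarray(y), rcond=None)
        return self

    def score(self):
        return self.frame.to_numpy() @ self.w

events = pd.DataFrame({"sensor": ["a", "a", "b", "b"],
                       "value":  [1.0, 2.0, 3.0, 4.0]})
print(Pipeline(events)
      .aggregate("sensor", "value", ["mean", "max"])
      .fit_lm([0.0, 1.0])
      .score())
```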

Hierarchical Scheduling and Task Planning (O2)

Improve the utilization of existing computing clusters, multiple heterogeneous hardware devices, and capabilities of modern storage and memory technologies through improved scheduling as well as static (compile time) task planning. In this context, we also aim to automatically leverage interesting data characteristics such as the sorting order, degree of redundancy, and matrix/tensor sparsity.
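A minimal sketch of such a data-characteristic-driven decision, assuming a simple density threshold: the planner-style check below picks a sparse or dense kernel depending on matrix sparsity. The 0.1 threshold and the kernel choice are assumptions for illustration, not DAPHNE's actual planning logic.

```python
# Illustrative sparsity-aware kernel selection (not DAPHNE's planner).
import numpy as np
import scipy.sparse as sp

def matvec(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    density = np.count_nonzero(A) / A.size
    if density < 0.1:  # threshold is an assumed tuning parameter
        # Sparse path: CSR storage skips the zero entries.
        return sp.csr_matrix(A) @ x
    # Dense path: BLAS-backed matrix-vector product.
    return A @ x

A = np.eye(1000)   # 0.1% dense, so the sparse kernel is chosen
x = np.ones(1000)
print(matvec(A, x)[:3])
```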

Use Cases and Benchmarking (O3)

The technological results will be evaluated on a variety of real-world use cases and datasets, as well as on a new benchmark developed as part of the DAPHNE project. We aim to improve the accuracy and runtime of these use cases, which combine data management, machine learning, and HPC – this exploratory analysis serves as a qualitative study of productivity improvements. The variety of real-world use cases will further be generalized into a benchmark for integrated data analysis pipelines, quantifying progress compared to the state of the art.
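To make the quantification concrete, here is a toy stage-wise timing harness in the spirit of such a benchmark; the stage names and the workload are illustrative assumptions, not the planned benchmark's actual design.

```python
# Toy stage-wise timing harness (illustrative only).
import time
import numpy as np

def timed(name, fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    print(f"{name}: {time.perf_counter() - start:.4f} s")
    return result

rng = np.random.default_rng(0)
X = timed("data generation", rng.standard_normal, (2000, 200))
G = timed("ML kernel (gram matrix)", lambda M: M.T @ M, X)
_ = timed("HPC kernel (eigendecomposition)", np.linalg.eigh, G)
```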

An overview of the project work plan shows that a work package is dedicated to each relevant field, and how the packages are bundled to contribute to the three main objectives.

Management of the work packages is distributed among all partners, academic and industrial. All work packages run in parallel, with the results and findings of each feeding into the others. To ensure efficient execution of the project as well as widespread dissemination of the results and adoption of the open-source DAPHNE implementation, dedicated work packages cover project management as well as dissemination and exploitation.