WP2: Data Exploration Framework

WP2 will address challenges related to data provision and data analysis. As illustrated in the two Show Cases, approaches are required, e.g., to access quality-controlled real-time data, to close data gaps, to upscale data, or to combine data that vary widely in spatial and temporal scale. WP2 aims to meet these requirements by designing methods and workflows that enable natural scientists to use streamlined data flows as well as tailored visual data exploration and machine learning approaches. WP2 follows the concept of scientific workflows. To specify the requirements from natural science, scientific workflows are modelled at the conceptual level, and options for improving them with data science methods are defined (Task 2.1). Further tasks of WP2 are the selection, combination, and enhancement of suitable methods as workflow components. This includes streamlining the heterogeneous data flow from sensors to archives together with automated data quality control and assurance (Task 2.2), linking effective visualization and interaction methods for data exploration (Task 2.3), and tailoring machine learning methods to user-specific requirements (Task 2.4).

The methods and workflows will be developed according to the principles of agile software development (Shore and Warden 2008), in which requirements and solutions evolve through the collaborative effort of interdisciplinary teams. Agile development advocates adaptive planning, evolutionary development, early delivery, and continuous improvement. This process will continue throughout the project phase and, later in PoF IV, through the established Seed Groups (see WP3). The workflows will be implemented within a computational infrastructure spanning Helmholtz Earth and Environment. This infrastructure is conceived in Task 3.4 "Software Architecture-Concept" and, for the duration of the project, will be realized through the work in Tasks 2.2 to 2.4.
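To make the notion of workflow components more concrete, the following is a minimal sketch, not the project's design: a workflow is modelled as an ordered chain of components, each of which takes a time series and returns a new one. The two components shown, a range check for automated quality control and linear interpolation for closing short data gaps, are illustrative choices; the names `range_check`, `fill_gaps_linear` and `run_workflow`, the thresholds and the sample readings are all hypothetical.

```python
from typing import Callable, List, Optional

# A time series is a list of readings; None marks a missing or rejected value.
Series = List[Optional[float]]
# A workflow component is any function that maps a series to a new series.
Component = Callable[[Series], Series]


def range_check(lower: float, upper: float) -> Component:
    """Hypothetical QC component: reject readings outside [lower, upper]."""
    def step(series: Series) -> Series:
        return [v if v is not None and lower <= v <= upper else None for v in series]
    return step


def fill_gaps_linear(series: Series) -> Series:
    """Hypothetical gap-filling component: linear interpolation between the
    nearest valid neighbours. Gaps at either end are left as None."""
    out = list(series)
    valid = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = series[a] + frac * (series[b] - series[a])
    return out


def run_workflow(series: Series, components: List[Component]) -> Series:
    """Apply the components in order, passing each output to the next."""
    for component in components:
        series = component(series)
    return series


if __name__ == "__main__":
    # Hypothetical sensor readings: 99.0 is an implausible spike, None a dropout.
    raw = [4.1, 4.3, 99.0, None, 4.9, 5.0]
    workflow = [range_check(-5.0, 40.0), fill_gaps_linear]
    print(run_workflow(raw, workflow))
```

Because every component shares the same signature, components can be added, removed or reordered (for example, inserting a spike test before gap filling) without changing the others, which mirrors the idea of selecting and combining methods as workflow components.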

Example of an interactive visual exploration tool that supports scientists in validating simulation models (from Unger et al. 2012). Several linked views of the data (K1-K4) allow, e.g., exploring the goodness of fit between observation and simulation data in space (K1) and interactively exploring observation and simulation data with respect to different model input parameter values (K2). Other views show the frequency distribution (K3a) or allow visual comparison of observation and simulation data, optionally overlaying both (K4).

WP2 Tasks

Task 2.1 Discuss workflows, practices, demands and solutions
[GFZ, AWI, FZJ, GEOMAR, HMGU, HZG, KIT, UFZ]
Task 2.2 Data provision and quality
[AWI, FZJ, GEOMAR, GFZ, HMGU, HZG, KIT, UFZ]
Task 2.3 Visual data exploration
[GFZ, FZJ, GEOMAR, HMGU, HZG, UFZ]
Task 2.4 Computational/machine learning data exploration
[HMGU, AWI, FZJ, GEOMAR, GFZ, HZG, UFZ]