

SmartDataLake: Pilot C overview, main components and data
To balance research and innovation, the SmartDataLake project included three large-scale pilot use cases, executed by the project's three industry partners. These pilots provided three different perspectives on, and applications of, the project's results. This article describes the third pilot.
In the third pilot use case of the SmartDataLake project, SYNYO applied selected components from the SmartDataLake toolbox to complement an end-to-end workflow for processing and analysing open data and converting it into insights and actionable information. The work was performed primarily on datasets describing the European research landscape, e.g., projects carried out under the H2020 programme and the organisations participating in them, with a focus on applying key SmartDataLake functionalities:
- Flexibly and interactively working with the data using RAW.
- Enriching the core dataset by joining it with other independent datasets through entity resolution.
- Analysing and evaluating the data with support from the SciNeM, Loci and SimSearch components.
- Integrating the gained insights into a solution targeting end users.
While the first two action points made working with the data easier and more flexible, e.g., by providing a flexible SQL-like interface to raw data, the analysis made it possible to evaluate the relevance of the data, e.g., by:
- Ranking individual entities based on their interconnections to identify the dominant ones.
- Identifying communities of entities belonging together.
- Comparing entities based on their similarity.
- Applying the above in a geographical context.
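The ranking and similarity steps above can be illustrated with a minimal sketch. It does not reproduce how SciNeM or SimSearch work internally: it uses a plain PageRank power iteration and Jaccard set similarity as stand-ins, and the project and organisation names are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy data: project -> participating organisations.
projects = {
    "P1": {"OrgA", "OrgB", "OrgC"},
    "P2": {"OrgA", "OrgB"},
    "P3": {"OrgA", "OrgD"},
    "P4": {"OrgC", "OrgD"},
}

def build_graph(projects):
    """Undirected co-participation graph: org -> {partner: shared project count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for members in projects.values():
        for a in members:
            for b in members:
                if a != b:
                    graph[a][b] += 1
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (assumes every node has a partner)."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for partner, weight in graph[n].items():
                new_rank[partner] += damping * rank[n] * weight / total
        rank = new_rank
    return rank

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

rank = pagerank(build_graph(projects))
most_connected = max(rank, key=rank.get)

# Similarity of two projects, compared by their participant sets.
similarity_p1_p2 = jaccard(projects["P1"], projects["P2"])
```

Here `most_connected` is the organisation with the highest score in the toy graph, and `similarity_p1_p2` compares two projects by the organisations they share. Real data would likely call for richer entity attributes than participant sets alone.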
The main insights gained from the analysis were then selected and integrated into an end-user application, either by persisting the gained information or by directly integrating the components and their configuration. The application enables users to navigate the European research landscape, using their own knowledge as a starting point. Reflecting the capabilities above, this specifically includes:
- Ranking organisations based on their activity in the European research landscape.
- Identifying communities of organisations based on how frequently they collaborate.
- Identifying similar organisations, projects and topics to suggest future collaborations.
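As a rough illustration of grouping organisations by collaboration frequency, the sketch below counts shared projects per pair and treats the connected components of the thresholded graph as communities. This is a deliberately simple stand-in, not the community detection used in the pilot. The data, the `min_shared` threshold and all function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: project -> participating organisations.
projects = {
    "P1": {"OrgA", "OrgB"},
    "P2": {"OrgA", "OrgB"},
    "P3": {"OrgC", "OrgD"},
    "P4": {"OrgC", "OrgD"},
    "P5": {"OrgB", "OrgC"},
}

def collaboration_counts(projects):
    """Count, for each pair of organisations, the projects they share."""
    counts = Counter()
    for members in projects.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts

def communities(projects, min_shared=2):
    """Connected components of the graph whose edges link pairs
    sharing at least `min_shared` projects."""
    neighbours = {org: set() for members in projects.values() for org in members}
    for (a, b), shared in collaboration_counts(projects).items():
        if shared >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            org = stack.pop()
            if org in component:
                continue
            component.add(org)
            stack.extend(neighbours[org] - component)
        seen |= component
        result.append(component)
    return result

frequent_groups = communities(projects, min_shared=2)
loose_groups = communities(projects, min_shared=1)
```

With the threshold at two shared projects, the single collaboration between `OrgB` and `OrgC` is ignored and two groups remain. Lowering the threshold to one merges all four organisations into a single group, which shows how sensitive such a grouping is to the chosen cut-off.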
Keywords
Big data, data lake, smart, pilot, research landscape