SmartDataLake

SmartDataLake: Validation of the system and individual components through long term piloting

Throughout the SmartDataLake project, there has been a strong focus on driving research results forward while collecting industry-oriented feedback. To expand on this feedback and proactively validate the solution, the project includes a ten-month piloting phase, during which all the individual components are applied to use-cases provided by the project’s industry partners.

Overview

In short, the pilot consists of the various components of the SmartDataLake project being used by the three pilot partners, SpazioDati, SPRING TECHNO and SYNYO, to solve their data-related challenges. In line with the rest of the project structure, the pilot focuses on the three main functionalities of the SmartDataLake project: data management, data exploration and analysis, and interactive visual analytics.

Overall, the main use-cases have been defined and detailed by specifying the challenges and the relevant key performance indicators, and each has been mapped to the SmartDataLake components that may be used to achieve it. The intermediate use-cases are as follows:

Assembling entity profiles – As even structured and properly annotated data may be distributed across different systems and instances, this functionality allows operating on distributed data as if it were provided by a single database instance.
Matching company profiles across sources – When working on uncorrelated datasets, the challenge is to accurately combine these datasets into a single one, by matching the individual entities.
Computing descriptive analytics – Helpful when working on newly acquired data to gain insights into the data quality.
Finding similar entity profiles – Contrary to identifying exact matches, similar entities are helpful when identifying alternatives or just expanding the selection based on certain known parameters.
Predicting potential links – Similar to finding similar entities, but uses the relations between entities rather than the attributes of a given entity as the decisive metric.
Storage tiering for historical data – As not all data is relevant at all times, this use-case supports autonomous decision-making about when each segment of data may be moved to cold storage.
Detection of similar and correlated time series – This use-case identifies similar trends in time series data.
Detection of seasonal patterns – Building on the previous use-case, the focus is on adding meta-information such as seasonality when performing the analysis.
Detection of changes – As a lot of useful data is constantly evolving, monitoring the changes over time provides insights into which entities are being impacted.
Community detection and ranking of different types of entities – When investigating large data sets, it is helpful to be able to identify the most prominent entities and the dominant communities. This is particularly interesting when combined with change detection, in order to monitor how these evolve.
Visual analytics accompanying all the above use-cases to provide visual feedback during the analysis.
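
To make the "detection of similar and correlated time series" use-case more concrete, the sketch below flags pairs of series whose Pearson correlation exceeds a threshold. This is only an illustration under stated assumptions: the choice of Pearson correlation, the function names, the threshold of 0.9 and the sample data are all hypothetical and are not taken from the SmartDataLake components.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(series_by_name, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    # Sort by name so the output order is deterministic.
    for (name_a, a), (name_b, b) in combinations(sorted(series_by_name.items()), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical monthly values for three made-up entities (illustrative data only).
series = {
    "company_a": [10, 12, 14, 16, 18, 20],
    "company_b": [5, 6, 7, 8, 9, 10],
    "company_c": [3, 9, 2, 8, 1, 7],
}
result = correlated_pairs(series)
for name_a, name_b, r in result:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

With this sample data only the pair company_a and company_b is reported, since their values move in lockstep. A pairwise comparison like this grows quadratically with the number of series, so it is meant as a reading aid rather than an approach for data-lake-scale workloads.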

Throughout the pilots, the project partners will collaborate on executing and evaluating each of the above use-cases and the components that support their implementation. The final outcomes, insights and results will be summarized and published in a project report at the end of the project, in early January 2022, together with the revised repositories on GitHub containing the developed components, where applicable.

Links

https://github.com/smartdatalake

Keywords

Big data, data lake, smart, piloting, feedback

