

AAL+Healthcare Data: System Architecture and Methodology
Following a typical Extract, Transform, Load (ETL) data processing workflow, we provide an overview of the functionalities proposed as part of the project, which are responsible for data acquisition, processing, and maintenance. On top of this, various applications are envisioned, each utilizing the ingested data to facilitate its specific service.
The proposed data architecture governs overall data mobility and integrity to ensure data quality, in terms of relevance and freshness, and to support end-user applications and onward data sharing. In short, on one side the architecture focuses on the modelling and maintenance of external data sources, i.e., the acquisition of data through connectors targeting the various data sources and, in particular, their ongoing maintenance. On the other side, it focuses on the aggregation and analysis of data across the various datasets, both to improve data quality as a whole and to derive information from the data.
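The following is a minimal sketch of these two sides as a small ETL-style pipeline: connectors stage per-source snapshots as close to the original content as possible, and an aggregation step combines them into a single view. The source identifiers and record fields are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StagingArea:
    """Holds per-source snapshots kept close to the original content (acquisition/maintenance side)."""
    snapshots: Dict[str, List[dict]] = field(default_factory=dict)

    def stage(self, source_id: str, records: List[dict]) -> None:
        # Store a faithful reflection of the external source.
        self.snapshots[source_id] = records


def aggregate(staging: StagingArea) -> List[dict]:
    """Aggregation/analysis side: combine records across datasets into one enriched view."""
    combined: List[dict] = []
    for source_id, records in staging.snapshots.items():
        for record in records:
            combined.append({"source": source_id, **record})
    return combined


if __name__ == "__main__":
    staging = StagingArea()
    staging.stage("open_data_api", [{"facility": "Clinic A", "region": "Vienna"}])
    staging.stage("csv_export", [{"facility": "Care Home B", "region": "Graz"}])
    print(aggregate(staging))
```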
From an application point of view, two main aspects have been considered. For selected datasets, source-specific applications are developed that provide value-adding services on top of a given dataset which are not provided by the original service. This may include indexing the datasets as they are, to facilitate, e.g., search functionality, as well as monitoring the integrity and state of the local dataset through presentation and visualization of the information. This separation also improves data quality, as the collection processes manipulate the external datasets as little as possible and instead focus on maintaining a reliable and useful reflection of the data source. The second aspect concerns applications based on the analysis and exploitation of the datasets from a horizontal-integration point of view, where key entities across the various datasets are fused to enrich the overall view, or on the outcomes of the various analyses performed to extract insights from the collected data.
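As an illustration of the horizontal-integration idea, the sketch below fuses records from two datasets on a shared identifier. The field names ("facility_id", "capacity") are assumptions made for the example and do not reflect the actual datasets.

```python
from typing import Dict, List


def fuse_on_key(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Merge records from two datasets that share the same key value."""
    index: Dict[str, dict] = {r[key]: dict(r) for r in right if key in r}
    fused: List[dict] = []
    for record in left:
        enriched = dict(record)
        enriched.update(index.get(record.get(key), {}))
        fused.append(enriched)
    return fused


care_registry = [{"facility_id": "F1", "name": "Clinic A"}]
capacity_data = [{"facility_id": "F1", "capacity": 120}]
print(fuse_on_key(care_registry, capacity_data, "facility_id"))
```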
Depending on the type of remote data source, e.g., whether it is an API, a file-based source, or a remote database, different connector implementations are applied, each tailored to the specific data provisioning type and source. Besides staging the information as close to the original content as possible, a key design aspect is to optimize the collection and access process and to constrain it so as to reduce the strain on the remote resources during synchronization.
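A possible shape for such source-specific connectors is sketched below, assuming a hypothetical paginated API endpoint and a local CSV export; the throttling interval is an arbitrary illustrative value, not a project-specified parameter.

```python
import csv
import json
import time
import urllib.request
from abc import ABC, abstractmethod
from typing import Iterator


class Connector(ABC):
    """Common interface: each connector yields records staged close to the original content."""

    @abstractmethod
    def fetch(self) -> Iterator[dict]:
        ...


class ApiConnector(Connector):
    """Connector for a paginated JSON API, throttled to limit strain on the remote service."""

    def __init__(self, base_url: str, delay_seconds: float = 1.0):
        self.base_url = base_url
        self.delay_seconds = delay_seconds

    def fetch(self) -> Iterator[dict]:
        page = 0
        while True:
            time.sleep(self.delay_seconds)  # constrain the request rate
            with urllib.request.urlopen(f"{self.base_url}?page={page}") as resp:
                batch = json.load(resp)
            if not batch:
                break
            yield from batch
            page += 1


class FileConnector(Connector):
    """Connector for a file-based source, here a CSV export."""

    def __init__(self, path: str):
        self.path = path

    def fetch(self) -> Iterator[dict]:
        with open(self.path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
```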
For the integration with the Data Market Austria (DMA) data market, three main facilitators are envisioned. The first is the provision of the aggregated raw data; this is relevant, for example, for the API-based open datasets where the collection process has aggregated the information across the provided API collection, so that the value-added service is a dataset that can be presented as a single entity. As these are evolving datasets with new entities added over time, the dynamic data inventory provided by the DMA would be a suitable solution. Secondly, and primarily, the main value would come from the aggregated datasets in their own right, where the data is enriched, and thirdly from facilitating the insights derived from the data analytics.
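As a rough illustration of what registering such an evolving, aggregated dataset might involve, the sketch below builds a simple metadata record. The field names follow common dataset-catalogue conventions and are assumptions; they are not the actual DMA inventory schema.

```python
from datetime import date

# Hypothetical metadata record for an aggregated, evolving dataset.
dataset_entry = {
    "identifier": "aal-open-data-aggregated",      # hypothetical identifier
    "title": "Aggregated AAL/healthcare open data",
    "description": "Single-entity view aggregated across a provided API collection.",
    "update_policy": "incremental",                # new entities are added over time
    "last_synchronized": date.today().isoformat(),
    "provenance": ["open_data_api"],               # source connectors feeding the dataset
}

print(dataset_entry)
```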
Keywords
AAL, healthcare data, ETL, architecture