Linked Data Quality Assessment and its Application to Societal Progress Measurement
Abstract: In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for using the Web as a medium for data and knowledge integration, in which both documents and data are linked. Many communities on the Internet, such as geography, media, the life sciences and government, have already adopted the LD principles. In all of these use cases, one crippling problem is the underlying data quality: incomplete, inconsistent or inaccurate data gravely affects the end results, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. A key challenge is therefore to assess the quality of datasets published on the Web and to make this quality information explicit. Quality assessment is particularly challenging for LD because the underlying data stems from multiple autonomous and evolving data sources. Moreover, the dynamic nature of LD makes quality assessment crucial for measuring how accurately the data represents the real world. In this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. Next, we evaluate three different methodologies for LD quality assessment, namely (i) user-driven, (ii) crowdsourcing-based and (iii) semi-automated use-case-driven assessment. Finally, we consider a domain-specific use case that consumes LD and depends on its quality: an observatory of societal progress, which aims at evaluating the impact of research and development on the economic and healthcare performance of each country per year. We show the advantages of the semi-automated assessment over the other quality assessment methodologies discussed earlier, and we illustrate both the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
Thursday, 9 April at 2pm, Room P702