Quality Assurance of RDB2RDF Mappings

Westphal, Patrick
Abstract in English: 
Today, the Web of Data has evolved into a semantic information network containing large amounts of data. Since such data may stem from different sources, ranging from automatic extraction processes to extensively curated knowledge bases, its quality also varies. Current research therefore seeks methodologies and approaches to measure data quality in the Web of Data. Besides assessing the actual data, a quality assessment can also take the process of data generation into account, especially for extracted data. One extraction approach that has gained popularity in recent years is the mapping of relational databases to RDF (RDB2RDF). By providing definitions of how RDF should be generated from relational database content, huge amounts of data can be extracted automatically. Unfortunately, this also means that a single error in the mapping definitions can affect a considerable portion of the generated data. Thus, from a quality assurance point of view, assessing these RDB2RDF mapping definitions is important for ensuring high-quality RDF data. Recent quality research has not covered this topic in depth, and it is examined in this thesis. After a structured evaluation of existing approaches, a quality assessment methodology and quality dimensions of importance for RDB2RDF mappings are proposed. The formalization of this methodology is used to define 43 metrics that characterize the quality of an RDB2RDF mapping project. These metrics are also implemented in a software prototype of the proposed methodology, which is used in a practical evaluation of three different datasets generated with the RDB2RDF approach.
thesis_westphal.pdf (2.8 MB)