Domain Databases-Web-Information Retrieval-Reasoning
Year 2014
Starting october 2014
Status Closed
Subject Data Fusion for Data Quality Assessment
Thesis advisor SAIS Fatiha
Co-advisor Rallou Thomopoulos

Laboratory LRI LaHDAK
Collaborations - Laboratoire de Recherche en Informatique (LRI), Paris-Sud University
- IATE Joint Research Unit (Ingénierie des Agropolymères et Technologies Emergentes), Montpellier
- INRIA Project Team « GraphIK », Montpellier
- Institut National de l’Audiovisuel (INA), Paris
- Agence Bibliographique de l’Enseignement Supérieur (ABES), Montpellier

Abstract The main difficulty of data fusion lies in conflicts between property values, that is, the existence of several possible values for the same property. These conflicts are mainly due to the heterogeneity of the data, where different vocabularies and conventions are used to describe them. Poor data quality (outdated, erroneous or incomplete information) can further amplify these conflicts.
The aim of this PhD project is to develop a data fusion approach in which: (i) several criteria (e.g., freshness, frequency, reliability of sources, functional and semantic dependencies) can be chosen and combined to select the right values; (ii) schema constraints are checked on the merged data; and (iii) provenance information is exploited, covering both the original data sources and the mappings applied during the schema integration of those sources. In addition, the richness of the Web of Data will be exploited by navigating the graph of owl:sameAs links.
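As a rough illustration of point (i), the sketch below combines three of the listed criteria (freshness, frequency and source reliability) into a weighted score for each candidate value of a single property, then ranks the values. It is a minimal sketch under stated assumptions, not the approach of this project: the `Candidate` type, the `SOURCE_RELIABILITY` table, the freshness formula and the weights are all hypothetical, and functional or semantic dependencies are not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    """One candidate value for a property, as asserted by one source."""
    value: str
    source: str
    last_updated: date


# Hypothetical reliability scores in [0, 1]; unknown sources get 0.5.
SOURCE_RELIABILITY = {"src_a": 0.9, "src_b": 0.6, "src_c": 0.4}


def score_candidates(candidates, weights, today):
    """Rank distinct values by a weighted sum of per-criterion scores."""
    counts = Counter(c.value for c in candidates)
    total = len(candidates)
    best = {}
    for c in candidates:
        # Freshness decays with the age of the assertion (assumed formula).
        age_years = (today - c.last_updated).days / 365.0
        freshness = 1.0 / (1.0 + age_years)
        reliability = SOURCE_RELIABILITY.get(c.source, 0.5)
        # Frequency: share of candidates asserting this value.
        frequency = counts[c.value] / total
        score = (weights["freshness"] * freshness
                 + weights["reliability"] * reliability
                 + weights["frequency"] * frequency)
        # A value keeps the best score among its supporting candidates.
        best[c.value] = max(best.get(c.value, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the best score per value is one design choice among several; summing or averaging the scores of a value's supporting candidates would change how much weight repeated assertions receive, on top of the frequency criterion.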
Context Recently, with the « Linked Open Data (LOD) cloud » initiative, the growing number of structured data sources published on the Web has led to an explosive growth of the global data space, which now contains billions of assertions (61 billion in January 2014). In this data space, semantic links can be established between data items. These links allow crawlers, browsers and applications to navigate through the data sources and combine information from different sources. However, in an open environment like the Web, different URIs are regularly created to identify the same real-world object. Generating identity links (owl:sameAs) between such resources is crucial for applications to exploit the richness of the LOD. To do so, applications must be able to fuse the resources connected by owl:sameAs links in order to obtain a unified representation. This is the data fusion problem, which arises once the data linking problem has been addressed, and it is the problem this PhD project focuses on.
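The fusion step described in the context can be sketched in two stages: first group URIs into equivalence classes by following owl:sameAs links (treated as symmetric and transitive, as in OWL), then merge the property values of each class and flag the properties left with several values. This is a hypothetical, in-memory sketch: URIs are plain strings such as `ex:a1`, and each resource description is assumed to be a list of `(property, value)` pairs rather than an RDF graph.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group URIs into equivalence classes from owl:sameAs pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


def fuse(cluster, descriptions):
    """Merge the (property, value) pairs of all URIs in one cluster."""
    merged = defaultdict(set)
    for uri in cluster:
        for prop, value in descriptions.get(uri, []):
            merged[prop].add(value)
    return dict(merged)


def conflicting_properties(fused):
    """Properties that still carry several candidate values after merging."""
    return {prop for prop, values in fused.items() if len(values) > 1}
```

Values are compared as raw strings here, so `"Victor Hugo"` and `"Hugo, Victor"` would surface as a conflict; resolving it requires the alignment of vocabularies and conventions mentioned in the abstract.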

Objectives Study the theoretical and practical aspects of data fusion in order to (i) avoid data redundancy and (ii) summarize information according to a given point of view.
Work program 1. A first direction of work concerns the annotation of linked data. Once linking is achieved, an essential element of data quality is the interpretability and explanation of the data fusion result. It is important to identify and keep track of the criteria used during the fusion step to choose the "best value" or to rank the possible values of a property. Such annotations can be complex, storing for instance excluded values, synonymy relations, specialization/generalization relations, mereology (part-whole) relations, and so on. Since a quality score is computed for each criterion, these scores should also be kept and represented in the annotations.
2. A second direction of work concerns the case where several data items can be grouped in order to create a more generic entity that gathers their common characteristics. A characteristic case is the concept of "work" in bibliographic data sources, where several editions or translations of the same book all refer to a single work. A fusion approach with a data summarization objective needs to be developed.
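Both directions of the work program can be illustrated together with a small, hypothetical sketch: a `FusionAnnotation` record holds the excluded values, per-criterion scores and value relations mentioned in direction 1, and `summarize_works` groups bibliographic records into works (direction 2), keeping only the values shared by all members and annotating the others. The grouping key (normalized author plus original title), the field names and the reason strings are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Hypothetical relation types between candidate values."""
    SYNONYM = "synonymOf"
    SPECIALIZATION = "specializationOf"
    GENERALIZATION = "generalizationOf"
    PART_OF = "partOf"  # mereology


@dataclass
class FusionAnnotation:
    """Records why a property received (or did not receive) a fused value."""
    entity: str
    prop: str
    kept_value: object = None
    excluded_values: list = field(default_factory=list)   # (value, reason)
    criterion_scores: dict = field(default_factory=dict)  # criterion -> {value: score}
    value_relations: list = field(default_factory=list)   # (value, Relation, value)


def work_key(record):
    # Hypothetical grouping key: normalized author and original title.
    return (record["author"].casefold().strip(),
            record["original_title"].casefold().strip())


def summarize_works(records):
    """Group records into works; keep shared values, annotate the others."""
    groups = defaultdict(list)
    for r in records:
        groups[work_key(r)].append(r)

    works = []
    for i, members in enumerate(groups.values()):
        work_id = f"work{i}"
        common, annotations = {}, []
        for prop in sorted(set().union(*members)):
            values = [m.get(prop) for m in members]
            if values[0] is not None and all(v == values[0] for v in values):
                # Shared by every member: lifted to the generic work entity.
                common[prop] = values[0]
            else:
                # Values that differ stay at member level; record them.
                ann = FusionAnnotation(entity=work_id, prop=prop)
                for v in dict.fromkeys(values):  # unique, order-preserving
                    if v is not None:
                        ann.excluded_values.append((v, "differs across members"))
                annotations.append(ann)
        works.append({"id": work_id, "common": common,
                      "members": members, "annotations": annotations})
    return works
```

In a fuller version, `criterion_scores` would be filled from the value-selection step, and `value_relations` could hold triples such as `("Besançon", Relation.PART_OF, "France")` so that a less specific value is explained rather than silently dropped.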
Extra information We are looking for PhD candidates and for funding for this PhD project.
Prerequisite Good skills in databases, Web technologies (XML, RDF, OWL) and knowledge representation.
Details Download sujetThese_fusion_2014-eng.pdf
Expected funding Institutional funding
Status of funding Expected
user fatiha.sais
Created Friday 13 of June, 2014 18:12:00 CEST
LastModif Friday 27 of June, 2014 16:44:08 CEST