The data integration process takes two datasets and combines them into a unified dataset by performing five composable tasks.
Schema matching (1) aligns the schemas of the two datasets.
Schema mapping (2) performs any transformations required by differences in the semantics of the aligned fields.
Entity resolution (3) identifies duplicate records.
Entity consolidation (4) merges them.
Data cleansing (5) can be applied at any point to detect and correct errors.
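To make the composition of the five tasks concrete, the following is a minimal toy sketch, not an implementation of any particular system. The records, the hand-written `correspondences` dictionary, and the helper names (`map_schema`, `resolve`, `consolidate`, `cleanse`) are all hypothetical; in practice each step would be far more involved.

```python
# Toy records from two hypothetical sources with different schemas.
left = [{"name": "Ada Lovelace", "zip": " 10001"}]
right = [{"full_name": "ada lovelace", "postcode": "10001", "phone": "555-0100"}]

# (1) Schema matching: correspondences from right-hand to left-hand attributes.
# Hand-written here (an assumption); a matcher would normally propose them.
correspondences = {"full_name": "name", "postcode": "zip"}

def cleanse(record):
    """(5) Data cleansing: here, only strip stray whitespace from values."""
    return {k: v.strip() for k, v in record.items()}

def map_schema(record):
    """(2) Schema mapping: rename matched fields and normalise name casing."""
    mapped = {correspondences.get(k, k): v for k, v in record.items()}
    mapped["name"] = mapped["name"].title()
    return mapped

def resolve(records):
    """(3) Entity resolution: group records that share the same (name, zip) key."""
    groups = {}
    for r in records:
        groups.setdefault((r["name"], r["zip"]), []).append(r)
    return list(groups.values())

def consolidate(group):
    """(4) Entity consolidation: keep the first non-empty value seen per field."""
    merged = {}
    for r in group:
        for k, v in r.items():
            if v and k not in merged:
                merged[k] = v
    return merged

prepared = [map_schema(cleanse(r)) for r in left + right]
unified = [consolidate(g) for g in resolve(prepared)]
```

In this sketch cleansing runs first, but, as noted above, it could equally be applied after mapping or after consolidation.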
Schema Matching System
A schema matching process receives two or more datasets and outputs a set of correspondences between the attributes of the datasets' schemas. Schema matching and the related field of ontology alignment have been studied extensively, with research on building matching systems and on adapting and combining different matching methods to the task at hand.
First-line matchers, also known as matching algorithms, similarity measures, or base learners, use information contained in the schemas being matched, or in the associated data instances if these are available, to propose correspondences between schema attributes.
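As an illustration, a very simple first-line matcher can compare attribute names only. The sketch below is one possible choice, not a method prescribed here: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function names and example attribute names are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def first_line_matcher(source_attrs, target_attrs):
    """Build a similarity matrix: one row per source attribute,
    one column per target attribute."""
    return [[name_similarity(s, t) for t in target_attrs] for s in source_attrs]

# Hypothetical schemas used only for illustration.
source = ["cust_name", "zip"]
target = ["customer_name", "zipcode", "phone"]
matrix = first_line_matcher(source, target)
```

Each cell `matrix[i][j]` scores how likely `source[i]` corresponds to `target[j]`; an instance-based matcher would produce a matrix of the same shape from the data values instead of the names.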
Second-line matchers are so named because they operate on the output of one or more first-line matchers, namely a set of similarity matrices, and perform functions such as filtering, selection, and aggregation of results.
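The sketch below illustrates two such functions under simple assumptions: aggregation as a weighted average of equally shaped similarity matrices, followed by a combined filtering and selection step that keeps the best-scoring target per source attribute if it clears a threshold. The function names, the threshold value, and the example matrices are hypothetical.

```python
def aggregate(matrices, weights=None):
    """Weighted average of equally shaped similarity matrices."""
    if weights is None:
        weights = [1.0] * len(matrices)
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(cols)]
        for i in range(rows)
    ]

def select(matrix, threshold=0.5):
    """For each source row, keep the best target column if its score
    is at least the threshold; rows with no such column yield nothing."""
    pairs = []
    for i, row in enumerate(matrix):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs

# Made-up outputs of two first-line matchers over 3 source and 2 target attributes.
name_sim = [[0.8, 0.1], [0.2, 0.6], [0.3, 0.2]]
instance_sim = [[0.6, 0.3], [0.4, 0.8], [0.1, 0.2]]

combined = aggregate([name_sim, instance_sim])
correspondences = select(combined, threshold=0.5)
```

With these inputs the third source attribute has no target scoring above the threshold, so it is left unmatched rather than forced into a correspondence.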