In the Merge phase, candidate datasets are harmonized semantically, computationally, and geographically to form one large, coherent dataset. The merge phase consists of three steps: Matching, Mapping, and Fusing.
An Ontology-based Data Integration Tool
Once a collection of datasets has been assembled, the merge phase can commence. To facilitate this process, one must create a mediated schema to which all other datasets are matched. The ODINI project is creating a unique Data Integration Ontology to facilitate automated mapping of the datasets to the mediated schema. We are also developing a data integration software tool to enable oceanographic researchers to use this ontology to integrate large numbers of datasets. Stay tuned for updates.
In the matching step, researchers align the attributes/parameters in each dataset’s schema with those of the mediated schema/ontology. To do so, the researcher must often consult the description of each parameter, which is either listed with the dataset in the source repository or given in the methods section of the accompanying paper.
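The matching step can be pictured as building a table of correspondences between source attribute names and terms in the mediated schema. The following is a minimal sketch under stated assumptions, not the ODINI tool itself: the schema terms, the column names, and the `match_columns` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical mediated-schema terms (illustrative only).
MEDIATED_SCHEMA = {"temperature_celsius", "salinity_psu", "depth_m"}

def match_columns(source_columns, correspondences):
    """Split source columns into matched and unmatched.

    correspondences: dict mapping a source column name to a mediated-schema
    term, as decided by the researcher after reading the data descriptions.
    """
    matched = {}
    unmatched = []
    for col in source_columns:
        target = correspondences.get(col)
        if target in MEDIATED_SCHEMA:
            matched[col] = target
        else:
            # No known correspondence: flag for manual review.
            unmatched.append(col)
    return matched, unmatched

matched, unmatched = match_columns(
    ["Temp", "Sal", "Notes"],
    {"Temp": "temperature_celsius", "Sal": "salinity_psu"},
)
```

Columns that end up in `unmatched` are the ones for which the researcher would need to go back to the repository listing or the accompanying paper.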
In some cases, the semantics of the data in one source differ slightly from those of the mediated schema/ontology. In such cases, a mapping step is needed, in which conversion functions are generated according to the correspondences found in the matching step so that the data can be integrated. More mundane, but equally crucial, is the need to map from the source format to the format of the central repository used to collect the data from the different datasets.
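As an illustration of what a conversion function might look like, the sketch below pairs each source attribute with a target term and a function that converts its values. The attribute names, the units, and the `map_record` helper are assumptions made for this example; the source does not specify which conversions a given dataset requires.

```python
# Hypothetical conversions: source attribute -> (target term, conversion function).
CONVERSIONS = {
    "temp_kelvin": ("temperature_celsius", lambda k: k - 273.15),
    "depth_cm": ("depth_m", lambda cm: cm / 100.0),
}

def map_record(record):
    """Convert one source record into mediated-schema terms and units.

    Attributes without a registered conversion are skipped in this sketch.
    """
    out = {}
    for name, value in record.items():
        if name in CONVERSIONS:
            target, convert = CONVERSIONS[name]
            out[target] = convert(value)
    return out

converted = map_record({"temp_kelvin": 283.15, "depth_cm": 250})
```

Keeping the conversions in a lookup table separates the correspondences found during matching from the code that applies them, which is one possible design rather than a prescribed one.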
In the fusing step, researchers need to mitigate problems that arise from differences in spatiotemporal resolution between the datasets. For example, one dataset may include measurements over a 50-m depth range in increments of 1 m, while another uses increments of 10 cm. Decisions must be made on whether to aggregate the finer data to the lower resolution, omit incompatible resolutions, interpolate the data to align the resolutions, or fill in missing data in some areas.
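One of the options above, aggregating the finer dataset to the coarser resolution, can be sketched as follows. This is a minimal example assuming that averaging within fixed depth bins is an acceptable aggregation; the `aggregate_to_bins` function and the sample values are hypothetical, and interpolation or omission would call for different code.

```python
import math
from collections import defaultdict

def aggregate_to_bins(samples, bin_size_m=1.0):
    """Average (depth_m, value) samples within fixed-size depth bins.

    Returns a dict mapping the start depth of each bin to the mean value
    of the samples that fall inside it.
    """
    groups = defaultdict(list)
    for depth, value in samples:
        # Assign each sample to the bin whose start is at or just above it.
        bin_start = math.floor(depth / bin_size_m) * bin_size_m
        groups[bin_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}

# Fine-resolution samples reduced to 1-m bins.
fine = [(0.0, 10.0), (0.5, 12.0), (1.0, 20.0), (1.5, 22.0)]
coarse = aggregate_to_bins(fine, bin_size_m=1.0)
```

Aggregating upward discards fine-scale detail but avoids inventing values, whereas interpolating the coarser dataset keeps the finer grid at the cost of introducing estimated data.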