Ontologies and their Use in Data Integration

Ontologies

Ontologies provide a conceptualization of the domain (or domains) described by the knowledge graph. They add entailment mechanisms such as grouping entities into classes, creating same-as links between entities, declaring equivalence relationships between classes, and denoting predicates as sub-properties of other predicates.
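
As a rough illustration of these entailment mechanisms, the following Python sketch applies a handful of simplified rules to a set of triples until nothing new can be derived. It is not a full RDFS or OWL reasoner: the function `infer`, the `ex:` terms, and the reduced rule set are assumptions made for this example, and same-as links are only propagated for statements in which the entity is the subject.

```python
# Triples are (subject, predicate, object); all "ex:" names are illustrative.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
SAME_AS = "owl:sameAs"
EQUIV_CLASS = "owl:equivalentClass"


def infer(triples):
    """Apply a small, simplified set of entailment rules until a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == EQUIV_CLASS:
                # Equivalent classes are subclasses of each other.
                new |= {(s, SUBCLASS, o), (o, SUBCLASS, s)}
            elif p == SAME_AS:
                # same-as is symmetric ...
                new.add((o, SAME_AS, s))
                # ... and statements with s as subject also hold for o.
                new |= {(o, p2, o2) for s2, p2, o2 in facts
                        if s2 == s and p2 != SAME_AS}
            elif p == TYPE:
                # Class membership propagates one step up the hierarchy
                # per pass; the loop repeats until no step is left.
                new |= {(s, TYPE, sup) for c, p2, sup in facts
                        if p2 == SUBCLASS and c == o}
            # A statement made with a sub-property also holds for
            # its super-property.
            new |= {(s, sup, o) for q, p2, sup in facts
                    if p2 == SUBPROP and q == p}
        if new <= facts:
            return facts
        facts |= new


data = {
    ("ex:Dog", SUBCLASS, "ex:Animal"),
    ("ex:Canine", EQUIV_CLASS, "ex:Dog"),
    ("ex:hasMother", SUBPROP, "ex:hasParent"),
    ("ex:rex", TYPE, "ex:Canine"),
    ("ex:rex", "ex:hasMother", "ex:bella"),
    ("ex:rex", SAME_AS, "ex:dog42"),
}
closure = infer(data)
```

In this sketch the closure contains, among others, the derived facts that `ex:rex` is an `ex:Animal` and that `ex:dog42` has `ex:bella` as a parent, neither of which was stated explicitly.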

All ontologies use some form of vocabulary to express terms and specify their meanings. Like taxonomies, they adopt a classification structure. However, ontologies add properties for each class and a set of axioms and rules that allow reasoning and a full conceptualization of the domain.

Ontology-Based Data Integration and Access

Taking advantage of knowledge representation and inference mechanisms from AI, Ontology-Based Data Integration (OBDI) uses ontologies to consolidate several heterogeneous sources into a single source.

In many cases existing data sources are not ontology-based, rendering OBDI impossible. Ontology-Based Data Access (OBDA) is an alternative model that provides access to the data layer through a declarative mapping between autonomous data layers and a domain-specific ontology. A typical development process for an OBDA system in a project that has a SQL database contains the following steps.

(a) Create an ontology of domain-specific user knowledge.

(b) Write a mapping that connects (usually through SQL queries) the ontology to the project’s database.

(c) Write a query in a semantic query language, such as SPARQL, using the ontology’s vocabulary.

(d) Build an OBDA system framework that automatically rewrites the SPARQL query to a SQL query over the project’s database.
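
Steps (a) to (d) can be sketched in miniature with Python and an in-memory SQLite database. This is only a toy under stated assumptions: the `staff` table, the `ex:` terms, the `MAPPINGS` dictionary, and the `rewrite` function are all hypothetical, and the rewriter handles a single fixed query shape rather than general SPARQL.

```python
import sqlite3

# (a) Ontology vocabulary: hypothetical class and property names.
ONTOLOGY = {"classes": {"ex:Researcher"}, "properties": {"ex:worksAt"}}

# (b) Mapping: each ontology term is defined by a SQL query over the
# project's database. Table and column names are assumptions.
MAPPINGS = {
    "ex:Researcher": "SELECT id AS s FROM staff WHERE role = 'researcher'",
    "ex:worksAt": "SELECT id AS s, institute AS o FROM staff",
}


def rewrite(class_term, property_term):
    """(d) Rewrite the pattern { ?x a <class> . ?x <property> ?y } to SQL."""
    for term in (class_term, property_term):
        if term not in MAPPINGS:
            raise KeyError(f"no mapping for ontology term {term}")
    return (
        f"SELECT c.s, p.o FROM ({MAPPINGS[class_term]}) AS c "
        f"JOIN ({MAPPINGS[property_term]}) AS p ON c.s = p.s "
        f"ORDER BY c.s"
    )


# A stand-in for the project's existing SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, role TEXT, institute TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        (1, "researcher", "Institute A"),
        (2, "technician", "Institute A"),
        (3, "researcher", "Institute B"),
    ],
)

# (c) The user's query uses ontology terms only. In SPARQL it would read:
#     SELECT ?x ?y WHERE { ?x a ex:Researcher . ?x ex:worksAt ?y }
sql = rewrite("ex:Researcher", "ex:worksAt")
rows = conn.execute(sql).fetchall()
```

The user never refers to the `staff` table directly. Only the mapping knows how ontology terms correspond to tables and columns, so the database schema can change without affecting queries written against the ontology.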

When searching for relevant research, users rely on the search tools provided by the data sources. These can be classified into one of three types of interfaces. Keyword queries comprise a sequence of terms, at least one of which must be present in a dataset for it to be returned in the results. OBDA allows the use of ontological queries, which rely on well-defined ontological terms such as organism species or molecular compounds. The user specifies these terms together with logical constraints and entailment allowances to form a logical statement. Each candidate result must satisfy the logical statement to be returned.
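
The difference between keyword and ontological queries can be illustrated with a short Python sketch. Everything in it is hypothetical: the dataset records, the `ex:` terms, the `PARENT` hierarchy, and both matching functions are assumptions chosen to show the contrast, with the `entail` flag standing in for an entailment allowance.

```python
# Hypothetical term hierarchy (child -> parent) used for entailment.
PARENT = {"ex:MusMusculus": "ex:Rodent", "ex:Rodent": "ex:Mammal"}


def with_ancestors(term):
    """Return the term together with all of its ancestors."""
    out = {term}
    while term in PARENT:
        term = PARENT[term]
        out.add(term)
    return out


def keyword_match(text, terms):
    """Keyword query: succeed if at least one term occurs in the text."""
    words = set(text.lower().split())
    return any(t.lower() in words for t in terms)


def ontological_match(annotations, required, excluded=(), entail=True):
    """Ontological query: the annotations must satisfy the whole statement.

    With entail=True, an annotation also counts as each of its ancestors.
    """
    terms = set()
    for a in annotations:
        terms |= with_ancestors(a) if entail else {a}
    return (all(r in terms for r in required)
            and not any(e in terms for e in excluded))


datasets = [
    {"title": "Mouse liver glucose assay",
     "terms": {"ex:MusMusculus", "ex:Glucose"}},
    {"title": "Glucose uptake in yeast",
     "terms": {"ex:Yeast", "ex:Glucose"}},
]

# Either keyword is enough for a dataset to be returned.
kw_hits = [d["title"] for d in datasets
           if keyword_match(d["title"], ["mouse", "yeast"])]

# Both ontological constraints must hold; the mouse dataset qualifies as
# a mammal only because entailment over the hierarchy is allowed.
onto_hits = [d["title"] for d in datasets
             if ontological_match(d["terms"], ["ex:Mammal", "ex:Glucose"])]
```

Here the keyword query returns both datasets, while the ontological query returns only the mouse dataset. With `entail=False` the same ontological query would return nothing, because no dataset is annotated with `ex:Mammal` directly.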