NSF* Award: III: Small: Enhancing Ontology Matching with Visual Analytics

PI: Maria Isabel Cruz

Award Number: 1618126

Abstract

An ontology is a representation of a domain, be it biomedical, business, environmental, or other. In a world where data are predominantly heterogeneous, ontology matching establishes correspondences between the concepts of two ontologies, thus effectively bridging two distinct domains or two different representations of the same domain. Ontology matching is therefore a fundamental tool for data integration, that is, for the creation of a homogeneous gateway to disparate data. Ontology matching systems are made up of several matching algorithms, called matchers. Different matching tasks require different matchers, so there are various configuration and parameter choices to be made, resulting in a multi-dimensional problem whose tuning requires considerable effort and expertise. However, most ontology matching systems operate as a black box, offering no insight into how the output (a set of mappings among ontology concepts, called an alignment) is generated. These systems do not usually offer domain experts the opportunity to validate automatically generated mappings and thereby gain control over the matching process. In this project, we use visual analytics, a combination of visualization and analytics, to facilitate ontology matching. Users interact with a visual representation of the matching process and validate mappings that are ranked by underlying analytical methods. This award investigates visual analytics methods and studies their potential benefits. Our collaboration with partners in the biological domain will ensure the practical relevance of our research. From an educational viewpoint, the PI is spearheading a new Data Science curriculum in Computer Science, which can incorporate the main aspects of the proposed research, and the project will train a graduate student and a postdoc in this multidisciplinary field.

Driven by data integration needs in a wide range of domains, the field of ontology matching has been prospering. However, the use of visual analytics remains largely unexplored. The proposed research will combine the power of visual analytics with ontology matching to: (1) open up the ontology matching process so as to facilitate its configuration by domain experts; (2) reduce the number of mappings to be validated by the experts so as to achieve high quality results with minimum effort; and (3) investigate a methodology to evaluate the benefits of combining ontology matching with visual analytics. For the visualization design, a principled approach will be followed that provides prescriptive guidance for determining appropriate evaluation approaches, while for the manipulation of visualized data a taxonomy of interactive operations will be used. Further, the design and analysis of the workflow that describes the interactive nature of the overall process will facilitate the study of the complex interdependencies between the data manipulation and the visualization components. The web site of this project is available at https://www.cs.uic.edu/Cruz/OntologyMatchingVisualAnalytics.

First Year Report (2016-2017)
Major goals

The proposed research will combine the power of visual analytics with ontology matching to: (1) open up the ontology matching process so as to facilitate its configuration by domain experts; (2) reduce the number of mappings to be validated by the experts so as to achieve high quality results with minimum effort; and (3) investigate a methodology to evaluate the benefits of combining ontology matching with visual analytics.

Accomplishments

Ontology matching often uses external knowledge bases to establish the connection among concepts in different ontologies. We have investigated a distantly supervised approach to derive spatio-temporal relationships from text documents, which enrich knowledge bases with dynamic facts. Specifically, we have analyzed corpora of noun phrases, appositions, and adjectives to build templates characterizing geospatial and temporal data. We have successfully evaluated the effectiveness of our approach using both automated and manual methods on the YAGO knowledge base.

We have extended the ontology matching system AgreementMakerLight (in collaboration with researchers at the U. of Lisbon and IGC) to incorporate more scalable matching methods. The quality of our matching results has been recognized by our placement at the top of the more than 20 systems that competed in the 2016 Ontology Alignment Evaluation Initiative (OAEI), receiving the Pistoia Alliance first prize for the Disease and Phenotype track of the OAEI. We have disseminated our work by contributing open source code to GitHub. AgreementMakerLight has been used by experts to match ontologies describing biological information networks. They have been using a visual interface for the AgreementMakerLight system to explore in detail a subset of the mappings and determine their correctness. The experts have identified some of the visual operations they perform and how these operations correspond to their verification methods.

We have used Association Rule Mining and ontologies to extract patterns from repositories of crime data. Our technique has been incorporated into a system that displays query results and supports their analysis via a geospatial user interface. To analyze a Chicago crime dataset, we have built a crime ontology by matching crime classification schemes from the FBI and the Chicago Police Department. Our experiments show that we can significantly reduce the number of rules without sacrificing query precision.
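
As a rough illustration of how support and confidence thresholds reduce the number of association rules, the following minimal Python sketch mines single-antecedent rules from a handful of toy records. The records, attribute values, thresholds, and function names are hypothetical; they are not drawn from the Chicago crime dataset or from our system.

```python
from itertools import permutations

# Hypothetical toy records; each record is a set of attribute values.
RECORDS = [
    {"theft", "street", "night"},
    {"theft", "street", "day"},
    {"battery", "residence", "night"},
    {"theft", "street", "night"},
    {"battery", "street", "night"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def mine_rules(records, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, records)
        if s < min_support:
            continue  # the pair is too rare to yield a useful rule
        confidence = s / support({a}, records)
        if confidence >= min_confidence:
            rules.append((a, b, s, confidence))
    return rules


RULES = mine_rules(RECORDS)
for a, b, s, c in RULES:
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Raising either threshold shrinks the rule set, which is the basic lever that any rule-reduction strategy has to balance against query precision.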

We have investigated two formal frameworks to describe the requirements of a system that performs ontology matching extended with visual interaction, namely the Nested Model and the NOVIS model.

Products
  • Balasubramani, Booma Sowkarthiga ; Shivaprabhu, Vivek R. ; Krishnamurthy, Smitha ; Cruz, Isabel F. ; Malik, Tanu (2016). Ontology-based urban data exploration. 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/3007540.3007550

  • Cheatham, Michelle ; Cruz, I. F. ; Euzenat, Jérôme ; Pesquita, Cátia (2017). Special issue on ontology and linked data matching. Semantic Web. 8 (2). Status = PUBLISHED; Acknowledgment of Federal Support = No ; DOI: 10.3233/SW-160251

  • Mirrezaei, Seyed Iman ; Martins, Bruno ; Cruz, Isabel F. (2016). A distantly supervised method for extracting spatio-temporal information from text. 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/2996913.2996967

  • Faria, Daniel ; Pesquita, Catia ; Balasubramani, Booma S. ; Martins, Catarina ; Cardoso, João ; Curado, Hugo ; Couto, Francisco M. ; Cruz, Isabel F. (2016). OAEI 2016 results of AML. 11th International Workshop on Ontology Matching co-located with the 15th International Semantic Web Conference. Status = PUBLISHED; Acknowledgment of Federal Support = Yes
Students

Vivek Shivaprabhu

Plans for the following year

We plan to continue the identification of the matching and visual operations using the Nested Model.

We also plan to continue extending the AgreementMakerLight system to incorporate our research findings.

Regarding broader impacts, we plan to give assignments in an introductory Data Science course that address data integration and visual methods for data analysis.

Second Year Report (2017-2018)
Major activities

(1) Ontology Matching Process:

Develop efficient and scalable algorithms for ontology and instance matching.

(2) Mapping Validation:

Investigate applications that are semi-automatic, that is, that require the intervention of domain experts, focusing on mechanisms that reduce the number of mappings to be presented to the users for validation.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

Create a framework that allows for the visual exploration and comparison of the outcomes of different matchers or combinations of matchers.
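
Before any visualization is involved, one simple basis for comparing the outcomes of different matchers is to score each alignment against a reference alignment and to look at the mappings that only one matcher produced. The Python sketch below shows this under stated assumptions: the concept names, the two matcher outputs, and the reference alignment are all hypothetical, and alignments are modeled simply as sets of (source concept, target concept) pairs.

```python
def evaluate(alignment, reference):
    """Precision, recall, and F-measure of an alignment against a reference."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical alignments: sets of (source concept, target concept) pairs.
REFERENCE = {
    ("Heart", "Heart"),
    ("Lung", "Lung"),
    ("Liver", "Liver"),
    ("Kidney", "Renal_Organ"),
}
LEXICAL = {("Heart", "Heart"), ("Lung", "Lung"), ("Liver", "Liver"), ("Skin", "Skull")}
STRUCTURAL = {("Heart", "Heart"), ("Kidney", "Renal_Organ")}

for name, alignment in [("lexical", LEXICAL), ("structural", STRUCTURAL)]:
    p, r, f = evaluate(alignment, REFERENCE)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")

# Mappings found by only one matcher are natural candidates for closer inspection.
ONLY_LEXICAL = LEXICAL - STRUCTURAL
ONLY_STRUCTURAL = STRUCTURAL - LEXICAL
```

The set differences at the end are the kind of information a comparative view could highlight, since they show where the matchers disagree.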

Specific objectives

In the three major research directions, we aimed to:

(1) Ontology Matching Process:

Successfully test our matching algorithms, as developed in AgreementMakerLight (AML), against other state-of-the-art approaches in the Ontology Alignment Evaluation Initiative (OAEI). Enter AML in the OAEI Instance Matching track for the first time.

(2) Mapping Validation:

Develop algorithms that rank the mappings so as to facilitate the user's choice of the mappings to validate. The idea is that users will validate the highest ranked mappings first, thus in practice reducing the number of mappings to validate.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

(a) Characterize ontology matching tasks from atomic to composite as the evaluation must be based on these tasks.

(b) Compare different visualizations in terms of number of steps required to perform those tasks.

(c) Develop analytics for reorganizing the visualizations to facilitate the interpretation of the images.
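
To make objective (2) above concrete, the following Python sketch ranks candidate mappings so that the ones most in need of expert attention come first. The ranking criterion used here, disagreement among matcher scores, is an assumption chosen for illustration and is not necessarily the criterion used in our algorithms; the `Mapping` class, the concept names, and the scores are likewise hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass(frozen=True)
class Mapping:
    source: str    # concept in the source ontology
    target: str    # concept in the target ontology
    scores: tuple  # similarity in [0, 1] from each (hypothetical) matcher


def disagreement(mapping):
    """Population standard deviation of the matcher scores."""
    return pstdev(mapping.scores)


def rank_for_validation(mappings, budget):
    """Return the `budget` mappings on which the matchers disagree most."""
    ranked = sorted(mappings, key=disagreement, reverse=True)
    return ranked[:budget]


CANDIDATES = [
    Mapping("Heart", "Cardiac_Organ", (0.9, 0.2, 0.5)),
    Mapping("Lung", "Lung", (1.0, 1.0, 1.0)),
    Mapping("Liver", "Hepatic_Tissue", (0.6, 0.4, 0.5)),
]

TO_VALIDATE = rank_for_validation(CANDIDATES, budget=2)
for m in TO_VALIDATE:
    print(f"{m.source} = {m.target}  disagreement={disagreement(m):.2f}")
```

With a validation budget of two, the unanimous mapping is never shown to the user, which is the intended effect of ranking: expert effort goes to the contested mappings.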

Significant results

(1) Ontology Matching Process: At the OAEI, AML was the only system (among 21) to participate in all the tracks and the best system overall, obtaining first place in the Anatomy track, Conference track, MultiFarm track, Interactive Matching track, Large Biomedical Ontologies track, and in the three Instance Matching tracks.

(2) Mapping Validation: We developed and evaluated a mechanism for matching an unnamed property with a ranked list of knowledge base predicates, by looking at a set of homogeneous property values. Our ranking algorithm is able to effectively quantify semantic similarity.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

We expect to have made good progress on the three components of this topic by the end of the summer.
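
To illustrate the kind of ranking described in result (2) above, the following Python sketch orders knowledge base predicates by how well their known values overlap with the values of an unnamed property. It uses plain Jaccard overlap as a stand-in similarity, which is an assumption made for brevity and not the measure from our published work; the predicate names and values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_predicates(values, kb_predicates):
    """Rank predicate names by value overlap, highest first (ties by name)."""
    scored = [(jaccard(values, objs), name) for name, objs in kb_predicates.items()]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical knowledge base: predicate name -> set of values seen with it.
KB = {
    "birthPlace": {"Chicago", "Lisbon", "Milan", "Paris"},
    "team": {"Bulls", "Cubs", "Bears"},
    "location": {"Lisbon", "Paris", "Rome"},
}

# Values of the unnamed property to be annotated.
PROPERTY_VALUES = {"Chicago", "Lisbon", "Milan"}

RANKED = rank_predicates(PROPERTY_VALUES, KB)
print(RANKED)
```

A realistic ranking would need to go beyond exact value overlap, for example by taking the types of the values into account, but the input and output have the same shape: a set of homogeneous values in, a ranked list of predicates out.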

Key outcomes or other achievements

(1) Ontology Matching Process: AgreementMakerLight won the 2017 IBM Research prize for Instance Matching. In a paper published in the J. of Biomedical Semantics, we dissect the strategies employed by matching systems to tackle the most difficult challenges, such as those posed by biomedical ontologies, using the AgreementMakerLight system as the platform for this study.

(2) Mapping Validation: The paper on ranking knowledge base predicates that match an unnamed property was published at the 2018 WWW conference, which had an acceptance rate of less than 15%.

Products
  • Faria, D., Pesquita, C., Mott, I., Martins, C., Couto, F.M. and Cruz, I.F. (2018). Tackling the challenges of matching biomedical ontologies. J. Biomedical Semantics. 9 (1), 4:1. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1186/s13326-017-0170-9

  • Porrini, Riccardo ; Palmonari, Matteo ; Cruz, Isabel F. (2018). Facet Annotation Using Reference Knowledge Bases. 2018 World Wide Web Conference. 1215-1224. Status = PUBLISHED (deposited in NSF-PAR); Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/3178876.3186020

  • Faria, D., Balasubramani, B. S., Shivaprabhu, V. R., Mott, I., Pesquita, C., Couto, F. M., Cruz, I. F. (2017). Results of AML in OAEI 2017. 12th International Workshop on Ontology Matching co-located with the 16th International Semantic Web Conference. Status = PUBLISHED; Acknowledgment of Federal Support = Yes
Students

Vivek Shivaprabhu
Zhu (Ellen) Wang
Jenny Vuong (undergrad)

Plans for the following year

Our work will continue to focus on the three main goals of this project. For example, we would like to advance the characterization of ontology matching tasks, the comparison of visualizations, and analytics that operate directly on the visualization.

We plan on releasing our software on GitHub.

We will apply for an REU to fund one or two undergraduate students.

We will try to participate again in “Girls Who Code,” especially if it is offered at UIC. This program aims to introduce K-12 girls and young women to Computer Science.

*Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Topic revision: r7 - 2018-07-29 - 23:32:41 - Main.ifcruz
 