NSF Award: III: Small: Enhancing Ontology Matching with Visual Analytics

PI: Maria Isabel Cruz

Award Number: 1618126

Abstract

An ontology is a representation of a domain, be it biomedical, business, environmental, or others. In a world where data are predominantly heterogeneous, ontology matching establishes correspondences between the concepts of two ontologies, thus effectively bridging two distinct domains or two different representations of the same domain. Ontology matching is therefore a fundamental tool for data integration, that is, for the creation of a homogeneous gateway to disparate data. Ontology matching systems are made up of several matching algorithms, called matchers. Different matching tasks require different matchers; thus there are various configuration and parameter choices to be made, resulting in a multi-dimensional problem whose tuning requires considerable effort and expertise. However, most ontology matching systems operate as a black box, offering no insight into how the output (a set of mappings among ontology concepts, called an alignment) is generated. These systems do not usually offer domain experts the opportunity to validate automatically generated mappings so as to gain control over the matching process. In this project, we use visual analytics, a combination of visualization and analytics, to facilitate ontology matching. Users interact with a visual representation of the matching process and validate mappings that are ranked by underlying analytical methods. This award investigates visual analytics methods and studies their potential benefits. Our collaboration with partners in the biological domain will ensure the practical relevance of our research. From an educational viewpoint, the PI is spearheading a new Data Science curriculum in Computer Science, which can incorporate the main aspects of the proposed research, and the project will train a graduate student and a postdoc in this multidisciplinary field.

Driven by data integration needs in a wide range of domains, the field of ontology matching has been prospering. However, the use of visual analytics in ontology matching remains largely unexplored. The proposed research will combine the power of visual analytics with ontology matching to: (1) open up the ontology matching process so as to facilitate its configuration by domain experts; (2) reduce the number of mappings to be validated by the experts so as to achieve high quality results with minimum effort; and (3) investigate a methodology to evaluate the benefits of combining ontology matching with visual analytics. For the visualization design, a principled approach will be followed that provides prescriptive guidance for determining appropriate evaluation approaches, while for the manipulation of visualized data a taxonomy of interactive operations will be used. Further, the design and analysis of the workflow that describes the interactive nature of the overall process will facilitate the study of the complex interdependencies between the data manipulation and the visualization components. The website of this project is available at https://www.cs.uic.edu/Cruz/OntologyMatchingVisualAnalytics.

First Year Report (2016-2017)
Major goals

The proposed research will combine the power of visual analytics with ontology matching to: (1) open up the ontology matching process so as to facilitate its configuration by domain experts; (2) reduce the number of mappings to be validated by the experts so as to achieve high quality results with minimum effort; and (3) investigate a methodology to evaluate the benefits of combining ontology matching with visual analytics.

Accomplishments

Ontology matching often uses external knowledge bases to establish connections between concepts in different ontologies. We have investigated a distantly supervised approach to derive spatio-temporal relationships from text documents in order to enrich knowledge bases with dynamic facts. Specifically, we have analyzed corpora of noun phrases, appositions, and adjectives to build templates that characterize geospatial and temporal data. We have successfully evaluated the effectiveness of our approach on the YAGO knowledge base using both automated and manual methods.

We have extended the ontology matching system AgreementMakerLight (in collaboration with researchers at the U. of Lisbon and IGC) to incorporate more scalable matching methods. The quality of our matching results has been recognized by our top placement among the more than 20 systems that competed in the 2016 Ontology Alignment Evaluation Initiative (OAEI), and by the Pistoia Alliance first prize for the Disease and Phenotype track of the OAEI. We have disseminated our work by contributing open source code to GitHub. AgreementMakerLight has been used by experts to match ontologies describing biological information networks. These experts have been using a visual interface for AgreementMakerLight to explore a subset of the mappings in detail and determine their correctness. The experts have identified some of the visual operations they perform and how these operations correspond to their verification methods.

We have used association rule mining and ontologies to extract patterns from repositories of crime data. Our technique has been incorporated into a system that displays query results and supports their analysis via a geospatial user interface. To analyze a Chicago crime dataset, we have built a crime ontology by matching the crime classification schemes of the FBI and the Chicago Police Department. Our experiments show that we can significantly reduce the number of rules without sacrificing query precision.
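
As a concrete illustration of association rule mining over ontology-annotated data, the sketch below computes the support and confidence of single-antecedent rules over a handful of toy crime records, once on the raw offense types and once after rolling each offense up to its parent class in a small crime ontology. The records, the `CRIME_ONTOLOGY` parent map, and the thresholds are all hypothetical, and rolling up before mining is only one plausible way to bring an ontology into the process; it is not necessarily the technique used in this project.

```python
from collections import Counter
from itertools import permutations

# Hypothetical crime ontology: specific offense type -> parent class.
CRIME_ONTOLOGY = {
    "burglary": "property_crime",
    "theft": "property_crime",
    "motor_vehicle_theft": "property_crime",
    "assault": "violent_crime",
    "robbery": "violent_crime",
}

# Toy records (hypothetical): each is a set of items, an offense type plus context.
RECORDS = [
    {"burglary", "night", "residence"},
    {"theft", "night", "street"},
    {"motor_vehicle_theft", "night", "street"},
    {"assault", "day", "street"},
    {"robbery", "night", "street"},
    {"theft", "day", "residence"},
]


def generalize(record, ontology):
    """Replace each specific offense with its parent class; other items pass through."""
    return {ontology.get(item, item) for item in record}


def mine_rules(records, min_support, min_confidence):
    """Return rules (A, B, support, confidence) for A -> B with single items A and B."""
    n = len(records)
    item_counts = Counter(item for record in records for item in record)
    pair_counts = Counter()
    for record in records:
        # Every ordered pair of distinct items in a record is a candidate rule.
        for a, b in permutations(sorted(record), 2):
            pair_counts[(a, b)] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # fraction of records containing A and B
        confidence = count / item_counts[a]  # estimate of P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


raw_rules = mine_rules(RECORDS, min_support=0.3, min_confidence=0.6)
general_rules = mine_rules(
    [generalize(record, CRIME_ONTOLOGY) for record in RECORDS],
    min_support=0.3,
    min_confidence=0.6,
)

for a, b, support, confidence in general_rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, rolling offenses up surfaces rules such as `violent_crime -> street`, which no individual offense type is frequent enough to support on its own.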

We have investigated two formal frameworks to describe the requirements of a system that performs ontology matching extended with visual interaction, namely the Nested Model and the NOVIS model.

Products
  • Balasubramani, Booma Sowkarthiga ; Shivaprabhu, Vivek R. ; Krishnamurthy, Smitha ; Cruz, Isabel F. ; Malik, Tanu (2016). Ontology-based urban data exploration. 4th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/3007540.3007550

  • Cheatham, Michelle ; Cruz, I. F. ; Euzenat, Jerome ; Pesquita, Catia (2017). Special issue on ontology and linked data matching. Semantic Web. 8 (2). Status = PUBLISHED; Acknowledgment of Federal Support = No ; DOI: 10.3233/SW-160251

  • Mirrezaei, Seyed Iman ; Martins, Bruno ; Cruz, Isabel F. (2016). A distantly supervised method for extracting spatio-temporal information from text. 4th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/2996913.2996967

  • Faria, Daniel ; Pesquita, Catia ; Balasubramani, Booma S. ; Martins, Catarina ; Cardoso, Joao ; Curado, Hugo ; Couto, Francisco M. ; Cruz, Isabel F. (2016). Results of AML in OAEI 2016. 11th International Workshop on Ontology Matching co-located with the 15th International Semantic Web Conference. Status = PUBLISHED; Acknowledgment of Federal Support = Yes

Students

Vivek Shivaprabhu

Plans for the following year

We plan to continue identifying matching and visual operations using the Nested Model.

We also plan to continue extending the AgreementMakerLight system to incorporate our research findings.

Regarding broader impacts, we plan to give assignments in an introductory Data Science course that address data integration and visual methods for data analysis.

Second Year Report (2017-2018)
Major activities

(1) Ontology Matching Process:

Develop efficient and scalable algorithms for ontology and instance matching.

(2) Mapping Validation:

Investigate applications that are semi-automatic, that is, that require the intervention of domain experts, focusing on mechanisms that reduce the number of mappings to be presented to the users for validation.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

Create a framework that allows for the visual exploration and comparison of the outcomes of different matchers or combinations of matchers.

Specific objectives

In the three major research directions, we aimed to:

(1) Ontology Matching Process:

Successfully test our matching algorithms, as developed in AgreementMakerLight (AML), against other state-of-the-art approaches in the Ontology Alignment Evaluation Initiative (OAEI). Enter AML in the OAEI Instance Matching track for the first time.

(2) Mapping Validation:

Develop algorithms that rank the mappings so as to facilitate the users' choice of the mappings to validate. The idea is that users will validate the highest-ranked mappings first, thus reducing in practice the number of mappings to validate.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

(a) Characterize ontology matching tasks, from atomic to composite, since the evaluation must be based on these tasks.

(b) Compare different visualizations in terms of the number of steps required to perform those tasks.

(c) Develop analytics for reorganizing the visualizations to facilitate the interpretation of the images.

Significant results

(1) Ontology Matching Process: At the OAEI, AML was the only system (of 21) to participate in all the tracks, and it was the best system overall, obtaining first place in the Anatomy, Conference, Multifarm, Interactive Matching, and Large Biomedical Ontologies tracks, and in the three Instance Matching tracks.

(2) Mapping Validation: We developed and evaluated a mechanism for matching an unnamed property with a ranked list of knowledge base predicates by looking at a set of homogeneous property values. Our ranking algorithm effectively quantifies semantic similarity.
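
The ranking of knowledge base predicates can be pictured with a deliberately naive stand-in: score each candidate predicate by the Jaccard similarity between the unnamed property's values and the values that predicate takes in the knowledge base, then sort. The knowledge base contents and predicate names below are hypothetical, and this overlap score is not the ranking model of the published work, which this report does not describe.

```python
# Toy knowledge base (hypothetical): predicate -> set of object values it takes.
KB = {
    "birthPlace": {"Rome", "Paris", "Lisbon", "Chicago"},
    "headquarter": {"Rome", "Paris"},
    "genre": {"Jazz", "Rock", "Fado"},
}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_predicates(values, kb):
    """Rank predicates by overlap with the unnamed property's values, best first."""
    target = set(values)
    scores = {predicate: jaccard(target, objects) for predicate, objects in kb.items()}
    # Sort by descending score, breaking ties alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Values observed for a facet whose property name is unknown.
ranking = rank_predicates(["Rome", "Paris", "Lisbon"], KB)
for predicate, score in ranking:
    print(f"{predicate}: {score:.2f}")
```

A purely set-based score like this ignores how specific or how frequent each value is, which is exactly the kind of semantic signal a real ranking model would need to weigh.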

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

We expect to have made good progress on the three components of this topic by the end of the summer.

Key outcomes or other achievements

(1) Ontology Matching Process: AgreementMakerLight won the 2017 IBM Research prize for Instance Matching. In a paper published in the J. of Biomedical Semantics, we dissect the strategies that matching systems employ to tackle the most difficult challenges, such as those posed by biomedical ontologies, using AgreementMakerLight as the platform for this study.

(2) Mapping Validation: The paper on ranking knowledge base predicates that match an unnamed property was published at the 2018 WWW conference, which had an acceptance rate of less than 15%.

Products
  • Faria, D., Pesquita, C., Mott, I., Martins, C., Couto, F.M. and Cruz, I.F. (2018). Tackling the challenges of matching biomedical ontologies. J. Biomedical Semantics. 9 (1), 4:1. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1186/s13326-017-0170-9

  • Porrini, Riccardo ; Palmonari, Matteo ; Cruz, Isabel F. (2018). Facet Annotation Using Reference Knowledge Bases. 2018 World Wide Web Conference. 1215-1224. Status = PUBLISHED; Deposited in NSF-PAR; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; DOI: 10.1145/3178876.3186020

  • Faria, D., Balasubramani, B. S., Shivaprabhu, V. R., Mott, I., Pesquita, C., Couto, F. M., Cruz, I. F. (2017). Results of AML in OAEI 2017. 12th International Workshop on Ontology Matching co-located with the 16th International Semantic Web Conference. Status = PUBLISHED; Acknowledgment of Federal Support = Yes

Students

Vivek Shivaprabhu
Zhu (Ellen) Wang
Jenny Vuong (undergrad)

Plans for the following year

Our work will continue to focus on the three main goals of this project. For example, we would like to make progress on the characterization of ontology matching tasks, the comparison of visualizations, and analytics that operate directly on the visualizations.

We plan to release our software on GitHub.

We will apply for an REU supplement to fund one or two undergraduate students.

We will try to participate again in Girls Who Code, especially if it is offered at UIC. This program aims to introduce K-12 girls and young women to Computer Science.

Third Year Report (2018-2019)
Major activities

(1) Ontology Matching Process:

Develop efficient and scalable algorithms for ontology and instance matching.

(2) Mapping Validation:

Investigate applications that are semi-automatic, that is, that require the intervention of domain experts, focusing on mechanisms that reduce the number of mappings to be presented to the users for validation.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

Create a framework that allows for the visual exploration and comparison of the outcomes of different matchers or combinations of matchers.

Specific objectives

In the three major research directions, we aimed to:

(1) Ontology Matching Process:

Successfully test our matching algorithms, as developed in AgreementMakerLight (AML), against other state-of-the-art approaches in the Ontology Alignment Evaluation Initiative (OAEI). Enter AML in the OAEI Instance Matching track for the first time.

(2) Mapping Validation:

Develop algorithms that rank the mappings so as to facilitate the users' choice of the mappings to validate. The idea is that users will validate the highest-ranked mappings first, thus reducing in practice the number of mappings to validate.

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

(a) Characterize ontology matching tasks, from atomic to composite, since the evaluation must be based on these tasks.

(b) Compare different visualizations in terms of the number of steps required to perform those tasks.

(c) Develop analytics for reorganizing the visualizations to facilitate the interpretation of the images.

Significant results

We participated in the Ontology Alignment Evaluation Initiative (OAEI) in 2018 (and 2019), where the best ontology matching systems compete. Of the 18 participating systems, AgreementMakerLight was the only one that competed in all the tracks, and the only one that competed in the Complex Matching track. Overall, AgreementMakerLight was the top-performing system, a position it has held for the last several years, even as new, more complex tracks have been introduced, and even though all of our code is open source.

Hence, we can rely on our own system when performing analytics and visual analytics instead of working with (and extending) a system authored by others, whose code may or may not be available. The OAEI Interactive Matching track is particularly important, as it simulates interaction with users in a semi-automatic system. The extensive evaluation performed by the track organizers shows that AgreementMakerLight is not only the best performer but also the most robust to simulated user inputs with error rates of 10%, 20%, and 30%.
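
A simulated user of the kind used in the Interactive Matching track can be sketched as an oracle that checks each mapping against the reference alignment and answers wrongly with a fixed probability. The sketch below assumes mappings are (source, target) pairs and that the oracle remembers its answers so that a repeated question gets the same reply; the function name, the toy data, and the caching choice are illustrative assumptions, not the OAEI's actual implementation.

```python
import random


def make_oracle(reference, error_rate, seed=0):
    """Build a simulated user that judges mappings against a reference alignment.

    Each new question is answered incorrectly with probability `error_rate`;
    answers are cached so that asking about the same mapping twice is consistent.
    """
    rng = random.Random(seed)
    answers = {}

    def is_correct(mapping):
        if mapping not in answers:
            truth = mapping in reference
            answers[mapping] = (not truth) if rng.random() < error_rate else truth
        return answers[mapping]

    return is_correct


# Hypothetical reference alignment and candidate mappings (half correct, half wrong).
REFERENCE = {(f"src{i}", f"tgt{i}") for i in range(5000)}
CANDIDATES = (
    [(f"src{i}", f"tgt{i}") for i in range(5000)]
    + [(f"src{i}", f"tgt{i + 1}") for i in range(5000)]
)

oracle = make_oracle(REFERENCE, error_rate=0.2, seed=42)
wrong = sum(oracle(m) != (m in REFERENCE) for m in CANDIDATES)
observed_error_rate = wrong / len(CANDIDATES)
print(f"observed error rate: {observed_error_rate:.3f}")
```

With `error_rate=0.2`, roughly one answer in five is flipped, which is the kind of noise against which the track measures how well a system's final alignment holds up.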

Likewise, we can rely on AgreementMaker for medium-sized to large ontologies. AgreementMaker has a comprehensive visual user interface that drives the whole matching and evaluation process.

Considering AgreementMaker(Light) over the years, a significant cumulative result is that AgreementMaker(Light) is the only system reported in the authoritative 2011 survey by P. Bernstein, J. Madhavan, and E. Rahm, "Generic Schema Matching, Ten Years Later," PVLDB, that is still competing in the OAEI.

Key outcomes or other achievements

Considering AgreementMaker(Light) over the years, a significant cumulative outcome is that AgreementMaker(Light) is the only system reported in the authoritative 2011 survey by P. Bernstein, J. Madhavan, and E. Rahm, "Generic Schema Matching, Ten Years Later," PVLDB, that is still competing in the OAEI. We believe this is a significant achievement, especially considering the top results we have been attaining in the OAEI.

Products
  • Balasubramani, Booma Sowkarthiga ; Cruz, Isabel F. (2019). Spatial Data Integration. In: Sakr, Sherif ; Zomaya, Albert (eds.), Encyclopedia of Big Data Technologies. Springer, Cham.

  • Faria, Daniel ; Pesquita, Catia ; Balasubramani, Booma ; Tervo, Teemu ; Carrico, David ; Garrilha, Rodrigo ; Couto, Francisco ; Cruz, Isabel F. (2018). Results of AML participation in OAEI 2018. Proceedings of the 13th International Workshop on Ontology Matching co-located with the 17th International Semantic Web Conference. CEUR-WS Vol. 2288, 125. Status = PUBLISHED; Acknowledgment of Federal Support = Yes ; Peer Reviewed = Yes ; OTHER: http://ceur-ws.org/Vol-2288/oaei18_paper2.pdf

  • https://www.cs.uic.edu/Cruz/OntologyMatchingVisualAnalytics

    This web site contains a short description of the grant, publications (including DOI for easy access), and a summary of the annual reports.


Students

Booma Balasubramani
Zhu (Ellen) Wang
Noemi Andras
Mohit Aggarwal


Plans for the following year

We will continue the work that we have been developing this past year, as follows:

(1) Ontology Matching Process:

Continue the development of AgreementMakerLight so as to compete again in the Ontology Alignment Evaluation Initiative (OAEI), which continues to expand in terms of its tracks and their complexity.

(2) Mapping Validation:

It is fundamental to develop methods that rank the mappings in terms of their quality so as to facilitate the users' choice of the mappings to validate. The idea is that users will validate the lower-quality mappings first, thus reducing in practice the number of mappings to validate while improving success metrics (e.g., F-measure) more quickly. A central aspect of this work is the continued development of analytical methods that assign a measure of intrinsic quality to the mappings in an alignment.
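
The payoff of validating lower-quality mappings first can be checked on a toy alignment. The sketch below computes F-measure (the harmonic mean of precision and recall against a reference alignment) and simulates a perfectly reliable user who validates mappings in order of a quality score, removing those that are wrong. The mappings, scores, and reference are hypothetical, and the score merely stands in for the intrinsic quality measures the project is developing.

```python
# Hypothetical alignment: (mapping, quality score) pairs, plus a reference alignment.
SCORED = [
    (("a1", "b1"), 0.95),
    (("a2", "b2"), 0.90),
    (("a4", "b4"), 0.50),
    (("a3", "b9"), 0.40),
    (("a5", "b7"), 0.30),
]
REFERENCE = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a6", "b6")}


def f_measure(alignment, reference):
    """Harmonic mean of precision and recall of `alignment` against `reference`."""
    true_positives = len(alignment & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(alignment)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def validate_in_order(scored, reference, budget, lowest_first=True):
    """Simulate a perfectly reliable user who validates up to `budget` mappings.

    Mappings are visited in order of their quality score (lowest first by default);
    rejected mappings are removed. Returns the F-measure before and after each step.
    """
    alignment = {mapping for mapping, _ in scored}
    history = [f_measure(alignment, reference)]
    order = sorted(scored, key=lambda pair: pair[1], reverse=not lowest_first)
    for mapping, _ in order[:budget]:
        if mapping not in reference:
            alignment.discard(mapping)
        history.append(f_measure(alignment, reference))
    return history


print(validate_in_order(SCORED, REFERENCE, budget=2))
print(validate_in_order(SCORED, REFERENCE, budget=2, lowest_first=False))
```

On this data, two validations in lowest-quality-first order raise the F-measure from about 0.667 to about 0.857, whereas two validations in highest-first order leave it unchanged, because the top-scored mappings were already correct.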

(3) Evaluation of Combining Ontology Matching with Visual Analytics:

We would like to continue our work on the characterization of ontology matching tasks, on the comparison of the effectiveness of the visualizations in performing those tasks, and on analytical methods that operate directly on the visualizations. Further, as Elixir (a European Union project) matures and its ontology alignments are released, we plan to use them in our work and to deliver our implementation to the domain experts.

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Topic revision: r8 - 2020-01-20 - 05:36:00 - Main.ifcruz
 
Copyright 2016 The Board of Trustees of the University of Illinois. webmaster@cs.uic.edu