
NSF Award: EAGER: Visual Analytics for Ontology Matching

PI: Maria Isabel Cruz

Award Number: 1143926

Abstract

Ontologies are developed to provide semantics for a particular domain and to support information retrieval, reasoning and knowledge discovery. However, as separate groups develop ontologies, there is a need to combine or match ontologies to support connecting information across heterogeneous sources. '''Ontology matching''' is a complicated process because it involves several types of matching algorithms that take into account syntactic, lexical, structural, instance, and logic features of the ontologies. The current support provided to users to understand and evaluate the results provided by ontology matching systems is very limited; therefore ontology matching is an arduous and time-consuming task. This exploratory project focuses on the development of a novel approach to ontology matching that employs visual analytics to guide users in the process. It is expected to result in increased quality of resulting ontologies while also reducing the time and effort of experts involved in ontology matching.

Visual analytics is at the confluence of information visualization, data analytics, and data transformation. This project explores the potential of visual analytics to effectively assist real-time decisions by domain experts and ontology researchers alike during the ontology matching process. The project is organized around three key research challenges:

(1) Visualization: Data and analytically extracted features need to be encoded into rich visualizations that can be effectively manipulated. In particular, visualizations should lend themselves well to complex transformations that facilitate the discernment of trends or patterns.

(2) Architecture: The interaction between the automatic matching and the visual analytics modules is central to the proposed approach. The envisioned architecture will support a quality-controlled feedback loop in which users will intervene to change the analytic and visual parameters of the system.

(3) Performance evaluation: Performance measures will be developed to objectively identify the obtained gains in terms of the effort saved by users and of the quality of the matching results as enabled by the proposed visual analytics approach to ontology matching.

If successful, this proof-of-concept project is expected to make a significant contribution in effective ontology matching that in turn will enable semantically enriched access to complex, heterogeneous, and distributed data to an increasing number of users in a variety of domains. Research results, including developed software, that were previously at this phased out web site:

http://agreementmaker.org/wiki/index.php/Visual_Analytics

are now available at:

http://www.cs.uic.edu/Cruz/VisualAnalytics

The project provides research experience to students and results from this research will be included in the computer science curriculum.

NSF Award Web Page

http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1143926

Publications (Year 1)

[1] I. F. Cruz, A. Fabiani, F. Caimi, C. Stroe, and M. Palmonari. Automatic Configuration Selection Using Ontology Matching Task Profiling. In The Semantic Web: Research and Applications, Extended Semantic Web Conference (ESWC), volume 7295 of Lecture Notes in Computer Science, pages 179-194, Springer, 2012. http://www.springerlink.com/content/m11307248823h737/?MUD=MP

[2] I. F. Cruz, M. Palmonari, F. Caimi, and C. Stroe. Building Linked Ontologies with High Precision Using Subclass Mapping Discovery. Artificial Intelligence Review, 2012 (to appear, see publications of Year 2).

[3] I. F. Cruz, C. Stroe, F. Caimi, A. Fabiani, C. Pesquita, F. M. Couto, and M. Palmonari. Using AgreementMaker to Align Ontologies for OAEI 2011. In ISWC International Workshop on Ontology Matching (OM), volume 814 of CEUR Workshop Proceedings, pages 114-121, 2011. http://ceur-ws.org/Vol-814/oaei11_paper1.pdf

[4] I. F. Cruz, C. Stroe, and M. Palmonari. Interactive User Feedback in Ontology Matching Using Signature Vectors. In IEEE International Conference on Data Engineering (ICDE), pages 1321-1324. IEEE, 2012. http://www.computer.org/portal/web/csdl/doi/10.1109/ICDE.2012.1

Publications (Year 2)

[5] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto. Testing the AgreementMaker System in the Anatomy Task of OAEI 2012. CoRR abs/1212.1625 (2012). http://arxiv.org/abs/1212.1625

[6] D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, F. M. Couto. The AgreementMakerLight Ontology Matching System. In: ODBASE 2013 (to appear, see publications of Year 3).

[7] I. F. Cruz, M. Palmonari, F. Caimi, and C. Stroe. Building Linked Ontologies with High Precision Using Subclass Mapping Discovery. Artificial Intelligence Review, 40(2): 127-145 (2013). http://link.springer.com/article/10.1007%2Fs10462-012-9363-x

[8] C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, and F. M. Couto: What's in a `nym' ? Synonyms in Biomedical Ontology Matching. ISWC 2013 (to appear, see publications of Year 3).

Publications (Year 3)

[9] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. Couto, Automatic Background Knowledge Selection for Matching Biomedical Ontologies, PLOS ONE (to appear, see [22] in Year 4).

[10] C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, F. M. Couto, What's in a `nym' ? Synonyms in Biomedical Ontology Matching, International Semantic Web Conference (ISWC), pp. 526-541, 2013. http://link.springer.com/chapter/10.1007%2F978-3-642-41335-3_33

[11] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, AgreementMakerLight results for OAEI 2013, ISWC Ontology Matching Workshop (OM), CEUR-WS, vol. 1111, pp. 101-108, 2013. http://ceur-ws.org/Vol-1111/oaei13_paper1.pdf

[12] D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, F. M. Couto, The AgreementMakerLight Ontology Matching System, International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), pp. 527-541, 2013. http://link.springer.com/chapter/10.1007%2F978-3-642-41030-7_38

[13] I. F. Cruz, V. R. Ganesh, C. Caletti, P. Reddy, GIVA: A Semantic Framework for Geospatial and Temporal Data Integration, Visualization, and Analytics, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS), pp. 534-537, 2013. (Best Demo Paper Award). http://dl.acm.org/citation.cfm?id=2525314.2525324

[14] I. F. Cruz, V. R. Ganesh, S. I. Mirrezaei, Semantic Extraction of Geographic Data from Web Tables for Big Data Integration, ACM SIGSPATIAL Workshop on Geographic Information Retrieval (GIR 13), pp. 19-26, 2013. http://dl.acm.org/citation.cfm?id=2533939

[15] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, AgreementMakerLight: A Scalable Automated Ontology Matching System, International Conference on Data Integration in the Life Sciences (Demonstration). http://dmir.inesc-id.pt/dils2014/wp-content/uploads/2014/06/paper_34.pdf

[16] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, AgreementMakerLight 2.0: Towards Efficient Large-Scale Ontology Matching, International Semantic Web Conference (Posters & Demos) (to appear, see [24] in Year 4).

[17] I. F. Cruz, F. Loprete, M. Palmonari, C. Stroe, A. Taheri, Pay-As-You-Go Multi-User Feedback Model for Ontology Matching (July 2014, under review, see [21] in Year 4).

Publications (Year 4)

[18] Y. Li, C. Stroe, and I. F. Cruz, Interactive Visualization of Large Ontology Matching Results, International Workshop on Visualizations and User Interfaces for Ontologies and Linked Data (VOILA) in association with the 14th International Semantic Web Conference (ISWC), 2015 (to appear, see publications in Year 5).

[19] J. Aurisano, A. Nanavaty, and I. F. Cruz, AlignmentVis: Visual Analytics for Ontology Matching, IEEE InfoVis Poster Program, 2015 (to appear, see publications in Year 5).

[20] J. Aurisano, A. Nanavaty, and I. F. Cruz, Visual Analytics for Ontology Matching Using Multi-Linked Views, International Workshop on Visualizations and User Interfaces for Ontologies and Linked Data (VOILA) in association with the 14th International Semantic Web Conference (ISWC), 2015 (to appear, see publications in Year 5).

[21] I. F. Cruz, F. Loprete, M. Palmonari, C. Stroe, and A. Taheri, Pay-As-You-Go Multi-user Feedback Model for Ontology Matching, 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW), pp. 80-96, 2014. http://link.springer.com/chapter/10.1007%2F978-3-319-13704-9_7

[22] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, and F. Couto, Automatic Background Knowledge Selection for Matching Biomedical Ontologies, PLoS ONE 9(11): e111226. doi:10.1371/journal.pone.0111226, November 2014. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111226

[23] D. Faria, C. Martins, A. Nanavaty, A. Taheri, C. Pesquita, E. Santos, I. F. Cruz, and F. M. Couto, AgreementMakerLight Results for OAEI 2014, 9th International Workshop on Ontology Matching collocated with the 13th International Semantic Web Conference (OM), pp. 105-112, 2014. http://disi.unitn.it/~p2p/OM-2014/oaei14_paper1.pdf

[24] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, AgreementMakerLight 2.0: Towards Efficient Large-Scale Ontology Matching, International Semantic Web Conference (Posters & Demos). http://ceur-ws.org/Vol-1272/paper_151.pdf

[25] I. F. Cruz, M. Palmonari, F. Loprete, C. Stroe, and A. Taheri, Quality-Based Model for Effective and Robust Multi-User Pay-As-You-Go Ontology Matching, Semantic Web Journal (to appear, see publications in Year 5).

Publications (Year 5)

[26] Y. Li, C. Stroe, and I. F. Cruz, Interactive Visualization of Large Ontology Matching Results, International Workshop on Visualizations and User Interfaces for Ontologies and Linked Data (VOILA) in association with the 14th International Semantic Web Conference (ISWC), 2015. http://ceur-ws.org/Vol-1456/paper4.pdf

[27] J. Aurisano, A. Nanavaty, and I. F. Cruz, AlignmentVis: Visual Analytics for Ontology Matching, IEEE InfoVis Poster Program, 2015. Vis preview video.

[28] J. Aurisano, A. Nanavaty, and I. F. Cruz, Visual Analytics for Ontology Matching Using Multi-Linked Views, International Workshop on Visualizations and User Interfaces for Ontologies and Linked Data (VOILA) in association with the 14th International Semantic Web Conference (ISWC), 2015. http://ceur-ws.org/Vol-1456/paper3.pdf

[29] D. Faria, C. Martins, A. Nanavaty, D. Oliveira, B. S. Balasubramani, A. Taheri, C. Pesquita, F. M. Couto, and I. F. Cruz, AgreementMakerLight Results for OAEI 2015, 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (OM), pp. 116-123, 2015. http://ceur-ws.org/Vol-1545/oaei15_paper1.pdf

[30] I. F. Cruz, M. Palmonari, F. Loprete, C. Stroe, and A. Taheri, Quality-Based Model for Effective and Robust Multi-User Pay-As-You-Go Ontology Matching, Semantic Web Journal 7(4), pp. 463-479, 2016. http://www.semantic-web-journal.net/system/files/swj893.pdf

[31] B. S. Balasubramani, A. Taheri, and I. F. Cruz (2015). User involvement in ontology matching using an online active learning approach. 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, vol. 1545, pp. 45-49. http://ceur-ws.org/Vol-1545/om2015_TSpaper3.pdf

[32] S. I. Mirrezaei, B. Martins, and I. F. Cruz (2015). The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives. The Semantic Web: ESWC 2015 Satellite Events, Revised Selected Papers, LNCS vol. 9341, F. Gandon, C. Gueret, S. Villata, J. G. Breslin, C. Faron-Zucker, and A. Zimmermann, pp. 230-243, Springer, 2015. http://dx.doi.org/10.1007/978-3-319-25639-9_39

Summary of Results (Year 1)

Our collaboration with domain experts in the geospatial domain has revealed that they highly value automatic matching methods, especially for ontologies with thousands of concepts. However, they want to be able to follow the matching process closely, which requires them to be directly involved in the loop and to evaluate its efficacy. To achieve this kind of interaction, semi-automatic methods, which make use of the most sophisticated automatic methods, are needed.

For this grant, we have been especially looking at semi-automatic methods, while also pursuing research to improve the state of the art of automatic methods. Our work has been performed within our award-winning AgreementMaker ontology matching system, which has dominated the OAEI Anatomy Track again in 2011 [3] and which has been extended successfully to Linking Open Data [2]. To further improve our automatic methods, we investigated intelligent combination of multiple matching results, using machine learning techniques [1].

During the first year of the grant, we made progress along the following three key research challenges [4]:

(1) Visualization: The visual analytics panel is integrated in the user interface of AgreementMaker. Its visualization and control functionality assists users at each iteration of the feedback loop. A plot that represents the similarity matrix for each matching algorithm is presented, so as to give an overview of the distribution of the similarity values in the space of possible mappings. The reference alignment or gold standard that is being progressively built is overlaid on each matcher's matrix plot for comparison; when users select a particular mapping, it is emphasized in every matrix plot. We created the concept of '''signature vector''', which contains the results obtained by several algorithms when mapping two concepts (the dimension of the vector being the number of such algorithms). In this way, a comparative analysis of the corresponding signature vector is possible. Upon selection of a mapping, its cluster can also be visualized in each matrix plot (see Figure 1). The visual analytics panel brings a whole new light to the matching process allowing users to discover matching patterns that were previously hidden in the complexity of the process. In particular, this panel greatly helped us in pinpointing correct and missed mappings in the course of our own experiments.

(2) Architecture: The main focus of this grant is a new approach to incorporate user feedback in the ontology mapping process [4]. Our approach clusters mappings to identify where user feedback will be most beneficial in reducing both the number of user interactions and of system iterations. This feedback process has been implemented in the AgreementMaker system and is supported by visual analytic techniques that help users to better understand the matching process.

We address the following questions:

Q1: Which specific candidate mappings should be presented to the user for validation?

Q2: How can the feedback provided by users be exploited to improve the existing alignment?

To answer the first question, we developed the '''candidate selection component''' that is responsible for the selection of the mappings that are presented to the user. The mappings that a user has validated in previous iterations are filtered out from the candidate mapping selection, so that the same mapping will not be validated twice. To answer the second question, we developed the '''signature-based mapping clustering component''', for which the following is needed: a user-validated mapping and its corresponding signature vector. Then our approach identifies similar mappings by adopting a double threshold to create a cluster associated with that vector. Each cluster also has a similarity threshold that defines which mappings belong to the cluster.
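
The two components described above can be illustrated with a minimal sketch. It is not the AgreementMaker implementation: the concept names, the similarity values, the Euclidean distance, and the single distance threshold (standing in for the double threshold mentioned above) are all assumptions made for illustration.

```python
from math import dist

# Hypothetical data: each candidate mapping (source concept, target concept)
# has a signature vector with one similarity value per matching algorithm.
signatures = {
    ("Heart", "Cardiac_Organ"): [0.90, 0.40, 0.75],
    ("Heart", "Heart_Muscle"): [0.88, 0.42, 0.70],
    ("Lung", "Pulmonary_Organ"): [0.20, 0.90, 0.30],
    ("Liver", "Hepatic_Organ"): [0.35, 0.30, 0.20],
}


def select_candidates(signatures, validated, k):
    """Return up to k mappings that the user has not validated yet.

    Mappings validated in earlier iterations are filtered out, so the same
    mapping is never presented twice. Input order is kept as is.
    """
    pending = [m for m in signatures if m not in validated]
    return pending[:k]


def cluster_for(validated_mapping, signatures, threshold):
    """Mappings whose signature vector lies within `threshold` of the validated one.

    Euclidean distance and a single threshold are simplifying assumptions.
    """
    center = signatures[validated_mapping]
    return [
        m for m, vec in signatures.items()
        if m != validated_mapping and dist(center, vec) <= threshold
    ]


# The user confirms one mapping; similar mappings become its cluster.
validated = {("Heart", "Cardiac_Organ"): True}
cluster = cluster_for(("Heart", "Cardiac_Organ"), signatures, threshold=0.1)
next_candidates = select_candidates(signatures, validated, k=2)
```

In this toy example only ("Heart", "Heart_Muscle") falls inside the cluster, so it is the only mapping to which the user's feedback would be propagated.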

(3) Performance evaluation: We obtained an average F-Measure gain of 7.2% as a result of the feedback propagation method. This is a sizable gain considering that we started from an already high average F-Measure of 80.6%, which was obtained using our high-quality automatic matching methods, and that it was realized with 50 iterations, which represent only 1.26% of the mapping search space.
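
For readers unfamiliar with the metric, the following sketch shows how the F-Measure of an alignment is computed against a reference alignment, and how a gain between two alignments can be measured. The mappings are invented and do not reproduce the experiments reported above.

```python
def f_measure(alignment, reference):
    """Return (precision, recall, F-measure) of an alignment against a reference.

    Both arguments are sets of (source concept, target concept) pairs.
    """
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical reference alignment and two alignments: one before and one
# after user feedback has corrected a wrong mapping.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
before = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
after = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

_, _, f_before = f_measure(before, reference)
_, _, f_after = f_measure(after, reference)
gain = f_after - f_before
```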

Summary of Results (Year 2)

Major Activities: The premise of our research work is that visualization can empower users to help an ontology matching system achieve better results (in an iterative fashion). Our work has been performed using AgreementMaker, having in mind that our results will apply to any advanced ontology matching system (namely a system that combines several matching methods).

In Year 2, we looked at three ways of extending our results of Year 1. In the first one we consider visualization to aid user feedback, in the second one scalability of the architecture, and in the third one we consider performance, specifically in the biomedical domain. Biomedical domain considerations can lead to broader impact of our work.

Specific Objectives:

(1) Visualization: We considered how to present visual information to users, so as to better inform user feedback in the validation process.

(2) Scalability of the architecture: We want to validate our previous results for larger ontologies, which pose interesting challenges.

(3) Performance (Biomedical domain): We have investigated how different approaches can impact the performance of ontology matching techniques in the biomedical domain. The biomedical domain offers interesting challenges and possible rewards in terms of broader impact. AgreementMaker and AgreementMakerLight also offer a platform for domain experts to experiment with ontology matching systems and to test our visualization techniques to improve ontology matching.

Significant Results:

(1) Visualization: The rationale for seeking a new kind of visualization stems from our previously developed visual analytics panel where each cell corresponds to a similarity value for a pair of concepts (one in the source and the other one in the target ontology). In a matching system like AgreementMaker, which has numerous matching methods, the value in each cell in the final matrix is the result of the combination of the values for several matching methods. The combination of values depends on several factors, including the configuration of the overall ontology matching system, which is performed automatically using a machine learning method [1], and therefore not necessarily known a priori by the users. By providing a visual representation of the "provenance" of the matrix values, which we visualize as a graph, it is possible to determine how each value is generated. Specifically, given a pair of concepts, a graph shows how the similarity value was obtained starting from the similarity values of the different methods that intervened. This visualization thus provides a "drill down" view, which offers a complementary view to the analytic panel view.
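
The provenance idea can be sketched as follows. The matcher names, the weights, and the weighted-average combination are assumptions chosen for illustration; the actual combination in AgreementMaker depends on the automatically selected configuration.

```python
def provenance(matcher_values, weights):
    """Return the combined similarity and the per-matcher contributions behind it.

    The contributions play the role of the graph edges leading from each
    matcher's similarity value to the final matrix cell. A weighted average
    is assumed here.
    """
    total_weight = sum(weights[name] for name in matcher_values)
    edges = {
        name: weights[name] * value / total_weight
        for name, value in matcher_values.items()
    }
    return sum(edges.values()), edges


# Hypothetical similarity values for one pair of concepts.
matcher_values = {"lexical": 0.9, "structural": 0.5, "instance": 0.7}
weights = {"lexical": 0.5, "structural": 0.3, "instance": 0.2}

combined, edges = provenance(matcher_values, weights)
largest_contributor = max(edges, key=edges.get)
```

Inspecting `edges` for a given cell is the textual analogue of the "drill down" view: it shows which matcher is responsible for most of the final value.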

(2) Scalability of the architecture: Ontology matching methods need to perform within acceptable runtimes, without deterioration of other quality metrics, such as F-measure (scalability of the architecture). A second challenge is that visualizations that are appropriate for a couple of hundred concepts might not be appropriate for thousands of concepts (scalability of the visualization). During the past year, we looked at the first aspect, that of a new architecture that can accept as input very large ontologies efficiently. Because of its efficiency it can deal with "on the go" ontology matching tasks, such as those that stem from applications involving linked open data, for example where parsing of incoming text may require concept matching with background knowledge [7].

To address scalability, we have developed a new core framework, AgreementMakerLight, focused on computational efficiency and designed to handle very large ontologies, while preserving most of the flexibility and extensibility of the original AgreementMaker framework. We evaluated the efficiency of AgreementMakerLight in two OAEI (Ontology Alignment Evaluation Initiative) tracks: Anatomy and Large Biomedical Ontologies, obtaining excellent run time results. In addition, for the Anatomy track, AgreementMakerLight is now the best system as measured in terms of F-measure. Also in terms of F-measure, AgreementMakerLight is competitive with the best OAEI performers in two of the three tasks of the Large Biomedical Ontologies track that match whole ontologies [5,6].

(3) Performance (Biomedical domain): The successful application of ontology matching techniques is strongly tied to an effective exploration of the complex and diverse biomedical terminology contained in biomedical ontologies. We have determined the lexical components of several biomedical ontologies and have investigated how different approaches can impact the performance of ontology matching techniques. We proposed novel approaches to explore the different types of synonyms encoded by the ontologies and also considered internal synonym derivation and external ontologies. We evaluated these approaches using AgreementMaker, which implements several lexical matchers, and applied them to a set of four benchmark biomedical ontology matching tasks. Our results demonstrate the impact that an adequate consideration of ontology synonyms can have on matching performance. We have validated our novel approach for combining internal and external synonym sources as a competitive and in many cases improved solution for biomedical ontology matching [8].

Summary of Results (Year 3)

Major Activities: In the course of this grant we are doing work that can serve as the foundations for a new generation of ontology matching systems where automatic methods and human-based computation (possibly from multiple users) can interact to achieve better results in terms of quality of the obtained alignments and of the performance (measured in numbers of iterative steps and/or computation time). Our work has been performed using AgreementMaker and AgreementMakerLight, but our results will apply to any advanced ontology matching system (namely a system that combines several matching methods). As we progress in our work, we are encountering several major challenges related to the scalability of the visualization and of the matching process, to the architecture of the two steps (visualization and matching), and to determining the benefits of combining automatic ontology matching with human-based computation.

Specific Objectives: In Year 3, we looked at three ways of extending our results of Years 1 and 2. In the first one we considered the visualization of very large ontologies, in the second one we considered the analytics component of the overall architecture, the matching of linked open data and of instances, and the scalability to very large ontologies, and in the third one we considered the performance of the three architectural aspects.

Significant Results:

(1) Visualization: We have investigated techniques for the visualization of large ontologies using interactive pie charts for the traversal of ontologies where the visualization of the subtrees shows their matching, recursively. At each step, users can assign confidence levels to mappings between concepts that are the roots of the subtrees. Such levels are used to modify the automatically determined mapping values. This visualization scales well to very large ontologies. Current limitations include the inability to visualize (due to screen size limitations) the results of several matching algorithms at the same time. Future work includes extending this approach to multiple algorithms and the determination of the benefits brought by the interactive visualization method in terms of precision and recall of the matching task.

(2) Architecture: We have been working on the analytics of a multi-user "pay as you go" ontology matching process, where users may be incorrect when asked to validate mappings found by automatic ontology matching algorithms, yet the feedback provided by all the users will lead to an alignment whose F-measure (when compared with the "gold standard") is higher than the one obtained by the automatic matching algorithms without user validation. There are two main components to this process: (1) the selection of the mappings to be presented to the users and (2) the automatic propagation of the users' feedback to other "similar" mappings. We note that whereas the visual component is not yet part of this analytics study, there are several fundamental issues being studied, including the determination of the actual mappings that will be presented to the users (the space of possible mappings being very large), their propagation, and the difficulties that arise when humans make errors in validating mappings. We have also considered the matching of linked open data and of instance matching. To address scalability, we have developed a new core framework, AgreementMakerLight, focused on computational efficiency and designed to handle very large ontologies, while preserving most of the flexibility and extensibility of the original AgreementMaker framework.

(3) Performance: In the multi-user "pay as you go" analytics framework, our extensive results show how F-measure and robustness vary as a function of the number of user validations. We consider different user error and revalidation rates (the latter measures the number of times that the same mapping is validated).

Our results highlight complex trade-offs and point to the benefits of dynamically adjusting the revalidation rate. Experiments show that when compared with a leading linked open data approach, AgreementMaker achieves considerably higher precision and F-measure, at the cost of a slight decrease in recall. As for the scalability of our approach, we have obtained top results (together with the University of Lisbon) in the OAEI 2013 competition using the AgreementMakerLight system: first place in the Anatomy track (best results ever), second place in the Large BioMed track, second place in the Conference track.

Key Outcomes or Other Achievements:

1. Dissemination: We have published our results in different communities: the Database/Semantics Community (ODBASE), the Semantic Web Community (ISWC, ESWC), the Life Sciences Community (International Conference on Data Integration in the Life Sciences), the AI community (Artificial Intelligence Review), the GIS community (ACM SIGSPATIAL GIS and GIR). Through our collaborators at the University of Lisbon we have received feedback on the AgreementMakerLight Ontology Matching system from biomedical experts.

2. Award: Our GIVA semantic framework for Geospatial and Temporal Data Integration, Visualization, and Analytics won the Best Demo Paper Award at the 2013 ACM SIGSPATIAL GIS Conference [13].

3. Involvement of students: The following students have been involved in this project: Aynaz Taheri, Amruta Nanavaty, Yiting Li, Cosmin Stroe, Venkat Raghavan Ganesh, and Iman Mirrezaei (graduate), Anna Anderson, Tais Bellini, and Devina Dhawan (undergraduate), and Francesco Loprete (visiting graduate student from U. Milano-Bicocca). Six women have been involved in this project: Aynaz Taheri, Amruta Nanavaty, Yiting Li, Anna Anderson, Tais Bellini, and Devina Dhawan.

4. Teaching: Related topics have been included in the teaching of a graduate course on "Data and Web Semantics", which was attended by more than 20 students.

Summary of Results (Year 4)

Major Activities: We carried out significant activities along the three key challenges of our research. In visualization, we continued the design and implementation of the techniques for the visualization of large ontologies that we started in the previous year; we conceived a completely new interface with focus on new types of interaction for tasks that are required by ontology matching experts. In architecture, our activity was centered on the integration of the ontology matching process with the visualization process and on designing key components that optimize the automatic process of ontology matching and the interaction with the users. In performance, we conducted extensive experimentation to demonstrate the effectiveness of our human-supported computation methods and took part in the OAEI competition both with automatic and semi-automatic methods.

Specific Objectives: In Year 4, we looked at three ways of extending our results of the previous years. In the first one, we (1) completed the design and implementation, started in Year 3, of a scalable method to visualize very large ontologies, and (2) designed and implemented an interface whose main objective is to support complex analytic tasks visually. In the second one, we designed several essential components that (1) optimize the interactive cycle (including generating ranked mappings to be presented to the users for validation) and (2) automate background knowledge source selection. In the third one, we concentrated on measuring the robustness and effectiveness associated with the interactive cycle of user validation and on improving the performance of the AgreementMakerLight ontology matching system, extending it to incorporate user feedback, among other novel features.

Significant Results:

(1) Visualization: We have worked along two main directions: In the first one, we explore the use of pie charts to build an interactive interface for the matching process [18]. Pie charts naturally scale to any ontology size without increasing their area. We augment this visualization with several navigation possibilities such as the ability to traverse the ontologies vertically (children of a class) and horizontally (siblings of a class), so that both an overall view (that of the pie chart) and a detailed view can coexist. The interface supports the comparison of the results of different matching algorithms. Along the second direction, we have built an interactive interface where we explore the power of multi-linked views [19]. The multi-linked views were designed so as to support the evaluation, diagnosis, comparison, and exploration tasks performed by expert users. Trends and patterns are highlighted by the links across the various views. For example, the evaluation of the performance of each matcher makes use of exploration and comparison tasks supported by this visual interface. Views of entity details, through meaningfully designed explorative interactions and through comparative views of the results across different matchers, have helped to identify potential sources of error [20].

(2) Architecture: We have considered the overall integration of an interactive interface that supports very large visualizations with the process of user validation and designed the workflow of the interactive process [18], which includes the following main components:

(a) the component that ranks the mappings that are produced automatically in terms of their quality using a model based on multiple quality measures, so that users validate first the mappings with the lowest quality [21], the expectation being that they are the most likely to be incorrect. Our approach is based on active learning, in that we are trying to identify those mappings whose correction will lead to better results faster. Our hypothesis is that those mappings are exactly those with the lowest quality. We first describe (i) two of those measures and then (ii) the overall strategy, including the meta-strategy (the performance evaluation is reported under (3), below):

(i) Among the five measures we consider is the disagreement among different matching algorithms. The higher it is for a mapping, the lower is the mapping's quality. Another quality measure is the feedback stability, which measures the likelihood that a label associated with a mapping changes (e.g., from correct to incorrect) as the result of a validation input by a new user, where decisions are made by a majority vote strategy. In this case, lower quality is associated with higher instability;

(ii) The five measures are used to define two different candidate selection strategies: one that selects from those mappings that have not yet been validated, and one that revalidates, that is, selects from those mappings that have been validated but for which no consensus has been reached yet. A meta-strategy combines the two selection strategies, with the possibility of favoring one over the other.
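
As an illustration only, the two measures in (i) and the combination in (ii) might be sketched as follows. This is not the published model: using the variance of matcher scores as the disagreement measure, the inverse of the majority's lead as the instability proxy, the `min_lead` consensus threshold, the `revalidation_share` parameter, and all function names are assumptions made for this sketch.

```python
from collections import Counter
from statistics import pvariance

def disagreement(scores):
    """Spread of the similarity scores that several matchers gave one mapping.
    Population variance is an assumed stand-in for the disagreement measure."""
    return pvariance(scores)

def vote_lead(votes):
    """Lead of the majority label over the minority label."""
    counts = Counter(votes)
    return abs(counts["correct"] - counts["incorrect"])

def feedback_instability(votes):
    """Assumed proxy: highest when one new vote could change the majority
    label, lower as the majority's lead grows."""
    return 1.0 / (1 + vote_lead(votes))

def select_candidates(mappings, k, revalidation_share=0.5, min_lead=2):
    """Hypothetical meta-strategy: fill a share of the k slots from mappings
    that need revalidation and the rest from mappings not yet validated."""
    unvalidated = [m for m in mappings if not m["votes"]]
    contested = [m for m in mappings
                 if m["votes"] and vote_lead(m["votes"]) < min_lead]
    # Lowest quality first: highest disagreement, highest instability.
    unvalidated.sort(key=lambda m: disagreement(m["scores"]), reverse=True)
    contested.sort(key=lambda m: feedback_instability(m["votes"]), reverse=True)
    n_reval = min(len(contested), round(k * revalidation_share))
    picked = contested[:n_reval] + unvalidated[:k - n_reval]
    return [m["id"] for m in picked]

# Toy data, invented for illustration: per-matcher scores and user votes.
mappings = [
    {"id": "a", "scores": [0.9, 0.9, 0.9], "votes": []},
    {"id": "b", "scores": [0.1, 0.9, 0.5], "votes": []},
    {"id": "c", "scores": [0.8, 0.7, 0.9], "votes": ["correct", "incorrect"]},
    {"id": "d", "scores": [0.6, 0.6, 0.7], "votes": ["correct"] * 3},
]
print(select_candidates(mappings, k=2, revalidation_share=0.5))
print(select_candidates(mappings, k=2, revalidation_share=0.0))
```

Setting `revalidation_share` to 0 favors mappings that no user has validated yet, while larger values favor revalidation of contested mappings.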

(b) The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences. We developed a novel methodology for automatically selecting background knowledge sources for any given ontologies, which measures the usefulness of each background knowledge source by assessing the fraction of classes mapped through it over those mapped directly, which we call the mapping gain [25];

(c) Regarding the algorithms that automatically perform ontology matching, we broadened the scope of AgreementMakerLight by creating new matchers for translation and structural matching and by adding support for user feedback, while reinforcing key aspects (including the use of background knowledge and alignment repair) [23, 24].
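
The mapping gain described under (b) can be illustrated with a small sketch. It reads the definition literally (classes mapped through a source divided by classes mapped directly); the `only_new` option, the selection `threshold`, and the source names are assumptions for illustration and are not taken from the published methodology.

```python
def mapping_gain(direct, via_source, only_new=False):
    """Ratio of classes mapped through a background knowledge source to
    classes mapped directly. With only_new=True (an assumed variant), classes
    that were already mapped directly are not counted for the source."""
    direct, via_source = set(direct), set(via_source)
    if not direct:
        raise ValueError("no directly mapped classes to compare against")
    if only_new:
        via_source = via_source - direct
    return len(via_source) / len(direct)

def select_sources(direct, candidates, threshold=0.2):
    """Keep sources whose gain reaches an assumed threshold, best first."""
    gains = {name: mapping_gain(direct, classes)
             for name, classes in candidates.items()}
    kept = [name for name, gain in gains.items() if gain >= threshold]
    return sorted(kept, key=gains.get, reverse=True)

# Toy data, invented for illustration.
direct = {"A", "B", "C", "D"}
candidates = {
    "source_x": {"A", "E", "F"},
    "source_y": {"B"},
    "source_z": set(),
}
print(select_sources(direct, candidates))
```

A source that maps few or no classes beyond the direct alignment receives a low gain and is skipped, which avoids the cost of loading it for the matching task.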

(3) Performance Evaluation:

(a) We are trying to identify those mappings whose correction will lead to better results faster. Our hypothesis is that those mappings are exactly the ones with the lowest quality. In our extensive evaluation, we attempted to validate this hypothesis by comparing several of our own strategies (and their combination using our meta-strategy) with those of others. We performed the comparison in terms of the number of false positives, the number of false negatives, the F-measure gain (at iteration 100), and the Normalized Discounted Cumulative Gain (NDCG) of the ranked list (a common measure of the quality of ranked lists). Our results indicate that, of all our strategies, the best is the one that emphasizes letting users validate earlier the mappings that are more likely to be misclassified based on the automatic matching methods (rather than those already validated by other users). In addition, we significantly outperform the results of a competing approach [25].

(b) We implemented this background knowledge selection methodology in the AgreementMakerLight ontology matching framework, and evaluated it using the benchmark biomedical ontology matching tasks from the Ontology Alignment Evaluation Initiative (OAEI). In each matching problem, our methodology consistently identified the sources of background knowledge that led to the highest improvements over the baseline alignment (i.e., without background knowledge). Furthermore, our proposed mapping gain parameter is strongly correlated with the F-measure of the produced alignments [22];

(c) The participation of AgreementMakerLight in the 2014 OAEI was very successful, as it obtained the highest F-measure in 6 of the 8 ontology matching tracks [23].
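
For reference, the two evaluation measures named under (3)(a) follow standard definitions and can be sketched as below. Treating relevance as 1 for a mapping that is actually incorrect (and therefore worth validating early) and 0 otherwise is an assumption of this sketch, as are the function names and the toy values.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance values."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def f_measure(alignment, reference):
    """Harmonic mean of precision and recall of an alignment, where both
    arguments are sets of (source entity, target entity) pairs."""
    true_positives = len(alignment & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(alignment)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

# A ranking that presents the incorrect mappings first scores higher.
print(ndcg([1, 1, 0, 0]), ndcg([0, 0, 1, 1]))
print(f_measure({("a", "x"), ("b", "y")}, {("a", "x"), ("c", "z")}))
```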

Key Outcomes or Other Achievements:

1. Dissemination: We have published our results in a wide range of communities: the Database/Semantics community (ODBASE), the Semantic Web community (ISWC, ESWC, EKAW), the Life Sciences community (International Conference on Data Integration in the Life Sciences), the AI community (Artificial Intelligence Review), the GIS community (ACM SIGSPATIAL GIS and GIR), and the Information Visualization community (InfoVis). Through our collaborators at the University of Lisbon, we have received feedback on the AgreementMakerLight ontology matching system from biomedical experts, and we published a journal article in PLoS ONE to reach a multidisciplinary community that includes biomedical experts.

2. Award:

In the OAEI 2014 competition, AgreementMakerLight was ranked first in the Anatomy track, first in the Conference track, first in the Large BioMed track, first in the MultiFarm track, first in the Library track, and first in the Interactive Matching Evaluation track. Thus, overall, AgreementMakerLight was the best ontology matching system at OAEI 2014. We note that AgreementMaker or AgreementMakerLight have been top performers at OAEI since 2009, in a field that has become more competitive every year.

3. Involvement of students: The following students have been involved in this project: Amruta Nanavaty, Yiting Li, Cosmin Stroe, Venkat Raghavan Ganesh, Iman Mirrezaei, Aynaz Taheri, Jill Aurisano, and Booma Balasubramani (graduate), Anna Anderson, Tais Bellini, Devina Dhawan, and Manasa Bandapalle (undergraduate), and Francesco Loprete (visiting graduate student from U. Milano-Bicocca). Eight women have been involved in this project: Nanavaty, Li, Balasubramani, Aurisano, Taheri, Bellini, Dhawan, and Bandapalle.

4. Teaching: Related topics have been included in the teaching of a graduate course on "Data and Web Semantics", which has been taught yearly.

5. Inventions: AgreementMaker Ontology Matching System. UIC Research Discovery 2016-002

6. Software: AgreementMaker Ontology Matching System, source code, shared on GitHub. Available at: https://github.com/agreementmaker/agreementmaker

The code has been released under the GNU Affero General Public License v3.

Summary of Results (Year 5)

Major Activities: In this final year of the grant, we concluded the research of the previous year and published several papers along the three key challenges of our research. Hence, the focus of our research coincides with the three main objectives of Year 4.

Visualization: We continued the design and implementation of the techniques for the visualization of large ontologies; we conceived a completely new interface with focus on new types of interaction for tasks that are required by ontology matching experts.

Architecture: Our focus was on the integration of the ontology matching process with the visualization process and on designing key components that optimize the automatic process of ontology matching and the interaction with the users.

Performance: We published results on the effectiveness of the analytical process that is integrated in the overall architecture. We again took part in the OAEI competition (2015) with both automatic and semi-automatic methods.

Specific Objectives:

Visualization: We developed a scalable method to visualize very large ontologies. We designed and implemented an interface whose main objective is to support complex analytic tasks visually.

Architecture: We designed several essential components that optimize the interactive cycle (including generating ranked mappings to be presented for validation to the users).

Performance: We measured the robustness and effectiveness associated with the interactive cycle of user validation. We further improved the performance of the AgreementMakerLight ontology matching system, which was again the winner in most tasks of OAEI 2015.

Significant Results:

(1) Visualization: We have worked along two main directions:

(1.1) We explore the use of pie charts to build an interactive interface for the matching process [26]. Pie charts naturally scale to any ontology size without increasing their area. We augment this visualization with several navigation possibilities such as the ability to traverse the ontologies vertically (children of a class) and horizontally (siblings of a class), so that both an overall view (that of the pie chart) and a detailed view can coexist. The interface supports the comparison of the results of different matching algorithms.

(1.2) We have built an interactive interface where we explore the power of multi-linked views [27]. The multi-linked views were designed so as to support the evaluation, diagnosis, comparison, and exploration tasks performed by expert users. Trends and patterns are highlighted by the links across the various views. For example, the evaluation of the performance of each matcher makes use of exploration and comparison tasks supported by this visual interface. Views of entity details, through meaningfully designed explorative interactions and through comparative views of the results across different matchers, have helped to identify potential sources of error [28].

The presentations of the two papers at Voila! [26, 28] and of the poster at InfoVis [27] elicited interesting discussions pointing to new research directions.

(2) Architecture: Our focus has been along the design of the architecture of the semi-automatic or interactive ontology matching process and of several components of that architecture. Our work can be summarized along the following three different directions:

(2.1) Design of the workflow of the interactive process (see Figure 7 of [26]), which starts with the ranking of the mappings according to their quality and is followed by the presentation to the users of those mappings with the lowest quality, the expectation being that these are the most likely to be incorrect. Users assign a level of confidence to those mappings, as informed by the provided visualizations. This process is iterative, following a "user feedback loop".

(2.2) We have devised two different active learning approaches for the interactive process, in that we try to identify those mappings whose correction will lead to better results faster. In the first one, we use a model based on five quality measures [30]. In this approach, we allow several users to collaborate in the matching process, using a majority vote strategy. The five measures are used to define two different candidate selection strategies: one that selects from those mappings that have not yet been validated, and one that revalidates, that is, selects from those mappings that have been validated but for which no consensus has been reached yet. A meta-strategy combines the two selection strategies, with the possibility of favoring one over the other. The second active learning approach uses six of the matchers of the AgreementMaker ontology matching system, including the Linear Weighted Combination (LWC) matcher, which performs a weighted combination of the results of the other five matchers, with weights that are automatically determined using a quality metric. We train a classifier and modify the weights of the LWC matcher iteratively, following the on-line learning paradigm. The process continues until there is no significant improvement in F-measure [31].
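
The second approach can be pictured with the following minimal sketch of a weighted combination whose weights are adjusted from user feedback. It is not the classifier of [31]: the delta-rule style update, the learning `rate`, the `min_gain` stopping threshold, and all names are assumptions chosen to keep the example short.

```python
def lwc_score(scores, weights):
    """Weighted average of the similarity scores of the individual matchers."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def update_weights(weights, scores, label, rate=0.1):
    """One on-line step (assumed delta rule): after a user labels a mapping
    (1.0 = correct, 0.0 = incorrect), shift weight toward the matchers whose
    scores agreed with the label. Weights are kept strictly positive."""
    error = label - lwc_score(scores, weights)
    return [max(1e-6, w + rate * error * s) for w, s in zip(weights, scores)]

def train_online(weights, feedback_rounds, f_measure_of, min_gain=0.001):
    """Apply feedback round by round and stop as soon as a round no longer
    improves the F-measure by at least min_gain. `f_measure_of` is a
    caller-supplied function that evaluates a weight vector."""
    best = f_measure_of(weights)
    for batch in feedback_rounds:
        candidate = weights
        for scores, label in batch:
            candidate = update_weights(candidate, scores, label)
        current = f_measure_of(candidate)
        if current - best < min_gain:
            break
        weights, best = candidate, current
    return weights

# Toy example with five matchers; the user confirms a mapping that only the
# first two matchers scored highly, so their weights grow the most.
start = [1.0, 1.0, 1.0, 1.0, 1.0]
scores = [0.9, 0.8, 0.2, 0.1, 0.3]
print(update_weights(start, scores, label=1.0))
```

In a real setting `f_measure_of` would rerun the combination with the candidate weights and compare the resulting alignment against the mappings validated so far.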

(2.3) We have developed the TRIPLEX extractor of triples (of the form <subject; relation; object>) using NLP techniques [32]. TRIPLEX complements the output of already available verb-mediated information extractors with noun-mediated triples. Extraction of triples contributes in the following two ways to this award: (i) ontologies may contain large text fragments but current ontology matching techniques do not use such text to perform mappings; (ii) the extraction of triples from text allows for the matching of that text (through its triples) to an ontology.

(3) Performance Evaluation:

We substantially improved the performance of several components of AgreementMakerLight so as to face possible improvements of competing ontology matching systems in the OAEI competition, a goal that was fully achieved [29] (see also Awards).

2. Awards:

In the OAEI 2015 competition, AgreementMakerLight was ranked first in the Anatomy track, first in the Conference track, first in the Large BioMed track, first in the MultiFarm track, first in the Library track, and first in the Interactive Matching Evaluation track. Thus, overall, AgreementMakerLight was the best ontology matching system at OAEI 2015 [29]. We note that AgreementMaker first and more recently AgreementMakerLight have been top performers at OAEI since 2009, in a field that has become more competitive every year: in 2015 there were 22 participants vs. 14 participants in 2014.

The paper "The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives" was selected as one of the two best papers at Know@LOD and therefore invited to a Springer book with the best papers of the 2015 ESWC workshops [32].

3. Involvement of students: The following students have been involved in this project: Amruta Nanavaty, Yiting Li, Cosmin Stroe, Venkat Raghavan Ganesh, Iman Mirrezaei, Aynaz Taheri, Jill Aurisano, and Booma Balasubramani (graduate), Anna Anderson, Tais Bellini, Devina Dhawan, and Manasa Bandapalle (undergraduate), and Francesco Loprete (visiting graduate student from U. Milano-Bicocca). Of these students, eight are women: Nanavaty, Li, Balasubramani, Aurisano, Taheri, Bellini, Dhawan, and Bandapalle.

4. Teaching: Related topics have been included in the teaching of a graduate course on "Data and Web Semantics", which has been taught yearly, and end-of-term projects included research on ontology matching and visualization. Students in the new senior/graduate course "Introduction to Data Science" (taught for the first time at UIC in Spring 2016) were taught principles of data integration and ontology matching. Students were also introduced to the user interfaces for ontology matching that support visual analytics developed under this project.

-- Main.ifcruz - 2016-12-18

Topic revision: r12 - 2017-01-23 - 06:43:20 - Main.ifcruz
 
Copyright 2016 The Board of Trustees of the University of Illinois. webmaster@cs.uic.edu