Mark’s research focuses on increasing programmers' productivity by automating activities at different stages of the development lifecycle. In his research, Mark utilizes techniques from software engineering, language design, program analysis, security, and machine learning to address specific issues that affect programmers when they design, write, debug, and test software. Mark’s research program is supported by NSF grants on which he is a principal investigator and by multiple industry partners who sponsor his research by investing in his ideas and providing platforms and applications to empirically validate his research prototypes.
Overview
Mark’s guiding research principle is to select research problems that lie at the intersection of academic challenge and industry importance. To find these problems, Mark draws on his long-term industry experience and ties in combination with his strong academic background. Mark looks for patterns of difficulties that software engineers experience in their everyday tasks. The next steps are to understand these patterns, create models of real-world situations, and decompose these models into the fundamental components and interactions that constitute these patterns. Once a problem is formulated, Mark begins a search for solutions that often involve multiple techniques from different areas of computer science.
The main premise of Mark’s research is that different problems and their solutions affect one another, despite a popular view that these problems often belong to separate concerns. For example, data privacy and software testing were viewed as separate concerns and studied separately until he showed in his award-winning ISSRE’10 paper that these concerns affect each other, leading to serious problems that were confirmed by his industry collaborators.
Currently, Mark focuses his research on three main areas: maximizing the utility of different software engineering tasks in the presence of data privacy constraints; increasing the effectiveness of software maintenance, evolution, and reuse; and improving software testing approaches. His research prototypes are evaluated and deployed at different Fortune 500 companies, showing that rapid technology transfer is possible when research ideas are aligned with industry needs.
Currently, Mark is working with his colleagues and collaborators on the following projects.
Software Engineering in the Age of Data Privacy
Creating and maintaining software is beset by many challenges, which include protecting sensitive information. Not only do recent data protection laws and regulations around the world prohibit organizations from disclosing confidential data, but they also impose stiff consequences for these organizations should they accidentally release sensitive information in software artifacts. In the past decade, there have been many publicized cases of leaked source code that contained sensitive information from well-known companies. Clearly, sensitive information should be redacted in source code and other software artifacts; however, doing this manually is difficult and time-consuming.
More importantly, blindly removing sensitive information from software artifacts may severely reduce program comprehension, thereby thwarting different software maintenance and evolution tasks. Finding a solution that balances the goals of privacy and utility (for example, program comprehension in the context of software maintenance tasks) is one of the modern challenges of global software development theory and practice. The long-term goal of my research program is to provide effective solutions to redact sensitive information while maximizing the utility of different software engineering tasks. This research is supported by NSF Grant CCF-1017633 and different industry collaborators.
Software Maintenance, Evolution, and Reuse
A goal of my research is to explore the untapped potential of integrating textual and structured information in diverse software artifacts and to produce transformative models that automatically achieve better traceability among software artifacts and that search, select, and synthesize relevant fragments of source code in different applications. As part of my research in this area, I produced a number of traceability approaches and code search engines that are publicly available: Exemplar, Portfolio, JavaCLAN, and LeanArt. This research is supported by NSF Grant CCF-0916139 and different industry collaborators. Examples of my work in this area include the problem of tracing software requirements to acceptance tests and the S3 approach.
Bridging the Abstraction Gap between Requirements and Acceptance Tests
Two distinct milestones of any software development lifecycle are requirements gathering and acceptance testing, where a software product is validated against its requirements. Yet this validation is one of the most difficult tasks, since it involves bridging an abstraction gap between high-level descriptions of requirements and their low-level implementations in source code. Unfortunately, linking acceptance tests to requirements is an exceedingly difficult, manual, laborious and time-consuming task.
Searching, Selecting, and Synthesizing (S3) Source Code
When programmers develop, maintain, and evolve software to satisfy new requirements, they intuitively sense that there are existing fragments of code that are relevant to these requirements and that were implemented by other developers. These code fragments could be reused if found; however, three main problems inhibit effective software reuse: rudimentary source code search engines, the lack of support for selecting retrieved code snippets from relevant applications, and the abstraction gap between low-level implementations of these code fragments and the pertinent high-level requirements that are given to developers. Moreover, source code repositories are polluted with poorly functioning projects with ambiguous, inconsistent, incomplete, or even absent documentation. The result is overwhelming complexity, a steep learning curve, and a significant cost of building customized software.
Software Testing
A goal of my research is to increase the effectiveness of software tests, so that these tests expose more bugs in a shorter period of testing time, thus saving resources and improving the quality of software. In my research, I formulate and attack multiple variations of this problem that address different aspects of software testing. For example, my recent work is in performance testing, where I developed an approach to quickly find situations where applications unexpectedly exhibit worsened performance characteristics for certain combinations of input values. A fundamental question that I addressed in this work is how to quickly select a manageable subset of the input data that automatically exposes performance problems in applications. When my solution was applied to a large-scale application at a major insurance company, performance problems were found automatically and confirmed by experienced testers and developers.
When directing a research group at Accenture, I created an approach for maintaining and evolving testing scripts that is estimated to save Accenture over $60 million annually. Papers evaluating this approach were published at the ACM/IEEE International Conference on Software Engineering (ICSE’09) and won the Best Paper Award at the International Conference on Software Testing, Verification, and Validation (ICST’09).
Automating SynthesiS of Integration Software Tests (ASSIST)
The larger the project, the more important integration testing becomes, in which integrated software modules or components are evaluated as a whole to determine whether they behave correctly. Creating effective integration tests requires time, effort, and knowledge of how different components interact in applications, and these requirements are hard to satisfy in a fast-paced development environment. Thus, there is a dilemma: how can programmers create effective integration tests without affecting strict development schedules?
We offer a novel solution for Automating SynthesiS of Integration Software Tests (ASSIST) using acceptance and unit tests. ASSIST combines runtime monitoring of the application that is executed using acceptance tests, multidimensional sequential pattern mining of method calls, and runtime state carving to obtain input values as well as oracles for synthesized integration tests.
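As a rough illustration of what mining method-call patterns from monitored executions can look like, the sketch below counts contiguous call sequences that recur across several traces. It is not the ASSIST algorithm: it substitutes plain frequent n-gram counting for multidimensional sequential pattern mining, and the function name `frequent_call_sequences`, the trace format, and the method names are all hypothetical.

```python
from collections import Counter


def frequent_call_sequences(traces, length=2, min_support=2):
    """Return contiguous call sequences of `length` seen in at least `min_support` traces."""
    support = Counter()
    for trace in traces:
        # Count each distinct sequence once per trace, so support = number of traces.
        seen = {tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)}
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Hypothetical method-call traces, one per acceptance test run (assumed data).
traces = [
    ["Cart.add", "Cart.total", "Payment.charge"],
    ["Cart.add", "Cart.total", "Cart.clear"],
    ["User.login", "Cart.add", "Cart.total"],
]

patterns = frequent_call_sequences(traces, length=2, min_support=2)
print(patterns)
```

Sequences that recur across many acceptance-test runs are plausible candidates for the component interactions an integration test should exercise; obtaining input values and oracles through state carving is outside the scope of this sketch.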
MOdulArizing Test Scripts (MOATS)
A primary way to automatically test GUI-based APplications (GAPs) is by running test scripts that interact with GAPs by performing actions on their GUI objects. The extra effort that test engineers put into writing test scripts pays off when these scripts are reused repeatedly on different GAPs. Unfortunately, test scripts depend heavily on the structures of GUIs, making it difficult for test engineers to reuse these scripts on different GAPs, thereby obliterating the benefits of test automation.
We offer novel programming abstractions with which testers can specify what to test instead of how to test GAPs. We implement these abstractions as annotations to the Java language and its runtime, prove the soundness of the type system, and give our type checking algorithm. The results of our evaluation show that our solution is effective in practice in helping test engineers produce modular test logic, reason about it, and prevent and eliminate errors in these scripts.
pReventing databasE Deadlocks from Application-Centered Transactions (REDACT)
REDACT addresses the fundamental problem of predicting and preventing database deadlocks in database-centric applications. Our approach unifies disparate applications that use the same databases in a novel and promising way: their asynchronously issued transactions become resource sharing requests, and these requests can be identified and acted upon before they are executed by the database. This integration of databases and applications that use them constitutes the core of the proposed scalable database deadlock resolution algorithm.
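One simple way to picture treating transactions as resource sharing requests that are inspected before execution is a lock-order check: if two transactions acquire the same two resources in opposite orders, they can deadlock when interleaved. The sketch below illustrates only that idea under stated assumptions; it is not the REDACT algorithm, it finds only two-transaction conflicts, and the names `lock_order_pairs`, `potential_deadlocks`, and the sample transactions are hypothetical.

```python
from itertools import combinations


def lock_order_pairs(resources):
    """All ordered pairs (a, b) such that resource a is acquired before resource b."""
    return {(a, b) for i, a in enumerate(resources) for b in resources[i + 1:] if a != b}


def potential_deadlocks(transactions):
    """Return pairs of transaction names that acquire two resources in opposite orders."""
    conflicts = []
    for (name1, res1), (name2, res2) in combinations(transactions.items(), 2):
        pairs1 = lock_order_pairs(res1)
        pairs2 = lock_order_pairs(res2)
        # A reversed pair means each transaction may hold what the other waits for.
        if any((b, a) in pairs2 for (a, b) in pairs1):
            conflicts.append((name1, name2))
    return conflicts


# Hypothetical transactions: name -> resources (e.g. tables) in acquisition order.
transactions = {
    "billing": ["accounts", "orders"],
    "shipping": ["orders", "accounts"],
    "reporting": ["accounts", "orders"],
}

conflicts = potential_deadlocks(transactions)
print(conflicts)
```

A check of this kind could run when requests are identified, before they reach the database, so that conflicting transactions are serialized or reordered; cycles that span three or more transactions would need a full graph cycle search.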
