Ph.D. Thesis Dissertation

Improving Web Security by Automated Extraction of Web Application Intent


Need Over the past decade, the Web has been transformed from a collection of static HTML pages to a complex, distributed computing platform, as evidenced by the success of sites such as Facebook and YouTube. This transformation has been enabled primarily by web applications. The goal of this thesis is to investigate fundamental ways to improve the security of existing (legacy) web applications. To do so, we pursue research efforts in two complementary directions: a) techniques to uncover security flaws and b) techniques to automatically fix security flaws.

Challenges Finding and fixing security flaws in a legacy web application typically requires detailed knowledge of its behavior. This knowledge comes from understanding high-level design artifacts combined with an analysis of the application's source code. However, manual analysis of source code is well known to be labor- and cost-intensive and is often error-prone. Additionally, design-level artifacts are often unavailable for legacy web applications, leaving the source code as the only available resource. While source code is the most accurate description of the behavior of a web application, this description is expressed in low-level program statements. Because of this low-level nature, source code does not readily offer the high-level understanding of an application's intended behavior that is necessary to identify and fix security flaws.

Extraction of Specifications and Possible Usages This thesis develops techniques to compute the high-level intended behavior of a legacy web application directly from its low-level source code description. The philosophy of discovering intent in order to detect vulnerabilities and prevent attacks rests on two simple observations: (a) web applications are written implicitly assuming benign inputs, and encode programmer intentions to achieve a certain behavior on these inputs, and (b) maliciously crafted inputs subvert the program into straying from its intended behaviors, leading to successful attacks. Leveraging these observations, we develop techniques for inferring intentions, both for uncovering security flaws and for fixing them. Through two practical results, we demonstrate that this philosophy of inferring intent is powerful and broadly applicable to challenges in web application security.

Result#1: Detecting Vulnerabilities The first result in this thesis presents a systematic approach for detecting parameter tampering vulnerabilities. These vulnerabilities arise in form processing code when the server-side code fails to re-validate inputs that the corresponding client-side validation would reject. To detect vulnerabilities, our approach systematically explores the space of inputs that violate intended restrictions to find those that the server-side code fails to enforce. Evaluation of several open-source and commercial web applications reveals serious security problems such as unauthorized monetary transactions at a bank and unauthorized discounts added in a shopping session. These results provide strong evidence that extracting and checking intended behaviors offers an effective mechanism for reasoning about vulnerabilities in web applications.
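The idea of probing for inputs that the client rejects but the server accepts can be illustrated with a minimal sketch. It is not the thesis's tool: the form fields, the `CLIENT_CHECKS` table, the `server_accepts` handler, and the hand-picked hostile values are all hypothetical, and a real analysis would derive the restrictions and candidate inputs from the application itself rather than from a fixed list.

```python
from typing import Callable, Dict, List

Form = Dict[str, str]

# Hypothetical client-side restrictions (what form validation in the
# browser would enforce). These names are assumptions for illustration.
CLIENT_CHECKS: Dict[str, Callable[[str], bool]] = {
    "quantity": lambda v: v.isdigit() and 1 <= int(v) <= 10,
    "discount": lambda v: v in {"0", "5"},
}


def server_accepts(form: Form) -> bool:
    """Hypothetical server-side handler that re-validates only `quantity`."""
    quantity = form["quantity"]
    return quantity.isdigit() and 1 <= int(quantity) <= 10


def find_tampering(benign: Form, hostile: Dict[str, List[str]]) -> List[Form]:
    """Return forms that violate a client-side check yet pass the server."""
    findings: List[Form] = []
    for field, values in hostile.items():
        for value in values:
            if CLIENT_CHECKS[field](value):
                continue  # the client would allow this value; not tampering
            candidate = {**benign, field: value}
            if server_accepts(candidate):
                findings.append(candidate)
    return findings


BENIGN: Form = {"quantity": "1", "discount": "0"}
HOSTILE: Dict[str, List[str]] = {"quantity": ["-1", "999"], "discount": ["100"]}

if __name__ == "__main__":
    for form in find_tampering(BENIGN, HOSTILE):
        print("server accepted a client-rejected input:", form)
```

In this sketch the only finding is the form with a tampered `discount`, because the hypothetical server re-checks `quantity` but never re-checks `discount`.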

Result#2: Preventing Vulnerabilities The second result in this thesis offers a sound approach to prevent SQL injection vulnerabilities. These vulnerabilities arise when an application fails to restrict the influence of untrusted inputs on SQL queries. The approach first extracts the SQL queries that the web application intends to issue by analyzing its source code. Our strategy for fixing vulnerable web applications then rewrites the source code to employ PREPARE statements, one of the well-known robust defenses against SQL injection attacks. Experimental evaluation demonstrates the effectiveness and scalability of our approach by successfully transforming large open-source applications. Our approach presents a robust solution to the long-standing problem of incorporating PREPARE statements in legacy web applications.
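The difference between a query built by string concatenation and its prepared-statement counterpart can be shown with a small sketch. It uses Python's standard `sqlite3` module and its `?` placeholders as a stand-in for PREPARE statements; the table, the function names, and the attack string are hypothetical, and the sketch shows only the before and after forms of one query, not the automated source transformation described above.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory table for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_vulnerable(conn: sqlite3.Connection, name: str) -> List[str]:
    # Untrusted input is spliced into the query text, so it can change
    # the structure of the query.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_prepared(conn: sqlite3.Connection, name: str) -> List[str]:
    # The query structure is fixed; the input is bound as data only.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print("vulnerable:", lookup_vulnerable(conn, attack))
    print("prepared:  ", lookup_prepared(conn, attack))
```

With the hypothetical attack string, the concatenated query returns every row, while the parameterized version treats the whole string as a name and matches nothing.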

Conclusion The philosophy of extracting and using intentions offers a systematic and scalable way to combat security problems in legacy web applications. By presenting extensive results on both the detection and prevention fronts, this thesis offers convincing evidence that reasoning about application intent enables the development of principled approaches for improving the security of web applications.