STEPDAD Project on Testing Database-Centric Applications For Handling Database Deadlocks

Mark Grechanik Ph.D., University of Texas at Austin

Summary Many organizations deploy applications that use databases by sending \emph{Structured Query Language (SQL)} statements to them and obtaining data that result from executions of these statements. Since applications often share the same databases concurrently, database deadlocks routinely occur in these databases. Testing applications to determine how they cause database deadlocks is important as part of ensuring correctness, reliability, and performance of these applications. Unfortunately, it is very difficult to reproduce database deadlocks, since it involves different factors such as the precise interleavings in executing SQL statements. We created a novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) that enables testers to instantiate database deadlocks in applications with a high level of automation and frequency. We implemented STEPDAD and experimented with three applications. On average, STEPDAD detected a number of database deadlocks exceeding the deadlocks obtained with the baseline approach by more than an order of magnitude. In some cases, STEPDAD reproduced a database deadlock after running an application only twice, while no database deadlocks could be obtained after ten runs using the baseline approach. The entire code, results, and video can be obtained from here. The Problem of Testing DCAs For Handling Database Deadlocks Many organizations and companies deploy Database-centric applications (DCAs), which use databases by sending transactions to them - atomic units of work that contain SQL statements - and obtaining data that result from execution of these SQL statements. When DCAs use the same database at the same time, concurrency errors are observed frequently, and these errors are known as database deadlocks, which is one of the reasons for major performance degradation in these applications. The responsibility of database engines is to provide layers of abstractions to guarantee Atomicity, Consistency, Isolation, and Durability (ACID) properties; however, these guarantees do not include freedom from database deadlocks. In general, deadlocks occur when two or more threads of execution lock some resources and wait on other resources in a circular chain, i.e., in a hold-and-wait cycle. Database deadlocks occur within database engines and not within DCAs that use these databases. A condition for observing database deadlock is that a database should simultaneously service two or more transactions that come from one or more DCAs, and these transactions contain SQL statements that share the same resources (e.g., tables or rows). In enterprise systems, database deadlocks may appear when a new transaction is issued by a DCA to a database that is already used by some other legacy DCA, thus making the process of software evolution error-prone, expensive, and difficult. Currently, database deadlocks are typically detected within database engines using special algorithms that analyze whether transactions hold resources in cyclic dependencies, and these database engines resolve database deadlocks by forcibly breaking the hold-and-wait cycle. That is, once a deadlock occurs, the database rolls back one or more of the transactions that is involved in the circular wait. Doing so effectively resolves the database deadlock, however, exceptions are thrown in the components of the DCAs that sent these aborted transactions. Essentially, these exceptions are notification mechanism to let stakeholders know that a business flow is disrupted, since some transactions did not go through. Next, stakeholders study causes of database deadlocks, so that they can avoid them. Programmers are advised to practise defensive programming by writing special database deadlock exception-handling code, for example, to repeat aborted transactions when applicable -- searching for ``database deadlock exception'' on the Web yields close to 2,500 web pages, many of which instruct programmers on how to handle database deadlock exceptions for different databases. However, this solution is considered a temporary patch, since it leads to significant performance degradation -- detecting a database deadlock, throwing an exception, and retrying a transaction comes at a high cost. Thus, it is important to analyze these exceptions to determine the causes of database deadlocks, so that they can be avoided altogether by refactoring DCAs. Different database deadlock avoidance programming patterns help database designers and programmers structure their code, transactions, and data so that they can avoid database deadlocks. For example, Microsoft published guidelines for minimizing database deadlocks in SQL Server. These guidelines include, among others, accessing database objects in the same order, avoiding user interaction in transactions, and keeping transactions short and in one batch. Since these solutions are manual and error-prone, it is important to reproduce database deadlocks using DCAs automatically during testing to see what avoidance patterns fit best. Consistently and systematically reproducing database deadlocks is very difficult and laborious, since identifying execution scenarios that lead to database deadlocks requires sophisticated reasoning about the combined behavior of DCAs and their databases. The result of this process is overwhelming complexity and a significant cost of reproducing database deadlocks. Our interviews with different Fortune 100 companies confirmed that database deadlocks occur on average every two to three weeks for large-scale enterprise DCAs, some of which have been around for over 20 years. For instance, database deadlocks still occur every ten days on average in a commercial large-scale DCA that handles over 70% of cargo flight reservations in the USA. In this case, a test engineer would wait for ten days in order to detect a single database deadlock, which is obviously impractical.. . We created a novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) that enables testers to instantiate database deadlocks in DCAs with a high level of automation and frequency. This paper makes the following contributions. This paper makes the following contributions. • We model transactions using lock graphs to detect hold-and-wait cycles in transactions. • STEPDAD exploits information about hold-and-wait cycles to explore interleavings of queries performed by different transactions using a technique called execution hijacking. These interleavings attempt to produce deadlocks matching the detected hold-and-wait cycles, thereby significantly increasing the probability of database deadlocks • We implemented STEPDAD and experimented using three client/server DCAs. On average STEPDAD produced a number of database deadlocks exceeding by more than an order of magnitude the number of database deadlocks obtained the baseline approach. In some cases, STEPDAD produced a database deadlock after running an application only two times, while no database deadlocks were produced after ten runs using the baseline approach. Example of a Database Deadlock Consider an example of database deadlock that is shown in the table below. Transactions T 1 and T 2 are independently sent by DCA(s) to the same database at the same time. When the first DCA executes the statement UPDATE that is shown in Step 1, the database locks rows in the table authors that contains values of the attribute paperid equal to 1. Next, the second DCA executes the statement UPDATE that is shown in Step 2, the database locks rows in the table titles that contains values of the attribute titleid equal to 2. When the statement SELECT is executed as part of the transaction T 1 as shown in Step 3, the database attempts to obtain a read lock on the rows of the table titles, which are exclusively locked by the transaction T 2 of the second DCA. Since these locks cannot be imposed simultaneously on the same resource (i.e., these locks are not compatible), T 1 is put on hold. Finally, the statement SELECT is executed as part of the transaction T 2 as shown in Step 4, the database attempts to obtain a read lock on the rows of the table authors, which are exclusively locked by the transaction T 1 of the first DCA. At this point both T 1 and T 2 are put on hold resulting in a database deadlock. The same reasoning applies if the granularity of locks is coarser, on the table level. \A lock graph is shown in the above figure for the these transactions that are depicted in rectangles and resources (i.e., tables) are shown in ovals. Thick arrows designate locks that transactions obtain on resources and dashed arrows designate that transactions are waiting to obtain locks on resources. The lock graph shows the cycle T 1 ->authors->T 2 ->titles->T 1 . This hold-and- wait cycle may not always materialize in a database deadlock; however, when interleaving steps occur as shown in the table, a database deadlock is highly likely. One exception is when these tables contain no data; in this case, locks may be released by the database engine almost instantly or not imposed at all. This situation illustrates how difficult it is to reproduce database deadlocks even in very simple cases. Downloads and Experimental Results To reproduce results of our experiments with REDACT, you need to download the following components. • All results of experimentation with STEPDAD systems with instructions inside the rar file - available here. • Subject applications for experimentation are available from Sourceforge here. • The experimental setup for the baseline experiment is available here with the documentation inside the package. • The experimental setup for the artificially injected transaction experiment is available here with the documentation inside the package. • The experimental setup for the controller-based experiment is available here with the documentation inside the package. People STEPDAD was created at the Advanced Research In Software Engineering (ARISE) lab at the Department of Computer Science of the University of Illinois at Chicago. Mark Grechanik, Project Lead Email: drmark[at]uic.edu B. M. Mainul Hossain Email: bhossa2[at]uic.edu Ugo Buy Email: buy[at]uic.edu