UIC Database Group

Header bar

2025-06-04 - Paper accepted at EDBT 2026

Incremental Maintenance of Provenance Sketches Under Updates

Congrats to Pengyuan on his accepted EDBT 2026 paper. In this work [1] in collaboration with Dieter, Vasudha, Zhen, and Danica, we develop an in-memory engine for incremental maintenance of provenance sketches.

Incremental Maintenance for Provenance-based Data Skipping

Provenance-based data skipping [2] compactly over-approximates the provenance of a query using provenance sketches and utilizes such sketches to speed-up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has been updated subsequently. Thus, there is a need to maintain provenance sketches. In this work, we introduce In-Memory incremental Maintenance of Provenance sketches (IMP), a framework for maintaining sketches incrementally under updates. At the core of IMP is an incremental query engine for data annotated with sketches that exploits the coarse-grained nature of sketches to enable novel optimizations.

Relevance-based Data Management

This work is part of our group’s work on Relevance-based Data Management where we utilize information on what data is relevant for answering a query to improve a wide range of data management tasks. A fundamental idea in relevance-based data management is to use dynamic relevance analysis techniques that determine at runtime what data is needed for answering a query. Dynamic relevance analysis is much more powerful than the static relevance analysis a database uses to determine what data can be pruned based on selection conditions. For instance, it enables us to determine what data is needed for answering top-k and aggregation queries with HAVING where the relevance of input data can only be determined at runtime. For example, consider the following query that returns the three departments with the most employees.

SELECT dept, count(*) AS numemp
FROM employee
GROUP BY dept
ORDER BY numemp DESC
LIMIT 3

Only data from the three departments with the most employees is needed to answer this query, but the identity of these departments will only be discovered at runtime!

  1. In-memory Incremental Maintenance of Provenance Sketches
    Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, Danica Porobic and Xing Niu
    Proceedings of the International Conference on Extending Database Technology (2026).
    details
  2. Provenance-based Data Skipping
    Xing Niu, Ziyu Liu, Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu and Danica Porobic
    Proceedings of the VLDB Endowment. 15, 3 (2021) , 451–464.
    details