2025-02-24 - Boris attending Dagstuhl seminar
Dagstuhl seminar on Semirings in Databases, Automata, and Logic
Last week Boris attended a seminar at Dagstuhl on Semirings in Databases, Automata, and Logic to learn more about how semirings are applied in logic, automata theory, and of course databases. Boris gave a talk on our recent work [1] about using abstract interpretation to over-approximate the space of possible models and predictions based on considering all possible clean versions of a dirty training dataset in machine learning and its relationship to our prior work of under-approximating certain answers and over-approximating possible answers for queries over uncertain data [2][3][4]. This is joint work with Jiongli and Babak from UCSD, former DBGroup member Su, and Oliver from University at Buffalo.
Impressions
Thanks to the organizers for putting together this great workshop!
-
Learning from Uncertain Data: From Possible Worlds to Possible Models
Jiongli Zhu, Su Feng, Boris Glavic and Babak Salimi
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 (2024).@inproceedings{ZF24, author = {Zhu, Jiongli and Feng, Su and Glavic, Boris and Salimi, Babak}, title = {Learning from Uncertain Data: From Possible Worlds to Possible Models}, pdfurl = {http://www.cs.uic.edu/%7ebglavic/dbgroup/assets/pdfpubls/ZF24.pdf}, longversionurl = {https://arxiv.org/pdf/2405.18549}, booktitle = {Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024}, year = {2024}, url = {http://papers.nips.cc/paper\_files/paper/2024/hash/c17fab1bcef325d0d30989c9bf506c0b-Abstract-Conference.html}, editor = {Globersons, Amir and Mackey, Lester and Belgrave, Danielle and Fan, Angela and Paquet, Ulrich and Tomczak, Jakub M. and Zhang, Cheng}, projects = {UA-DB; Zorro}, venueshort = {{NeurIPS}}, keywords = {Uncertainty; Machine Learning} }
-
CaJaDE: Explaining Query Results by Augmenting Provenance with Context
Chenjie Li, Juseung Lee, Zhengjie Miao, Boris Glavic and Sudeepa Roy
Proceedings of the VLDB Endowment (Demonstration Track). 15, 12 (2022) , 3594–3597.@article{LL22a, author = {Li, Chenjie and Lee, Juseung and Miao, Zhengjie and Glavic, Boris and Roy, Sudeepa}, keywords = {Provenance; Explanations}, title = {CaJaDE: Explaining Query Results by Augmenting Provenance with Context}, journal = {Proceedings of the VLDB Endowment (Demonstration Track)}, projects = {Explanations beyond Provenance}, pdfurl = {http://www.cs.uic.edu/%7ebglavic/dbgroup/assets/pdfpubls/LL22.pdf}, volume = {15}, number = {12}, pages = {3594--3597}, doi = {10.14778/3554821.3554852}, year = {2022}, venueshort = {{PVLDB}} }
-
Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds
Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
Proceedings of the 46th International Conference on Management of Data (2021), pp. 528–540.@inproceedings{FH21, author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver}, booktitle = {Proceedings of the 46th International Conference on Management of Data}, keywords = {UA-DB; Vizier}, pages = {528 – 540}, doi = {10.1145/3448016.3452791}, pdfurl = {https://dl.acm.org/doi/pdf/10.1145/3448016.3452791}, projects = {Vizier; UA-DB}, video = {https://www.youtube.com/watch?v=si2HUS7idEs&list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq}, title = {Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds}, venueshort = {SIGMOD}, reproducibility = {https://github.com/fengsu91/AUDB_Reproducibility}, longversionurl = {https://arxiv.org/pdf/2102.11796}, year = {2021} }
Incomplete and probabilistic database techniques are principled methods for coping with uncertainty in data. Unfortunately, the class of queries that can be answered efficiently over such databases is severely limited, even when advanced approximation techniques are employed.We introduce attribute-annotated uncertain databases (AU-DBs), an uncertain data model that annotates tuples and attribute values with bounds to compactly approximate an incomplete database. AU-DBs are closed under relational algebra with aggregation using an efficient evaluation semantics. Using optimizations that trade accuracy for performance, our approach scales to complex queries and large datasets, and produces accurate results.
-
Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers
Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
Proceedings of the 44th International Conference on Management of Data (2019), pp. 1313–1330.@inproceedings{FH19, author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver}, booktitle = {Proceedings of the 44th International Conference on Management of Data}, keywords = {UA-DB; Vizier}, longversionurl = {https://arxiv.org/pdf/1904.00234}, pages = {1313-1330}, pdfurl = {http://www.cs.uic.edu/%7ebglavic/dbgroup/assets/pdfpubls/FH19.pdf}, reproducibility = {https://github.com/UICDBGroup/UADB_Reproducibility}, projects = {Vizier; UA-DB}, video = {https://av.tib.eu/media/43062}, doi = {10.1145/3299869.3319887}, slideurl = {https://www.slideshare.net/lordPretzel/2019-sigmod-uncertainty-annotated-databases-a-lightweight-approach-for-approximating-certain-answers}, title = {Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers}, venueshort = {SIGMOD}, year = {2019} }
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notion of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.