Overview

During the course you will read and summarize several research papers covering state-of-the-art techniques in Big Data processing. Each student will present one of these papers in class. Papers marked as required reading must be read by all students in the class.

For your presentation, you can select any paper from the list below.

Please select a paper by 09/12.

The papers are available through Google Drive.

Presentation and Report

Please prepare a 20-25 minute talk with slides to present the paper you have been assigned. The whole presentation, including Q&A, should take 30-35 minutes. In addition, you need to write a report that explains and critically evaluates the presented techniques.

The schedule for presentations is shown below.

A full draft of the report is due on 12/02.

The final report is due on 12/11.

Help for writing the report, preparing slides, and giving a talk

How to give a presentation and prepare slides:

How to write a scientific article:

  • Page on how to write a CS article, including comments on some general writing rules.
  • Simon Peyton Jones' slides and video on how to write a great research paper

Presentation Schedule

The presentation schedule will be announced once papers have been assigned.

Student | Presentation Date | Paper | Slides
Leonardo Borgioli | 10/15 | Falling Rule Lists | pdf
Karthik Ragi | 10/17 | Why Should I Trust You?: Explaining the Predictions of Any Classifier | pdf
Sri Keshav Katragadda | 10/22 | The W3C PROV Family of Specifications for Modelling Provenance Metadata | pdf
Gustavo Moreira | 10/24 | noWorkflow: A Tool for Collecting, Analyzing, and Managing Provenance from Python Scripts | pdf
Chenjie Li | 10/29 | Causality-Based Explanation of Classification Outcomes | pdf
Huy Truong | 10/31 | A Unified Approach to Interpreting Model Predictions | pdf
Revathi Dhotre | 11/05 | Summarizing Provenance of Aggregate Query Results in Relational Databases | pdf
Amy Byrnes | 11/12 | HypeR: Hypothetical Reasoning with What-If and How-to Queries Using a Probabilistic Causal Approach | pdf
Jyotsna Rajaraman | 11/14 | Complaint-Driven Training Data Debugging for Query 2.0 |

List of Papers

Provenance Models

  • The Semiring Framework for Database Provenance, Todd J. Green, Val Tannen, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 93–99, 2017

  • The W3C PROV Family of Specifications for Modelling Provenance Metadata, Paolo Missier, Khalid Belhajjame, James Cheney, Proceedings of the 16th International Conference on Extending Database Technology, pp. 773–776, 2013

Provenance-Aware Systems

  • GProM - a Swiss Army Knife for Your Provenance Needs, Bahareh Arab, Su Feng, Boris Glavic, Seokki Lee, Xing Niu, Qitian Zeng, IEEE Data Engineering Bulletin 41 (1), 51–62, 2018

  • Smoke: Fine-Grained Lineage at Interactive Speed, Fotis Psallidas, Eugene Wu, Proc. VLDB Endow. 11 (6), 719–732, 2018

  • You Say 'What', I Hear 'Where' and 'Why': (Mis-)Interpreting SQL to Derive Fine-Grained Provenance, Tobias Müller, Benjamin Dietrich, Torsten Grust, Proceedings of the VLDB Endowment 11 (11), 2018

  • noWorkflow: A Tool for Collecting, Analyzing, and Managing Provenance from Python Scripts, João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, Juliana Freire, Proc. VLDB Endow. 10 (12), 1841–1844, 2017

  • Fine-Grained Lineage for Safer Notebook Interactions, Stephen Macke, Aditya G. Parameswaran, Hongpu Gong, Doris Jung Lin Lee, Doris Xin, Andrew Head, Proc. VLDB Endow. 14 (6), 1093–1101, 2021

Explainable Machine Learning Models

  • Why Should I Trust You?: Explaining the Predictions of Any Classifier, Marco Túlio Ribeiro, Sameer Singh, Carlos Guestrin, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1135–1144, 2016

  • Falling Rule Lists, Fulton Wang, Cynthia Rudin, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9-12, 2015, 2015

Attribution and Intervention-based Explanations for Machine Learning

  • Causality-Based Explanation of Classification Outcomes, Leopoldo E. Bertossi, Jordan Li, Maximilian Schleich, Dan Suciu, Zografoula Vagena, Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, In conjunction with the 2020 ACM SIGMOD/PODS Conference, DEEM@SIGMOD 2020, Portland, OR, USA, June 14, 2020, pp. 6:1–6:10, 2020

  • A Unified Approach to Interpreting Model Predictions, Scott M. Lundberg, Su-In Lee, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 4765–4774, 2017

  • Complaint-Driven Training Data Debugging for Query 2.0, Weiyuan Wu, Lampros Flokas, Eugene Wu, Jiannan Wang, Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pp. 1317–1334, 2020

Explanations as Provenance Summarization

  • Summarizing Provenance of Aggregate Query Results in Relational Databases, Omar AlOmeir, Eugenie Yujing Lai, Mostafa Milani, Rachel Pottinger, 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19-22, 2021, pp. 1955–1960, 2021

  • A Formal Approach to Finding Explanations for Database Queries, Sudeepa Roy, Dan Suciu, SIGMOD, pp. 1579–1590, 2014

  • Approximate Summaries for Why and Why-Not Provenance, Seokki Lee, Bertram Ludäscher, Boris Glavic, Proceedings of the VLDB Endowment 13 (6), 912–924, 2020

  • Putting Things into Context: Rich Explanations for Query Answers Using Join Graphs, Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy, Proceedings of the 46th International Conference on Management of Data, pp. 1051–1063, 2021

Hypothetical Reasoning: What-if and How-to

  • Tiresias: The Database Oracle for How-to Queries, A. Meliou, D. Suciu, Proceedings of the 2012 International Conference on Management of Data, pp. 337–348, 2012

  • HypeR: Hypothetical Reasoning with What-If and How-to Queries Using a Probabilistic Causal Approach, Sainyam Galhotra, Amir Gilad, Sudeepa Roy, Babak Salimi, SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, pp. 1598–1611, 2022

  • Efficient Answering of Historical What-If Queries, Felix Campbell, Bahareh Arab, Boris Glavic, Proceedings of the 48th International Conference on Management of Data, pp. 1556–1569, 2022

  • Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines, Stefan Grafberger, Paul Groth, Sebastian Schelter, Proc. ACM Manag. Data 1 (2), 128:1–128:26, 2023

Attribution and Intervention-based Explanations for Databases

  • Computing the Shapley Value of Facts in Query Answering, Daniel Deutch, Nave Frost, Benny Kimelfeld, Mikaël Monet, SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, pp. 1570–1583, 2022

  • The Shapley Value in Database Management, Leopoldo E. Bertossi, Benny Kimelfeld, Ester Livshits, Mikaël Monet, SIGMOD Rec. 52 (2), 6–17, 2023

  • ShapGraph: An Holistic View of Explanations through Provenance Graphs and Shapley Values, Susan B. Davidson, Daniel Deutch, Nave Frost, Benny Kimelfeld, Omer Koren, Mikaël Monet, SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, pp. 2373–2376, 2022

  • Tracing Data Errors with View-Conditioned Causality, Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu, SIGMOD Conference, pp. 505–516, 2011