Syllabus

Provenance and explanations are essential tools for building trustworthy, secure, transparent, and fair data-intensive systems and machine learning pipelines. These tools are used to debug analysis results, to comprehend the results of complex queries, to explore the impact of hypothetical changes to data and/or policies, to audit sensitive computations, and to justify and understand predictions made by machine learning models. This course provides a comprehensive overview of algorithms, systems, and techniques for capturing & managing data provenance, i.e., tracking the origin and creation process of data, as well as for generating explanations for data-intensive computations such as declarative queries and machine learning.

The goal of this course is to provide students with the necessary tools to build provenance-enabled systems and develop automated solutions for generating explanations.

Course Topics

The following topics will be covered in the course:

  • Provenance & Explanations - Introduction
    • Motivation & use cases
    • Provenance graphs
    • Explanations for query answers
  • Provenance models
  • Hypothetical reasoning: what-if and how-to
    • Incremental view maintenance / what-if queries
    • View update & how-to
  • Explanations
    • Counterfactual explanations
    • Explanations as (provenance) summarization
    • Attribution and degrees of responsibility (including game theoretic notions of attribution)
  • Explaining missing answers
  • Provenance capture & management
    • How to compute provenance efficiently?
    • Storage and computation trade-offs
  • Building provenance-aware & explanation-ready systems
    • Strategies for capturing and managing provenance
    • How to compute explanations efficiently?

Course Organization

Materials

The following overview articles and textbooks will be helpful, but are optional.

Data Provenance - Origins, Applications, Algorithms, and Models. Boris Glavic. Foundations and Trends® in Databases, vol. 9 (3-4), 209-441, 2021.

Trends in Explanations: Understanding and Debugging Data-Driven Systems. Boris Glavic, Alexandra Meliou, Sudeepa Roy. Foundations and Trends® in Databases, vol. 11 (3), 226-318, 2021.

Principles of Data Integration, 1st Edition, Doan, Halevy, and Ives, Morgan Kaufmann, 2012

Depending on your background, a standard database textbook may be useful:

Elmasri and Navathe. Fundamentals of Database Systems, 6th Edition, Addison-Wesley, 2003

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition, McGraw-Hill, 2002

Silberschatz, Korth, and Sudarshan. Database System Concepts, 6th Edition, McGraw Hill, 2010

Garcia-Molina, Ullman, and Widom. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2008

Please protect the copyright integrity of all course materials and content. Please do not upload course materials not created by you onto third-party websites or share content with anyone not enrolled in our course.

Grading

  • Project: 40%
  • Paper review and presentation: 40%
  • Homework assignments & quizzes: 10%
  • Active participation in class: 10%

Class organization

  • The first half of the class will consist mostly of lectures given by the instructor to introduce students to necessary background in databases, provenance, and explanations.
  • In the second half, students will read and present research papers related to the topics covered in the course.
  • After the first few classes, students will have to decide on a research project. In this project, students will either implement and/or evaluate an existing technique from a state-of-the-art research paper or work on novel research. The results of these projects will be presented towards the end of the semester. Where appropriate, the instructor will provide guidance to students who are interested in publishing the work they developed in this course.

Workload

In this course, students will …

  1. Work on a semester-long research project that either implements provenance or explanation techniques from a research paper or develops new techniques.
  2. Review and present a state-of-the-art research paper from the field.
  3. Actively participate in class.
  4. Complete homework assignments and quizzes.

Prerequisites

No formal prerequisites, but some background in databases (roughly equivalent to CS480) is expected.

Overview

This course teaches you about systems, algorithms, and the fundamental principles that enable distributed analysis of very large datasets using high-level languages, i.e., Modern Big Data Analytics.

  • Lecture: overview of content covered in the lectures: here
  • Project: information about the project: here
  • Literature review: information about the literature review: here

Important Dates

  • Select a paper to review: 09/12
  • Submit a full draft of the report: 12/02
  • Submit the report review report: 12/11
  • Select a project: 09/22
  • Meet to discuss project design: 10/15
  • Meet to discuss progress and final steps: week of 11/11
  • Finish project implementation: 12/03 and 12/05

Workload and Grading Scheme

Grading Policy:

Grading scheme:

  • 80+ = A
  • 50+ = B
  • 35+ = C
  • <35 = E
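
As a rough illustration of how the component weights listed under Grading and the thresholds above might combine, here is a minimal Python sketch. It assumes that every component is scored on a 0-100 scale and that the final score is a plain weighted sum; the syllabus does not state the exact formula, and the names `WEIGHTS`, `THRESHOLDS`, `weighted_score`, and `letter_grade` are hypothetical.

```python
# Hypothetical sketch, not the official grading procedure.
# Assumption: each component is scored 0-100 and combined as a weighted sum.

# Component weights in percent, taken from the Grading section.
WEIGHTS = {
    "project": 40,
    "paper_review": 40,
    "homework_quizzes": 10,
    "participation": 10,
}

# (minimum score, letter) pairs from the grading scheme, highest first.
THRESHOLDS = [(80, "A"), (50, "B"), (35, "C")]


def weighted_score(scores):
    """Return the weighted total (0-100) for a dict of component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a total score to a letter using the thresholds above."""
    for minimum, letter in THRESHOLDS:
        if total >= minimum:
            return letter
    return "E"


example = {
    "project": 90,
    "paper_review": 80,
    "homework_quizzes": 70,
    "participation": 100,
}
total = weighted_score(example)
print(total, letter_grade(total))  # prints "85.0 A"
```

Under these assumptions, a student with the example scores would reach 85 points, which falls in the A range.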

Policy for Missed or Late Work

  • Late assignments: Late assignments will not, in general, be accepted. They will never be accepted if the student has not made special arrangements with me at least one day before the assignment is due. If a late assignment is accepted it is subject to a reduction in score as a late penalty.

  • Incompletes: The UIC Undergraduate catalog states that in addition to needing excellent justification for an incomplete, a student must also have been “making satisfactory progress” in the course.

  • Statute of limitations: No grading questions or complaints, no matter how justified, will be considered more than one week after the item in question has been returned.

  • Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students’ work (this includes homework and exams) will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a reduction of the final course grade by one letter. The MAXIMUM penalty will be expulsion from the University.

  • Classroom Conduct: Classroom discussions and questions are a valuable part of the learning process and are encouraged. However, students who repeatedly talk among themselves and disrupt the lecture will be asked to leave.

Attendance / Participation Policy

Please email me if you face an unexpected situation that may impede your attendance, participation in required class and exam sessions, or timely completion of assignments.

Other Course Policies

Academic Integrity

UIC is an academic community committed to providing an environment in which research, learning, and scholarship can flourish and in which all endeavors are guided by academic and professional integrity. In this community, all members including faculty, administrators, staff, and students alike share the responsibility to uphold the highest standards of academic honesty and quality of academic work so that such a collegial and productive environment exists. As a student and member of the UIC community, you are expected to adhere to the Community Standards of integrity, accountability, and respect in all of your academic endeavors. When accusations of academic dishonesty occur, the Office of the Dean of Students investigates and adjudicates suspected violations of this student code. Unacceptable behavior includes cheating, unauthorized collaboration, fabrication or falsification, plagiarism, multiple submissions without instructor permission, using unauthorized study aids, coercion regarding grading or evaluation of coursework, and facilitating academic misconduct. Please review the UIC Student Disciplinary Policy for additional information about the process by which instances of academic misconduct are handled towards the goal of developing responsible student behavior.

By submitting your assignments for grading you acknowledge these terms, you declare that your work is solely your own, and you promise that, unless authorized by the instructor or proctor, you have not communicated with anyone in any way during an exam or other online assessment, and that submitted work is your own unless the assignment explicitly is intended to be done in a group. You are NOT ALLOWED to use Generative AI and Large Language Models (such as ChatGPT).

We use an automatic cheating-verification program that is capable of detecting partial logical similarities of answers. Don’t even take the risk! Do NOT use or post to Chegg or any similar website.

Plagiarism is a serious matter and will be treated as such. You may be penalized by failing the course. Any student caught will have a grade of zero on the assignment, will have a drop in their letter grade at the end of the semester, and will be reported to the Dean of Students. This applies to each offense. Details on this are given on the Academic Integrity page (https://dos.uic.edu/community-standards/academic-integrity/), and you can view the Student Disciplinary Policy at this link: https://dos.uic.edu/wp-content/uploads/sites/262/2018/10/DOS-Student-Disciplinary-Policy-2018-2019-FINAL.pdf.

We believe that each and every one of you is able to learn the material and complete assignments on your own! If you do find yourself struggling, please reach out to any of us on the instructional staff and ask for help rather than simply turning in work that you did not complete yourself. It is far better to earn a low score on the assignment than to have an Academic Integrity violation on your record, especially towards the beginning of your academic career.

Email Expectations

Students are responsible for all information that instructors send to their UIC email and Blackboard accounts. Faculty messages should be monitored regularly and read in a timely fashion. All critical announcements, changes to assignments, etc. will be posted as Blackboard announcements. We assume that you check your email regularly, at the very least once every 24 hours.

Accommodations

Disability Accommodation Procedures

UIC is committed to full inclusion and participation of people with disabilities in all aspects of university life. If you face or anticipate disability-related barriers while at UIC, please connect with the Disability Resource Center (DRC) at drc.uic.edu, via email at drc@uic.edu, or call 312-413-2183 to create a plan for reasonable accommodations. To receive accommodations, you will need to disclose the disability to the DRC, complete an interactive registration process with the DRC, and provide me with a Letter of Accommodation (LOA). Upon receipt of an LOA, I will gladly work with you and the DRC to implement approved accommodations.

Religious Accommodations

Following campus policy, if you wish to observe religious holidays, you must notify me by the tenth day of the semester. If the religious holiday is observed on or before the tenth day of the semester, you must notify me at least five days before you will be absent. Please submit this form by email with the subject heading: “YOUR NAME: Requesting Religious Accommodation.”

Classroom Environment

Inclusive Community

UIC values diversity and inclusion. Regardless of age, disability, ethnicity, race, gender, gender identity, sexual orientation, socioeconomic status, geographic background, religion, political ideology, language, or culture, we expect all members of this class to contribute to a respectful, welcoming, and inclusive environment for every other member of our class. If aspects of this course result in barriers to your inclusion, engagement, accurate assessment, or achievement, please notify me as soon as possible.

Name and Pronoun Use

If your name does not match the name on my class roster, please let me know as soon as possible. My pronouns are he/him. I welcome your pronouns if you would like to share them with me. For more information about pronouns, see this page: https://www.mypronouns.org/what-and-why.

Community Agreement/Classroom Conduct Policy

  • Be present by turning off cell phones and removing yourself from other distractions.
  • Be respectful of the learning space and community. For example, no side conversations or unnecessary disruptions.
  • Use preferred names and gender pronouns.
  • Assume goodwill in all interactions, even in disagreement.
  • Facilitate dialogue and value the free and safe exchange of ideas.
  • Try not to make assumptions, have an open mind, seek to understand, and not judge.
  • Approach discussion, challenges, and different perspectives as an opportunity to “think out loud,” learn something new, and understand the concepts or experiences that guide other people’s thinking.
  • Debate the concepts, not the person.
  • Be gracious and open to change when your ideas, arguments, or positions do not work or are proven wrong.
  • Be willing to work together and share helpful study strategies.
  • Be mindful of one another’s privacy, and do not invite outsiders into our classroom.

Content Notices and Trigger Warnings

Our classroom provides an open space for a critical and civil exchange of ideas, inclusive of a variety of perspectives and positions. Some readings and other content may expose you to ideas, subjects, or views that may challenge you, cause you discomfort, or recall past negative experiences or traumas. I intend to discuss all subjects with dignity and humanity, as well as with rigor and respect for scholarly inquiry. If you would like me to be aware of a specific topic of concern, please email me or visit my Student Drop-In Hours.

Resources for Academic Success, Wellness, and Safety

We all need the help and the support of our UIC community. Please visit my office hours for course consultation and other academic or research topics. For additional assistance, please contact your assigned college advisor and visit the support services available to all UIC students.

Academic Success

Wellness

  • Counseling Services: You may seek free and confidential services from the Counseling Center at https://counseling.uic.edu/.
  • Access U&I Care Program for assistance with personal hardships.
  • Campus Advocacy Network: Under Title IX, you have the right to an education free from any form of gender-based violence or discrimination. To make a report, email TitleIX@uic.edu. For more information or confidential victim services and advocacy, visit UIC’s Campus Advocacy Network at http://can.uic.edu/.

Safety

Syllabus Revisions

The standards and requirements set forth in this syllabus may be modified at any time by the course instructor. Notice of such changes will be by Canvas announcement or email notice.

Disclaimer

This syllabus is intended to give the student guidance in what may be covered during the semester and will be followed as closely as possible. All differences between information in this syllabus and the official UIC academic calendar should be resolved in favor of the calendar. However, as instructors, we reserve the right to modify, supplement, and make changes as course needs arise. We will communicate such changes in advance as Blackboard announcements.