Homework & Quizzes
There will be several quizzes and homework assignments during the course.
Homework 1 - Get Acquainted with W3C Prov
The purpose of this homework is to get some practical experience with the W3C Prov standard.
Links
- the standard - the standard description
- ProvPrimer - basic introduction to Prov
- PROV tutorial - tutorial for using Prov
- Prov toolbox - a collection of tools (in Java) for dealing with Prov documents
- you can also use our docker image =iitdbgroup/provtutorial=
- Prov Checker - a checker for compliance of a Prov document with the standard specification
- https://openprovenance.org/ - online converter and validator
Task 0 - Read the Prov Primer and Work through the Prov Tutorial
Due: 09/12 Deliverable: nothing
Task 1 - Design a provenance graph
Due: 09/12 Deliverable: a Prov graph serialized as a JSON file (send via email)
In this task you should pick a simple example process and model it as a PROV-JSON document. Then run it through Prov Checker to ensure compliance with the standard.
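To give a rough idea of the shape of a PROV-JSON document, here is a minimal sketch that builds one as a Python dictionary and serializes it with the standard `json` module. The baking process, the `ex` prefix, and every identifier under it are made up for illustration; the record keys (`used`, `wasGeneratedBy`, and so on) follow the PROV-JSON serialization, but you should still validate the output with Prov Checker rather than rely on this sketch.

```python
import json

# Hypothetical example process: baking a cake.
# All "ex:" identifiers are invented for illustration.
doc = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {
        "ex:flour": {},
        "ex:cake": {},
    },
    "activity": {
        "ex:baking": {},
    },
    "agent": {
        "ex:baker": {},
    },
    # The baking activity consumed the flour ...
    "used": {
        "_:u1": {"prov:activity": "ex:baking", "prov:entity": "ex:flour"},
    },
    # ... and produced the cake.
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:cake", "prov:activity": "ex:baking"},
    },
    # The baker was responsible for the activity.
    "wasAssociatedWith": {
        "_:a1": {"prov:activity": "ex:baking", "prov:agent": "ex:baker"},
    },
    # The cake was derived from the flour.
    "wasDerivedFrom": {
        "_:d1": {
            "prov:generatedEntity": "ex:cake",
            "prov:usedEntity": "ex:flour",
        },
    },
}

# Serialize to a JSON string; write this string to a file to submit it.
prov_json = json.dumps(doc, indent=2)
print(prov_json)
```

Your own document should model a process of your choice and will likely need more entities, activities, and relations than this.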
Task 2 - Design a fine-grained provenance graph for a query
Due: 09/12 Deliverable: a Prov graph serialized as a JSON file (send via email)
In this task you should create a provenance graph for a SQL query (if you do not have any background in SQL, have a look at these slides). Consider a table of temperature probe readings that stores, for each measurement, the probe that took the measurement, the time the measurement took place, and the measured temperature.
Probe | Time | Temperature |
---|---|---|
P1 | 11:00 | 45 |
P1 | 12:00 | 47 |
P1 | 1:00 | 54 |
P1 | 2:00 | 56 |
P2 | 11:00 | 47 |
P2 | 12:00 | 49 |
P2 | 1:00 | 52 |
P2 | 2:00 | 48 |
The following query computes the average temperature between 11:00 am and 12:00 pm for each probe, returning the average only for probes whose minimum temperature in that interval is larger than a threshold (46 in this query).
SELECT Probe, avg(Temperature) AS avgtemp
FROM measurements
WHERE Time BETWEEN '11:00' AND '12:00'
GROUP BY Probe
HAVING min(Temperature) > 46;
Evaluated over the instance of the measurements table shown above, this query returns:
Probe | avgtemp |
---|---|
P2 | 48 |
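If you want to check this result yourself, the following sketch loads the table into an in-memory SQLite database and runs the query. The column types are an assumption: `Time` is stored as text, so `BETWEEN` compares strings lexicographically. That happens to select exactly the 11:00 and 12:00 rows for this data, but it is not a general way to handle times.

```python
import sqlite3

# The measurements table from above. Time is kept as text (an assumption).
ROWS = [
    ("P1", "11:00", 45),
    ("P1", "12:00", 47),
    ("P1", "1:00", 54),
    ("P1", "2:00", 56),
    ("P2", "11:00", 47),
    ("P2", "12:00", 49),
    ("P2", "1:00", 52),
    ("P2", "2:00", 48),
]

QUERY = """
SELECT Probe, avg(Temperature) AS avgtemp
FROM measurements
WHERE Time BETWEEN '11:00' AND '12:00'
GROUP BY Probe
HAVING min(Temperature) > 46;
"""


def run_query():
    """Create the table in memory, load the rows, and evaluate the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (Probe TEXT, Time TEXT, Temperature INTEGER)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(run_query())
```

P1 is filtered out by the HAVING clause because its minimum temperature in the selected rows is 45, while P2 passes with a minimum of 47 and an average of 48.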
Create a Prov graph modeling this scenario. Entities should be tracked at the granularity of rows. Use collections to model which rows belong to a table.
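As a starting point for the collection part only, here is a sketch of how table membership might be written in PROV-JSON, shown for the first two rows of the table. The helper `table_as_collection`, the `ex` prefix, the row identifiers, and the attribute names are all hypothetical. The typed-literal form used for `prov:type` is an assumption based on the PROV-JSON submission; some tools write the datatype differently, so check the result with the validator. The sketch deliberately does not model the query itself.

```python
import json


def table_as_collection(table_id, rows):
    """Return PROV-JSON fragments with one entity per row and a collection
    entity for the table that has those rows as members."""
    entities = {
        # Assumed encoding of the collection type; verify with Prov Checker.
        table_id: {"prov:type": {"$": "prov:Collection", "type": "xsd:QName"}},
    }
    members = {}
    for i, (probe, time, temperature) in enumerate(rows, start=1):
        row_id = f"ex:m{i}"  # hypothetical row identifier
        entities[row_id] = {
            "ex:probe": probe,
            "ex:time": time,
            "ex:temperature": temperature,
        }
        # One hadMember record per row: the table collection contains the row.
        members[f"_:hm{i}"] = {
            "prov:collection": table_id,
            "prov:entity": row_id,
        }
    return {"entity": entities, "hadMember": members}


fragment = table_as_collection(
    "ex:measurements",
    [("P1", "11:00", 45), ("P1", "12:00", 47)],
)
doc = {"prefix": {"ex": "http://example.org/"}, **fragment}
print(json.dumps(doc, indent=2))
```

The query result can be treated the same way: one entity per result row plus a collection for the result table, connected to the input rows through whatever activities and derivations you choose for the query.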