Lecture Notes for cs581
Overview
Lecture
- course webpage: https://www.cs.uic.edu/~bglavic/cs581/2025-fall/
- syllabus: https://www.cs.uic.edu/~bglavic/cs581/2025-fall/syllabus/
- slides: https://www.cs.uic.edu/~bglavic/cs581/2025-fall/schedule/schedule/
- additional resources: https://www.cs.uic.edu/~bglavic/cs581/2025-fall/resources/
|
|
Lecture
Random vs. Sequential Access (HDD)
|
|
random access (ms per block): 19.206 sequential access (ms per block): 0.039 speed-up sequential: 492.462
Pipelining
columnar storage
|
|
row-based storage
|
|
Lecture
- branchy code in loops
|
|
|
|
|
|
Lecture
Lecture
- can use B-tree on
lastname(lexicographical)
|
|
- can't use B-tree
|
|
Lecture
Lecture
Lecture
Lecture
|
|
- associativity and commutativity
Lecture
|
|
|
|
|
|
|
|
|
|
|
|
| AND | FALSE | TRUE | NULL |
|---|---|---|---|
| FALSE | FALSE | FALSE | FALSE |
| TRUE | TRUE | NULL | |
| NULL | NULL |
| OR | FALSE | TRUE | NULL |
|---|---|---|---|
| FALSE | FALSE | TRUE | NULL |
| TRUE | TRUE | TRUE | |
| NULL | NULL |
|
|
Lecture
- window functions
|
|
Lecture
|
|
\[ \sigma_{A=3 \vee A=4}(R) = \sigma_{A=3}(R) \cup \sigma_{A=4}(R) \]
Lecture
Lecture
S = { R_1, R_4, R_6 }
O = { R_4 } S / O = { R_1, R_6 }
|
|
Number of plans Dynamic Programming: 1.2157665459056929e+19 Greedy Join enumeration: 64000.0
Lecture
Merging subqueries (subquery pull-up of SPJ queries)
|
|
- pullup subqueries
|
|
-
what about side effects (we have to be careful as we may change the number of times the function is executed)
1 2 3SELECT c * c AS D FROM (SELECT send_email(a,b) AS c, e FROM R(a,b,e)) sub - in worst case, we get an exponential blowup
|
|
|
|
|
|
|
|
|
|
|
|
Lack of cross optimization between UDFs written in a procedural language or procedural extension of SQL and the remaining of a query
-
in general this is hard as in general optimizing beyond the "function boundary" would require lifting the function to the declarative / batch level of relational operations (such as joins) which is akin to automatic parallelization of programs (an undecidable computational problem)
- the net result is that functions are "optimization boundaries", e.g., if your UDF runs a query with a join and you call your UDF from another query with joins, there will be no join reordering between these two
-
there are some research prototypes that translate procedural code into declarative SQL:
- Froid: Optimization of Imperative Programs in a Relational Database. Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, César A. Galindo-Legaria, Conor Cunningham. PVLDB, vol. 11 (4), 432–444, 2017.
- Blackmagic: Automatic Inlining of Scalar Udfs into SQL Queries with Froid. Karthik Ramachandra, Kwanghyun Park. PVLDB, vol. 12 (12), 1810–1813, 2019.
- One with RECURSIVE Is Worth Many GOTOs. Denis Hirn, Torsten Grust. SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, pp. 723–735 (2021).
- Dear User-Defined Functions, Inlining Isn'T Working out so Great for Us. Let'S Try Batching to Make Our Relationship Work. Sincerely, SQL. Kai Franz, Samuel Arch, Denis Hirn, Torsten Grust, Todd C. Mowry, Andrew Pavlo. 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14-17, 2024 (2024).
Lecture
Rewrite argmax using window functions
|
|
|
|
Enhanced Subquery Optimizations in Oracle. Srikanth Bellamkonda, Rafi Ahmed, Andrew Witkowski, Angela Amor, Mohamed Zait, Chun-Chieh Lin. Proceedings of the VLDB Endowment, vol. 2 (2), 1366–1377, 2009.