cs581 - Database Management Systems - 2025 fall

Course webpage for cs581 - 2025 fall taught by Boris Glavic

Lecture Notes for cs581

Lecture <2025-08-25 Mon>

1
2
3
SELECT s.name
FROM students s, registration r
WHERE s.uin = r.uin AND r.coursenr = 'cs581';

Lecture <2025-08-27 Wed>

Random vs. Sequential Access (HDD)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# DISK STATS
rpm = 7200
avg_seektime_ms = 15
transfer_rate_sec_mb = 100
block_size_byte = (1024 * 4)  # 4KB blocks

# DERIVED STATS
avg_rotational_delay_ms = 0.5 * (float(1) / (rpm / (60 * 1000))) # translate rpm into
block_size_mb = float(block_size_byte) / float(1024 * 1024)
transfer_rate_ms_mb = transfer_rate_sec_mb / 1000
transfer_one_block_ms = block_size_mb / transfer_rate_ms_mb

# RANDOM AND SEQUENTIAL ACCESS TIMES
random_access_block = round(avg_seektime_ms + avg_rotational_delay_ms + transfer_one_block_ms,3)
sequential_access_block = round(transfer_one_block_ms,3)
speed_up = round(random_access_block / sequential_access_block,3)

print(f"random access (ms per block):        {random_access_block:9.3f}\nsequential access (ms per block):    {sequential_access_block:9.3f}\nspeed-up sequential:                 {speed_up:9.3f}")
random access (ms per block):           19.206
sequential access (ms per block):        0.039
speed-up sequential:                   492.462

Pipelining

columnar storage

1
2
3
4
5
6
int a[1024 * 1024];
int sum = 0;
for(int i = 0; i++; i < 1024 * 1024)
{
    sum += a[i];
}

row-based storage

1
2
3
4
5
6
int* a[1024 * 1024];
int columnpos = 3;
for(int i = 0; i++; i < 1024 * 1024)
{
    sum += a[i][columnpos];
}

Lecture <2025-09-03 Wed>

  • branchy code in loops
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
for(int i = 0; i < 10000; i++)
{
    if (colchunk[i] < 3)
    {
        colchunk[i] += 3;
    }
    else
    {
        colchunk[i] -= 2;
    }
}
1
2
3
4
for(int i = 0; i < 10000; i++)
{
    colchunk[i] += (colchunk[i] < 3) * 3 + (colchunk[i] >= 3) * -2;
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
struct A {
  int a;
  char b;
};

struct B {
  int c;
  char d;
  int e;
};

Lecture <2025-09-08 Mon>

Lecture <2025-09-15 Mon>

  • can use B-tree on lastname (lexicographical)
1
2
3
SELECT *
FROM student
WHERE lastname LIKE 'A%';
  • can't use B-tree
1
2
3
SELECT *
FROM student
WHERE lastname LIKE '%b%';

Lecture <2025-09-24 Wed>

Lecture <2025-09-29 Mon>

Lecture <2025-10-01 Wed>

Lecture <2025-10-08 Wed>

1
2
s = 3 + 4 + 5 + 6 + 7 + 8
s2 = (3 + 4 + 5) + (6 + 7 + 8)
  • associativity and commutativity

Lecture <2025-10-13 Mon>

1
2
3
4
5
CREATE VIEW cntstd AS (
   SELECT count(*) AS numstd, major
   FROM students
   GROUP BY major
);
1
SELECT * FROM cntstd;
1
2
3
4
5
SELECT * FROM (
   SELECT count(*) AS numstd, major
   FROM students
   GROUP BY major
) cntstd;
1
2
3
4
SELECT name
FROM (SELECT send_email(email, 'Hey Student, ...'), name
      FROM student)
WHERE name LIKE 'A%';
1
SELECT sum(common_gpa()) AS avg_gpa FROM students HAVING acg_gpa > 4.5;
1
SELECT NULL = 3 OR 3 = 3
AND FALSE TRUE NULL
FALSE FALSE FALSE FALSE
TRUE TRUE NULL
NULL NULL
OR FALSE TRUE NULL
FALSE FALSE TRUE NULL
TRUE TRUE TRUE
NULL NULL
1
2
3
SELECT A + NULL

SELECT sum(A)

Lecture <2025-10-15 Wed>

  • window functions
1
2
3
4
WITH RECURSIVE setstuff (a,b) AS (
  SELECT * FROM edge
  UNION
  SELECT ...);

Lecture <2025-10-22 Wed>

1
SELECT 1 AS tester, * FROM R

\[ \sigma_{A=3 \vee A=4}(R) = \sigma_{A=3}(R) \cup \sigma_{A=4}(R) \]

Lecture <2025-10-27 Mon>

Lecture <2025-10-29 Wed>

S = { R_1, R_4, R_6 }

O = { R_4 } S / O = { R_1, R_6 }

1
2
3
4
5
6
from math import pow

numtables = 40
print("Number of plans")
print(f"Dynamic Programming: {pow(3,numtables)}")
print(f"Greedy Join enumeration: {pow(numtables,3)}")
Number of plans
Dynamic Programming: 1.2157665459056929e+19
Greedy Join enumeration: 64000.0

Lecture <2025-11-03 Mon>

Merging subqueries (subquery pull-up of SPJ queries)

1
2
3
SELECT c * c AS D
FROM (SELECT a + b AS c, e
      FROM R(a,b,e)) sub
  • pullup subqueries
1
2
SELECT (a+b) * (a+b) AS D
FROM R
  • what about side effects (we have to be careful as we may change the number of times the function is executed)

    1
    2
    3
    
    SELECT c * c AS D
    FROM (SELECT send_email(a,b) AS c, e
        FROM R(a,b,e)) sub
  • in worst case, we get an exponential blowup
1
2
3
4
SELECT an-1 * an-1 AS an
FROM (SELECT an-2 * an-2 AS an-1
     ...
        SELECT a1 * a1 AS a2 FROM R
1
2
3
4
5
6
CREATE FUNCTION f(x,y) {
   RETURN x + y
};

SELECT f(a,c)
FROM R;
1
2
SELECT fresult
FROM R, LATERAL (SELECT a + c AS fresult) AS f;
1
2
SELECT a + c AS fresult
FROM R;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE FUNCTION divide(x,y) AS (
  IF y = 0
  THEN
     RETURN NULL
  ELSE
     RETURN x / y;
)

-- ->
SELECT CASE WHEN y = 0 THEN NULL ELSE x / y;
1
2
3
CREATE FUNction
   FOR x in (SELECT * FROM ..)
      loop_body

Lack of cross optimization between UDFs written in a procedural language or procedural extension of SQL and the remaining of a query

  • in general this is hard as in general optimizing beyond the "function boundary" would require lifting the function to the declarative / batch level of relational operations (such as joins) which is akin to automatic parallelization of programs (an undecidable computational problem)

    • the net result is that functions are "optimization boundaries", e.g., if your UDF runs a query with a join and you call your UDF from another query with joins, there will be no join reordering between these two
  • there are some research prototypes that translate procedural code into declarative SQL:

    • Froid: Optimization of Imperative Programs in a Relational Database. Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, César A. Galindo-Legaria, Conor Cunningham. PVLDB, vol. 11 (4), 432–444, 2017.
    • Blackmagic: Automatic Inlining of Scalar Udfs into SQL Queries with Froid. Karthik Ramachandra, Kwanghyun Park. PVLDB, vol. 12 (12), 1810–1813, 2019.
    • One with RECURSIVE Is Worth Many GOTOs. Denis Hirn, Torsten Grust. SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, pp. 723–735 (2021).
    • Dear User-Defined Functions, Inlining Isn'T Working out so Great for Us. Let'S Try Batching to Make Our Relationship Work. Sincerely, SQL. Kai Franz, Samuel Arch, Denis Hirn, Torsten Grust, Todd C. Mowry, Andrew Pavlo. 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14-17, 2024 (2024).

Lecture <2025-11-05 Wed>

Rewrite argmax using window functions

1
2
3
4
5
6
7
SELECT name
FROM orders o,
    (SELECT max(amount) AS ma, i.cust
    FROM orders i
    GROUP BY i.cust) sub
WHERE sub.cust = o.cust
      AND o.amount = sub.ma
1
2
3
4
SELECT name
FROM (Select max(amount) OVER (PARTITION BY cust) AS ma, amount, name
      FROM orders)
WHERE ma = amount

Enhanced Subquery Optimizations in Oracle. Srikanth Bellamkonda, Rafi Ahmed, Andrew Witkowski, Angela Amor, Mohamed Zait, Chun-Chieh Lin. Proceedings of the VLDB Endowment, vol. 2 (2), 1366–1377, 2009.

Lecture <2025-11-24 Mon>

Lecture <2025-12-03 Wed>

Last updated on 25 Aug 2025
Published on 25 Aug 2025