Lecture Notes for CS480

Overview

Overview
Lecture <2024-01-09 Tue>
Lecture <2024-01-11 Thu>
- Crossproduct
- Relations
- Candidate Keys
Lecture <2024-01-20 Sat>
- Relational algebra
  - Selection example
Lecture <2024-01-23 Tue>
Lecture <2024-01-25 Thu>
- Project Use Case
  - Ideas
- relational algebra
Lecture <2024-01-30 Tue>
Lecture <2024-02-01 Thu>
Lecture <2024-02-06 Tue>
- Relational Algebra
Lecture <2024-02-08 Thu>
Lecture <2024-02-13 Tue>
Lecture <2024-02-15 Thu>
Lecture <2024-02-20 Tue>
- Cursor
Lecture <2024-02-22 Thu>
- Music collection - music player
Lecture <2024-02-27 Tue>
Lecture <2024-02-29 Thu>
Lecture <2024-03-05 Tue>
- Relational algebra to SQL
- Testing FDs
Lecture <2024-03-14 Thu>
Lecture <2024-03-26 Tue>
Lecture <2024-03-28 Thu>
Lecture <2024-04-04 Thu>
- Postgres page layout
Lecture <2024-04-11 Thu>
Lecture <2024-04-18 Thu>
- Join algorithms

Lecture <2024-01-09 Tue>

https://www.cs.uic.edu/~bglavic/cs480/2024-spring/homework/

Lecture <2024-01-11 Thu>

Crossproduct

$S_1 = \{e_1, e_2 \}$ $S_2 = \{a_1, a_2\}$

$S_1 \times S_2$

S1	S2
e1	a1
e1	a2
e2	a1
e2	a2

Relations

$\mathbb{D}_1 = \{ 0,1 \}$ $\mathbb{D}_2 = \{ red, green, blue \}$

$\mathbf{Shirts}(color: \mathbb{D}_2, is-muscle: \mathbb{D}_1)$

$\mathbb{D}_2 \times \mathbb{D}_1 = \{ (red, 0), (green, 0), (blue, 1), \dlots \}$

color	is-muscle
green	1
red	0
blue	1
blue	0

Candidate Keys

Lastname + Firstname is unique
Lastname or Firstname by itself is not

Lastname	Firstname	Age

ID is unique
Lastname + Firstname is unique
Lastname or Firstname by itself is not

ID	Lastname	Firstname	Age

Lecture <2024-01-20 Sat>

Relational algebra

Selection example

$\sigma_{city = 'Chicago'}(Add)$

table Add

state	city	zip
IL	Chicago	66014
IL	Schaumberg	66077
CA	San Francisco	11555

result

state	city	zip
IL	Chicago	66014

Lecture <2024-01-23 Tue>

Lecture <2024-01-25 Thu>

Project Use Case

Ideas

library (books) - 9 (14)
sports statistics - 9 (9)
public transportation - 7
inventory management - 2
game ratings / store - 7
online retailer - 7
crime statistics - 2
hospital management - 8
stock market - 5

relational algebra

1

SELECT * FROM student;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
98988	Tanaka	Biology	120

$\gamma_{dept\_name; count(id)}(student)$

$\gamma_{count(id)}(\sigma_{dept\_name=CS}(student))$ $\sigma_{tot\_cred > 100}(student)$

$\sigma_{dept\_name = History \lor dept\_name = Finance}(student)$

$\sigma_{dept\_name = History \land tot\_cred < 20}(student)$

$\pi_{tot\_cred / 10, name}(student)$

$\pi_{name}(\sigma_{dept\_name = History}(student))$

\[ \pi_{name, course\_id}(student \bowtie_{id = id'} \rho_{id' \gets id}(takes)) \]

1

SELECT * FROM takes LIMIT 3;

id	course_id	sec_id	semester	year	grade
00128	CS-101	1	Fall	2009	A
00128	CS-347	1	Fall	2009	A-
12345	CS-101	1	Fall	2009	C

1

SELECT * FROM course LIMIT 3;

course_id	title	dept_name	credits
BIO-101	Intro. to Biology	Biology	4
BIO-301	Genetics	Biology	4
BIO-399	Computational Biology	Biology	3

1

SELECT * FROM instructor LIMIT 3;

id	name	dept_name	salary
10101	Srinivasan	Comp. Sci.	65000.00
12121	Wu	Finance	90000.00
15151	Mozart	Music	40000.00

$\pi_{name}(student) \cup \pi_{name}(instructor)$

$\pi_{name}(instructor) - \pi_{name}(student)$

$\pi_{name}(instructor) \cap \pi_{name}(student)$

Lecture <2024-01-30 Tue>

1

SELECT * FROM time_slot;

time_slot_id	day	start_hr	start_min	end_hr	end_min
A	M	8	0	8	50
A	W	8	0	8	50
A	F	8	0	8	50
B	M	9	0	9	50
B	W	9	0	9	50
B	F	9	0	9	50
C	M	11	0	11	50
C	W	11	0	11	50
C	F	11	0	11	50
D	M	13	0	13	50
D	W	13	0	13	50
D	F	13	0	13	50
E	T	10	30	11	45
E	R	10	30	11	45
F	T	14	30	15	45
F	R	14	30	15	45
G	M	16	0	16	50
G	W	16	0	16	50
G	F	16	0	16	50
H	W	10	0	12	30

1
2
3
4


SELECT count(*),
       (((end_hr * 60) + end_min) - (start_hr * 60 + start_min)) AS length
  FROM time_slot
 GROUP BY (((end_hr * 60) + end_min) - (start_hr * 60 + start_min));

count	length
4	75
15	50
1	150

return all student

1
2


SELECT *
FROM student s

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
70557	Snow	Physics	0
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

give all info for student with id 12345

1
2
3


SELECT *
FROM student
WHERE id = '12345';

id	name	dept_name	tot_cred
12345	Shankar	Comp. Sci.	32

names of students with tot_cred smaller than 10

1
2
3


SELECT name
FROM student
WHERE tot_cred < 10;

name
Snow

number of students that have taken more than 50 credits

1
2
3


SELECT count(*)
FROM student
WHERE tot_cred > 50;

count
9

for each credit amount, the number of students with this many credits (per 10 credits)

1
2
3


SELECT (tot_cred / 10)::int * 10, count(*)
FROM student
GROUP BY (tot_cred / 10)::int;

?column?	count
110	1
30	1
50	2
40	1
0	1
100	2
60	3
120	1
80	1

the smallest, average, and maximum number of credits over all student

1
2


SELECT min(tot_cred), avg(tot_cred), max(tot_cred)
FROM student;

min	avg	max
0	65.6923076923076923	120

the smallest, average, and maximum number of credits over all student per dept

1
2
3


SELECT dept_name, min(tot_cred), avg(tot_cred), max(tot_cred)
FROM student
GROUP BY dept_name;

dept_name	min	avg	max
Finance	110	110.0000000000000000	110
History	80	80.0000000000000000	80
Physics	0	34.0000000000000000	56
Music	38	38.0000000000000000	38
Comp. Sci.	32	61.5000000000000000	102
Biology	120	120.0000000000000000	120
Elec. Eng.	60	79.0000000000000000	98

1
2
3
4
5


SELECT dept_name,
       max(tot_cred) - min(tot_cred) AS diff
FROM student
GROUP BY dept_name
ORDER BY diff ASC;

dept_name	diff
Music	0
History	0
Finance	0
Biology	0
Elec. Eng.	38
Physics	56
Comp. Sci.	70

1
2


SELECT cmax, *
FROM student;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
70557	Snow	Physics	0
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

1
2
3


SELECT tot_cred AS a
FROM student
ORDER BY a - a;

a
102
32
80
110
56
46
54
38
0
58
60
98
120

1
2
3


SELECT name
FROM student
WHERE tot_cred= (SELECT min(tot_cred) FROM student);

name
Snow

Lecture <2024-02-01 Thu>

1
2
3
4


SELECT DISTINCT s.*
FROM student s,
     takes t
WHERE s.id = t.id;

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120
98765	Bourikas	Elec. Eng.	98
55739	Sanchez	Music	38
45678	Levy	Physics	46
19991	Brandt	History	80
76653	Aoi	Elec. Eng.	60
00128	Zhang	Comp. Sci.	102
76543	Brown	Comp. Sci.	58
54321	Williams	Comp. Sci.	54
12345	Shankar	Comp. Sci.	32
23121	Chavez	Finance	110
44553	Peltier	Physics	56

1
2
3


SELECT *
FROM student s
WHERE NOT EXISTS (SELECT * FROM takes t WHERE s.id = t.id);

id	name	dept_name	tot_cred
70557	Snow	Physics	0

1
2
3


(SELECT * FROM student)
EXCEPT
(SELECT s.* FROM student s, takes t WHERE s.id = t.id)

id	name	dept_name	tot_cred
70557	Snow	Physics	0

1
2
3
4
5


SELECT i.*, ns.numst
FROM instructor i,
     (SELECT count(*) numst, dept_name FROM department
     GROUP BY dept_name) AS ns
WHERE i.dept_name = ns.dept_name;

id	name	dept_name	salary	numst
10101	Srinivasan	Comp. Sci.	65000.00	1
12121	Wu	Finance	90000.00	1
15151	Mozart	Music	40000.00	1
22222	Einstein	Physics	95000.00	1
32343	El Said	History	60000.00	1
33456	Gold	Physics	87000.00	1
45565	Katz	Comp. Sci.	75000.00	1
58583	Califieri	History	62000.00	1
76543	Singh	Finance	80000.00	1
76766	Crick	Biology	72000.00	1
83821	Brandt	Comp. Sci.	92000.00	1
98345	Kim	Elec. Eng.	80000.00	1

1
2
3


(SELECT * FROM instructor)
EXCEPT
(SELECT i.* FROM instructor i, student s WHERE i.dept_name = s.dept_name);

id

name

dept_name

salary

Lecture <2024-02-06 Tue>

Relational Algebra

orders

custid	item	quantity	price	shippingfee
1	Toothpaste	100	5.99	30
1	Shampoo	1	12.99	0
2	Shampoo	3	12.99	3

per customer compute to the total money they owe us
- for each order the quantity times price plus the shipping fee ($\pi$)
- then summed up across all orders ($\gamma$) - group by custid
show only customers where the total is smaller 500 ($\sigma$)

$orderprices \leftarrow \rho_{custid, orderprice}(\pi_{custid, quantity * price + shippingfee}(orders))$ $orderprices \leftarrow \rho_{orderprice \leftarrow quantity * price + shippingfee}(\pi_{custid, quantity * price + shippingfee}(orders))$

custid	orderprice
1	629.00
1	12.99
2	38.97

$totalcustorders \leftarrow \rho_{totalorders \leftarrow sum(orderprice)}(\gamma_{custid; sum(orderprice)}(orderprices))$

custid	totalorders
1	641.99
2	38.97

$\sigma_{totalorders < 500}(totalcustorders)$

custid	totalorders
2	38.97

get largest and average totalorders (maximum totalorders across all operators) ($\gamma$)

$maxttl \leftarrow \rho_{maxtotal,avgtotal}(\gamma_{max(totalorders),avg(totalorders)}(totalcustorders))$

maxtotal	avgtotal
641.99	340.48

$\pi_{maxtotal,avgtotal,custid}(\sigma_{maxtotal = totalorders}(maxttl \times totalcustorders))$

$\pi_{maxtotal,avgtotal,custid}(maxttl \bowtie_{maxtotal = totalorders} totalcustorders)$

maxtotal	avgtotal	custid
641.99	340.48	1

get customer with maximum totalorders
equivalences

$R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$

Lecture <2024-02-08 Thu>

1
2
3


CREATE TABLE edge (f TEXT, t TEXT);

INSERT INTO edge VALUES ('a','b'), ('b','c'), ('b','d'), ('d','e');

CREATE TABLE
INSERT 0 4

1

SELECT * FROM edge;

f	t
a	b
b	c
b	d
d	e

paths of length 2

1
2
3


SELECT e1.f, e2.t
FROM edge e1, edge e2
WHERE e1.f != e2.f AND e1.t != e2.t AND e1.t = e2.f;

f	t
a	d
a	c
b	e

paths of length up to 2

1
2
3
4
5
6


WITH twopaths AS (SELECT e1.f, e2.t
                FROM edge e1, edge e2
                WHERE e1.f != e2.f AND e1.t != e2.t AND e1.t = e2.f)
SELECT * FROM edge
UNION
(SELECT * FROM twopaths);

f	t
a	b
b	c
a	d
a	c
d	e
b	d
b	e

paths of length up to 3

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


WITH twopaths AS (SELECT e1.f, e2.t
                FROM edge e1, edge e2
                WHERE e1.f != e2.f AND e1.t != e2.t AND e1.t = e2.f),
  threepaths AS (
    (SELECT e1.f, e2.t
                FROM twopaths e1, edge e2
                WHERE e1.f != e2.f AND e1.t != e2.t AND e1.t = e2.f)
    )
SELECT * FROM edge
UNION
(SELECT * FROM twopaths)
UNION
(SELECT * FROM threepaths);

f	t
b	c
b	e
d	e
a	e
a	c
a	b
a	d
b	d

1
2
3
4
5
6
7


WITH RECURSIVE inpre AS (
  (SELECT *, 1 AS cnt FROM prereq) -- direct prereqs
  UNION
  (SELECT i.course_id, p.prereq_id, cnt + 1 AS cnt -- recursive step
     FROM inpre i, -- reference to previous iteration result
          prereq p WHERE i.prereq_id = p.course_id))
SELECT * FROM inpre;

course_id	prereq_id	cnt
BIO-301	BIO-101	1
BIO-399	BIO-101	1
CS-190	CS-101	1
CS-315	CS-101	1
CS-319	CS-101	1
CS-347	CS-101	1
EE-181	PHY-101	1

1
2
3
4
5
6
7
8


WITH RECURSIVE tc AS (
  SELECT *, 1 cnt FROM edge
  UNION
  SELECT e1.f, e2.t, cnt + 1 AS cnt
    FROM tc e1, edge e2
   WHERE e1.f != e2.f AND e1.t != e2.t AND e1.t = e2.f
         AND cnt <= 1)
SELECT * FROM tc;

f	t	cnt
a	b	1
b	c	1
b	d	1
d	e	1
a	c	2
a	d	2
b	e	2

1

INSERT INTO edge VALUES ('a', 'a');

INSERT 0 1

1
2
3
4
5
6
7


WITH RECURSIVE tc AS (
  SELECT *, 1 cnt FROM edge
  UNION
  SELECT e1.f, e2.t, cnt + 1 AS cnt
    FROM tc e1, edge e2
   WHERE e1.t = e2.f AND cnt < 5)
SELECT * FROM tc;

f	t	cnt
a	b	1
b	c	1
b	d	1
d	e	1
a	a	1
a	b	2
a	a	2
a	c	2
a	d	2
b	e	2
a	b	3
a	a	3
a	c	3
a	d	3
a	e	3
a	b	4
a	a	4
a	c	4
a	d	4
a	e	4
a	b	5
a	a	5
a	c	5
a	d	5
a	e	5

Lecture <2024-02-13 Tue>

1

SELECT schemaname, count(*) FROM pg_tables GROUP BY schemaname;

schemaname	count
public	21
pg_catalog	64
information_schema	4

Lecture <2024-02-15 Thu>

all student of the biology department

1
2
3


SELECT *
FROM student
WHERE dept_name = 'Biology';

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

all student of the biology department with at least 20 credits

1
2
3


SELECT *
FROM student
WHERE dept_name = 'Biology' AND tot_cred >= 20;

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

count student of the biology department with at least 20 credits

1
2
3


SELECT count(*)
FROM student
WHERE dept_name = 'Biology' AND tot_cred >= 20;

count
1

find student with the least credits (return credits)

1
2


SELECT max(tot_cred) AS maxttl
FROM student

maxttl
120

find all student information with the least credits (return credits)

1
2
3
4
5


WITH maxc AS (SELECT max(tot_cred) AS maxttl
FROM student)
SELECT *
FROM student s, maxc m
WHERE s.tot_cred = m.maxttl;

id	name	dept_name	tot_cred	maxttl
98988	Tanaka	Biology	120	120

1
2
3
4
5


WITH maxc AS (SELECT max(tot_cred) AS maxttl
FROM student)
SELECT *
FROM student s
WHERE s.tot_cred = (SELECT * FROM maxc);

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

1
2
3
4


SELECT *
FROM student s
WHERE s.tot_cred = (SELECT max(tot_cred) AS maxttl
FROM student);

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

every student that has the max credits in their department

1
2
3
4
5
6
7


WITH perdmax AS (SELECT max(tot_cred) AS mc,
                     dept_name
                FROM student
               GROUP BY dept_name)
SELECT s.*
  FROM student s, perdmax m
WHERE s.tot_cred = m.mc AND s.dept_name = m.dept_name;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
55739	Sanchez	Music	38
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

1
2
3


SELECT *
  FROM student s1
WHERE tot_cred = (SELECT max(tot_cred) FROM student s2 WHERE s1.dept_name = s2.dept_name);

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
55739	Sanchez	Music	38
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

1
2
3
4


EXPLAIN ANALYZE
SELECT *
  FROM student s1
WHERE tot_cred = (SELECT max(tot_cred) FROM student s2 WHERE s1.dept_name = s2.dept_name);

QUERY PLAN
Seq Scan on student s1 (cost=0.00..7053.62 rows=2 width=152) (actual time=0.062..0.142 rows=7 loops=1)
Filter: (tot_cred = (SubPlan 1))
Rows Removed by Filter: 6
SubPlan 1
-> Aggregate (cost=15.63..15.64 rows=1 width=32) (actual time=0.007..0.007 rows=1 loops=13)
-> Seq Scan on student s2 (cost=0.00..15.62 rows=2 width=12) (actual time=0.003..0.005 rows=3 loops=13)
Filter: ((s1.dept_name)::text = (dept_name)::text)
Rows Removed by Filter: 10
Planning Time: 1.681 ms
Execution Time: 0.255 ms

1
2
3
4


EXPLAIN ANALYZE
SELECT *
  FROM student s1
WHERE (tot_cred,dept_name) IN (SELECT max(tot_cred), dept_name FROM student s2 GROUP BY dept_name);

QUERY PLAN
Hash Join (cost=21.75..38.61 rows=2 width=152) (actual time=0.069..0.075 rows=7 loops=1)
Hash Cond: ((s1.tot_cred = (max(s2.tot_cred))) AND ((s1.dept_name)::text = (s2.dept_name)::text))
-> Seq Scan on student s1 (cost=0.00..14.50 rows=450 width=152) (actual time=0.031..0.032 rows=13 loops=1)
-> Hash (cost=18.75..18.75 rows=200 width=90) (actual time=0.023..0.024 rows=7 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=16.75..18.75 rows=200 width=90) (actual time=0.017..0.019 rows=7 loops=1)
Group Key: s2.dept_name
Batches: 1 Memory Usage: 40kB
-> Seq Scan on student s2 (cost=0.00..14.50 rows=450 width=70) (actual time=0.001..0.002 rows=13 loops=1)
Planning Time: 1.001 ms
Execution Time: 0.140 ms

get for each students the buildings and room number they are taking classes sorted by student

1
2
3
4
5
6
7
8


SELECT DISTINCT s.id, building, room_number
FROM student s, section se, takes t
WHERE s.ID = t.ID
  AND t.course_id = se.course_id
  AND t.sec_id = se.sec_id
  AND t.semester = se.semester
  AND t.year = se.year
ORDER BY s.id;

id	building	room_number
00128	Packard	101
00128	Taylor	3128
12345	Packard	101
12345	Taylor	3128
12345	Watson	120
19991	Painter	514
23121	Packard	101
44553	Watson	100
45678	Packard	101
45678	Watson	100
54321	Packard	101
54321	Taylor	3128
55739	Packard	101
76543	Packard	101
76543	Taylor	3128
76653	Taylor	3128
98765	Packard	101
98765	Watson	120
98988	Painter	514

courses (get all information) without prerequisites

1
2
3
4
5
6


SELECT *
FROM course c
EXCEPT
SELECT c.*
FROM course c, prereq p
WHERE c.course_id = p.course_id;

course_id	title	dept_name	credits
MU-199	Music Video Production	Music	3
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4
FIN-201	Investment Banking	Finance	3
PHY-101	Physical Principles	Physics	4
HIS-351	World History	History	3

1
2
3
4
5
6


SELECT *
  FROM course c
       LEFT OUTER JOIN
       (SELECT 1 AS indicator, * FROM prereq) p
           ON (c.course_id = p.course_id)
WHERE p.indicator IS NULL;

course_id	title	dept_name	credits
MU-199	Music Video Production	Music	3
HIS-351	World History	History	3
FIN-201	Investment Banking	Finance	3
PHY-101	Physical Principles	Physics	4
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4

1
2
3


SELECT *
  FROM course c
WHERE NOT EXISTS (SELECT * FROM prereq p WHERE p.course_id = c.course_id);

course_id	title	dept_name	credits
MU-199	Music Video Production	Music	3
HIS-351	World History	History	3
FIN-201	Investment Banking	Finance	3
PHY-101	Physical Principles	Physics	4
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4

1
2
3


SELECT *
  FROM course c
WHERE NOT c.course_id IN (SELECT p.course_id FROM prereq p);

course_id	title	dept_name	credits
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4
FIN-201	Investment Banking	Finance	3
HIS-351	World History	History	3
MU-199	Music Video Production	Music	3
PHY-101	Physical Principles	Physics	4

1
2
3


SELECT *
  FROM course c
WHERE (SELECT count(*) FROM prereq p WHERE p.course_id = c.course_id) = 0;

course_id	title	dept_name	credits
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4
FIN-201	Investment Banking	Finance	3
HIS-351	World History	History	3
MU-199	Music Video Production	Music	3
PHY-101	Physical Principles	Physics	4

Lecture <2024-02-20 Tue>

1
2
3
4
5


CREATE FUNCTION deptst(deptn TEXT) RETURNS SETOF student AS
$$
  SELECT * FROM student WHERE dept_name = deptn;
$$
LANGUAGE SQL;

CREATE FUNCTION

1

SELECT * FROM deptst('Biology');

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

1
2
3
4
5


CREATE FUNCTION qtable(tablename TEXT) RETURNS SETOF student AS
$$
  SELECT * FROM tablename;
$$
LANGUAGE SQL;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


import psycopg2

# define connection parameters
connection = {
    'dbname': 'cs480_slides',
    'user': 'postgres',
    'host': '127.0.0.1',
    'password': 'test',
    'port': 5450
}

# open connection
conn = psycopg2.connect(**connection)
print(conn)

<connection object at 0x1099a8c10; dsn: 'user=postgres password=xxx dbname=cs480_slides host=127.0.0.1 port=5450', closed: 0>

None

Cursor

Cursors allow execution of SQL code

1
2


cur = conn.cursor()
print(cur)

<cursor object at 0x10ada3040; closed: 0>

1
2
3
4
5
6
7
8


cur = conn.cursor()

# run query
cur.execute("SELECT name, deptname FROM student")

# fetch results
rows = cur.fetchall()
print(rows[0])

('Zhang', 'Comp. Sci.')

1
2
3
4
5
6
7


# run query
cur.execute("SELECT count(*), deptname FROM student GROUP BY deptname")

# fetch results
rows = cur.fetchall()
for r in rows:
    print(f"department {r[1]} has {r[0]} students")

department Physics has 3 students
department Biology has 1 students
department Elec. Eng. has 2 students
department Finance has 1 students
department Comp. Sci. has 6 students
department History has 1 students
department Music has 1 students

1
2
3
4
5
6


# run query
cur.execute("SELECT name, deptname FROM student")

# fetch results
rows = cur.fetchall()
print(rows[0])

('Zhang', 'Comp. Sci.')

1
2


cur.close()
conn.close()

Lecture <2024-02-22 Thu>

Music collection - music player

albums: title, artist (one more), release date, genre (one or more),
tracks: title, tracks belong to albums and have a track number within this album, length, need artists, writing credits
artists: name, birthdate, wikipedia entry url
album collections contain one or more albums
playlists: have name, and contain songs (with order)
constraints:
- each album should have at least one tracks
- each track should have a name
- tracks should positive consecutive numbers

Lecture <2024-02-27 Tue>

Lecture <2024-02-29 Thu>

Lecture <2024-03-05 Tue>

Relational algebra to SQL

R(A,B,C) S(B,D,E)

\[ \sigma_{A=5}(\pi_{A,B}(R)) \bowtie \pi_{B,D}(S) \]

UNION and UNION ALL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


SELECT DISTINCT *
FROM (SELECT *
     FROM (
       SELECT A, B
       FROM R
       ) i1
     WHERE A = 5) l
     NATURAL JOIN
     (SELECT B, D
      FROM S) r

Testing FDs

test $A \to C$

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


DROP TABLE R;
CREATE TABLE R (
  id TEXT,
  A INT,
  B INT,
  C INT
  );

INSERT INTO R
VALUES
   ('x1',1,1,1),
   ('x2',1,2,3),
   ('x3',1,3,4),
   ('x4',2,3,5);

DROP TABLE
CREATE TABLE
INSERT 0 4

1

SELECT * FROM r;

id	a	b	c
x1	1	1	1
x2	1	2	3
x3	1	3	4
x4	2	3	5

$\sigma_1: A \to C$

1
2
3
4


SELECT NOT EXISTS(
SELECT *
FROM R r1, R r2
WHERE r1.A = r2.A AND r1.C < r2.C);

?column?
f

$\sigma_2: A,B \to C,D$

1
2
3


SELECT *
FROM R r1, R r2
WHERE r1.A = r2.A AND r1.B=r2.B AND (r1.C != r2.C OR r1.D != r2.D)

$\sigma_3: A \to B,C$

1
2
3


SELECT NOT EXISTS (SELECT *
FROM R r1, R r2
WHERE r1.A = r2.A AND (r1.B != r2.B OR r1.C != r2.C));

?column?
f

Lecture <2024-03-14 Thu>

Lecture <2024-03-26 Tue>

Lecture <2024-03-28 Thu>

Lecture <2024-04-04 Thu>

Postgres page layout

https://www.postgresql.org/docs/current/storage-page-layout.html
use postgres module to inspect raw page storage: https://www.postgresql.org/docs/16/pageinspect.html

1
2
3


DROP TABLE IF EXISTS testpage;
CREATE TABLE testpage (a int, b int);
INSERT INTO testpage (SELECT a, round(random() * 100) AS b FROM generate_series(1,1000) g(a));

DROP TABLE
CREATE TABLE
INSERT 0 1000

1

SELECT count(*) FROM testpage;

count
1000

install the extension that allows for inspection

1

CREATE EXTENSION pageinspect;

get page header data for page 0

1

SELECT * FROM page_header(get_raw_page('testpage', 0));

lsn	checksum	flags	lower	upper	special	pagesize	version	prune_xid
0/1668F7A8	0	0	928	960	8192	8192	4	0

get tuples on the page 0

1

SELECT * FROM heap_page_items(get_raw_page('testpage', 0)) LIMIT 10;

lp	lp_off	lp_flags	lp_len	t_xmin	t_xmax	t_ctid	t_infomask2	t_infomask	t_hoff	t_data
1	8160	1	32	4461	4463	(4,97)	2	256	24	\x010000002f000000
2	8128	1	32	4461	4463	(4,98)	2	256	24	\x0200000006000000
3	8096	1	32	4461	0	(0,3)	2	2304	24	\x0300000007000000
4	8064	1	32	4461	0	(0,4)	2	2304	24	\x0400000024000000
5	8032	1	32	4461	0	(0,5)	2	2304	24	\x0500000063000000
6	8000	1	32	4461	0	(0,6)	2	2304	24	\x060000002b000000
7	7968	1	32	4461	0	(0,7)	2	2304	24	\x070000005e000000
8	7936	1	32	4461	0	(0,8)	2	2304	24	\x0800000041000000
9	7904	1	32	4461	0	(0,9)	2	2304	24	\x090000003c000000
10	7872	1	32	4461	0	(0,10)	2	2304	24	\x0a00000034000000

1

UPDATE testpage SET b = 10000 WHERE A < 3;

UPDATE 2

1

UPDATE testpage SET b = 10000 WHERE A BETWEEN 3 AND 5;

UPDATE 3

1

SELECT * FROM heap_page_items(get_raw_page('testpage', 0)) LIMIT 10;

lp	lp_off	lp_flags	lp_len	t_xmin	t_xmax	t_field3	t_ctid	t_infomask2	t_infomask	t_hoff	t_data
1	0	3	0
2	0	3	0
3	8160	1	32	4461	4464	0	(0,227)	16386	256	24	\x0300000007000000
4	8128	1	32	4461	4464	0	(0,228)	16386	256	24	\x0400000024000000
5	8096	1	32	4461	4464	0	(4,99)	2	256	24	\x0500000063000000
6	8064	1	32	4461	0	0	(0,6)	2	2304	24	\x060000002b000000
7	8032	1	32	4461	0	0	(0,7)	2	2304	24	\x070000005e000000
8	8000	1	32	4461	0	0	(0,8)	2	2304	24	\x0800000041000000
9	7968	1	32	4461	0	0	(0,9)	2	2304	24	\x090000003c000000
10	7936	1	32	4461	0	0	(0,10)	2	2304	24	\x0a00000034000000

Lecture <2024-04-11 Thu>

1
2
3
4
5
6


CREATE TABLE unc (a int, b int, c int, d int);

INSERT INTO unc (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

CREATE TABLE
INSERT 0 1000000

1
2


CREATE INDEX unc_idx ON unc (b);
CREATE INDEX clust_idx ON unc (a);

CREATE INDEX
CREATE INDEX

1

ANALYZE unc;

ANALYZE

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE c < 1000;

QUERY PLAN
Finalize Aggregate (cost=11625.11..11625.12 rows=1 width=8) (actual time=15.214..16.839 rows=1 loops=1)
-> Gather (cost=11624.89..11625.10 rows=2 width=8) (actual time=15.139..16.835 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=10624.89..10624.90 rows=1 width=8) (actual time=11.086..11.086 rows=1 loops=3)
-> Parallel Seq Scan on unc (cost=0.00..10614.33 rows=4224 width=4) (actual time=0.007..10.962 rows=3339 loops=3)
Filter: (c < 1000)
Rows Removed by Filter: 329995
Planning Time: 0.503 ms
Execution Time: 16.879 ms

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE a < 200000;

QUERY PLAN
Finalize Aggregate (cost=6990.86..6990.87 rows=1 width=8) (actual time=30.605..33.521 rows=1 loops=1)
-> Gather (cost=6990.64..6990.85 rows=2 width=8) (actual time=30.475..33.513 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=5990.64..5990.65 rows=1 width=8) (actual time=23.652..23.653 rows=1 loops=3)
-> Parallel Index Scan using clust_idx on unc (cost=0.42..5776.36 rows=85712 width=4) (actual time=0.053..18.336 rows=66666 loops=3)
Index Cond: (a < 200000)
Planning Time: 1.492 ms
Execution Time: 33.609 ms

1

SELECT count(*) FROM unc WHERE b < 1000;

count
99931

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE b < 100;

QUERY PLAN
Aggregate (cost=2570.54..2570.55 rows=1 width=8) (actual time=7.501..7.503 rows=1 loops=1)
-> Bitmap Heap Scan on unc (cost=16.21..2568.03 rows=1004 width=4) (actual time=0.347..7.333 rows=919 loops=1)
Recheck Cond: (b < 100)
Heap Blocks: exact=846
-> Bitmap Index Scan on unc_idx (cost=0.00..15.96 rows=1004 width=0) (actual time=0.182..0.183 rows=919 loops=1)
Index Cond: (b < 100)
Planning Time: 2.479 ms
Execution Time: 7.604 ms

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE b < 50000;

QUERY PLAN
Finalize Aggregate (cost=12131.02..12131.03 rows=1 width=8) (actual time=49.272..52.012 rows=1 loops=1)
-> Gather (cost=12130.80..12131.01 rows=2 width=8) (actual time=49.154..52.007 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=11130.80..11130.81 rows=1 width=8) (actual time=42.669..42.669 rows=1 loops=3)
-> Parallel Seq Scan on unc (cost=0.00..10614.33 rows=206588 width=4) (actual time=0.012..31.747 rows=166694 loops=3)
Filter: (b < 50000)
Rows Removed by Filter: 166639
Planning Time: 2.867 ms
Execution Time: 52.157 ms

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE b < 10 AND a < 100000;

QUERY PLAN
Aggregate (cost=365.90..365.91 rows=1 width=8) (actual time=0.944..0.945 rows=1 loops=1)
-> Bitmap Heap Scan on unc (cost=5.18..365.88 rows=10 width=4) (actual time=0.041..0.938 rows=12 loops=1)
Recheck Cond: (b < 10)
Filter: (a < 100000)
Rows Removed by Filter: 82
Heap Blocks: exact=92
-> Bitmap Index Scan on unc_idx (cost=0.00..5.17 rows=100 width=0) (actual time=0.017..0.017 rows=94 loops=1)
Index Cond: (b < 10)
Planning Time: 1.716 ms
Execution Time: 1.019 ms

1

UPDATE unc SET a = a + 1, b = b + 1 WHERE MOD(a,5) = round(random() * 5);

UPDATE 174266

1

ANALYZE;

ANALYZE

1

EXPLAIN ANALYZE SELECT sum(d) FROM unc WHERE a < 200000;

QUERY PLAN
Finalize Aggregate (cost=13166.77..13166.78 rows=1 width=8) (actual time=31.870..34.994 rows=1 loops=1)
-> Gather (cost=13166.56..13166.77 rows=2 width=8) (actual time=31.764..34.989 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=12166.56..12166.57 rows=1 width=8) (actual time=26.906..26.906 rows=1 loops=3)
-> Parallel Seq Scan on unc (cost=0.00..11961.33 rows=82090 width=4) (actual time=0.012..22.929 rows=66666 loops=3)
Filter: (a < 200000)
Rows Removed by Filter: 266667
Planning Time: 1.070 ms
Execution Time: 35.060 ms

Lecture <2024-04-18 Thu>

Join algorithms

Hash join and nested loop (small table r)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


DROP TABLE IF EXISTS r;
CREATE TABLE r (a int, b int, c int, d int);

INSERT INTO r (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000) g(a))
;

DROP TABLE IF EXISTS s;
CREATE TABLE s (a int, b int, c int, d int);

INSERT INTO s (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

ANALYZE;

DROP TABLE
CREATE TABLE
INSERT 0 1000
DROP TABLE
CREATE TABLE
INSERT 0 1000000
ANALYZE

1
2
3
4


EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT r.*, s.*
FROM r, s
WHERE r.a = s.a;

QUERY PLAN
Gather (cost=1028.50..12267.84 rows=1000 width=32) (actual time=0.582..50.892 rows=1000 loops=1)
Output: r.a, r.b, r.c, r.d, s.a, s.b, s.c, s.d
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5616
-> Hash Join (cost=28.50..11167.84 rows=417 width=32) (actual time=27.469..43.645 rows=333 loops=3)
Output: r.a, r.b, r.c, r.d, s.a, s.b, s.c, s.d
Hash Cond: (s.a = r.a)
Buffers: shared hit=5616
Worker 0: actual time=41.046..41.047 rows=0 loops=1
Buffers: shared hit=1987
Worker 1: actual time=41.084..41.085 rows=0 loops=1
Buffers: shared hit=1926
-> Parallel Seq Scan on public.s (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.013..19.361 rows=333333 loops=3)
Output: s.a, s.b, s.c, s.d
Buffers: shared hit=5406
Worker 0: actual time=0.014..18.685 rows=348615 loops=1
Buffers: shared hit=1885
Worker 1: actual time=0.015..18.182 rows=337440 loops=1
Buffers: shared hit=1824
-> Hash (cost=16.00..16.00 rows=1000 width=16) (actual time=0.328..0.328 rows=1000 loops=3)
Output: r.a, r.b, r.c, r.d
Buckets: 1024 Batches: 1 Memory Usage: 55kB
Buffers: shared hit=18
Worker 0: actual time=0.352..0.352 rows=1000 loops=1
Buffers: shared hit=6
Worker 1: actual time=0.381..0.381 rows=1000 loops=1
Buffers: shared hit=6
-> Seq Scan on public.r (cost=0.00..16.00 rows=1000 width=16) (actual time=0.118..0.201 rows=1000 loops=3)
Output: r.a, r.b, r.c, r.d
Buffers: shared hit=18
Worker 0: actual time=0.157..0.229 rows=1000 loops=1
Buffers: shared hit=6
Worker 1: actual time=0.187..0.262 rows=1000 loops=1
Buffers: shared hit=6
Planning:
Buffers: shared hit=141
Planning Time: 2.009 ms
Execution Time: 50.986 ms

1
2
3
4


EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT r.*, s.*
FROM r, s
WHERE r.a BETWEEN s.a - 10 AND s.a + 10;

QUERY PLAN
Nested Loop (cost=0.00..22515424.50 rows=111111111 width=32) (actual time=76.346..50025.108 rows=20945 loops=1)
Output: r.a, r.b, r.c, r.d, s.a, s.b, s.c, s.d
Join Filter: ((r.a >= (s.a - 10)) AND (r.a <= (s.a + 10)))
Rows Removed by Join Filter: 999979055
Buffers: shared hit=5412
-> Seq Scan on public.s (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.024..68.647 rows=1000000 loops=1)
Output: s.a, s.b, s.c, s.d
Buffers: shared hit=5406
-> Materialize (cost=0.00..21.00 rows=1000 width=16) (actual time=0.000..0.023 rows=1000 loops=1000000)
Output: r.a, r.b, r.c, r.d
Buffers: shared hit=6
-> Seq Scan on public.r (cost=0.00..16.00 rows=1000 width=16) (actual time=0.006..0.063 rows=1000 loops=1)
Output: r.a, r.b, r.c, r.d
Buffers: shared hit=6
Planning:
Buffers: shared hit=113
Planning Time: 0.750 ms
JIT:
Functions: 6
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.307 ms, Inlining 47.066 ms, Optimization 16.492 ms, Emission 12.743 ms, Total 76.609 ms
Execution Time: 50043.921 ms

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


DROP TABLE IF EXISTS r;
CREATE TABLE r (a int, b int, c int, d int);

INSERT INTO r (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

DROP TABLE IF EXISTS s;
CREATE TABLE s (a int, b int, c int, d int);

INSERT INTO s (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

ANALYZE;

DROP TABLE
CREATE TABLE
INSERT 0 1000000
DROP TABLE
CREATE TABLE
INSERT 0 1000000
ANALYZE

less memory

1
2
3
4


EXPLAIN (ANALYZE)
SELECT r1.*, s.*
FROM r r1, s
WHERE r1.b = s.b

QUERY PLAN
Merge Join (cost=230130.67..399330.10 rows=10947027 width=32) (actual time=290.547..1346.343 rows=9999389 loops=1)
Merge Cond: (r1.b = s.b)
-> Sort (cost=115063.84..117563.84 rows=1000000 width=16) (actual time=146.832..238.683 rows=1000000 loops=1)
Sort Key: r1.b
Sort Method: quicksort Memory: 63639kB
-> Seq Scan on r r1 (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.136..46.769 rows=1000000 loops=1)
-> Sort (cost=115063.84..117563.84 rows=1000000 width=16) (actual time=140.882..390.812 rows=9999395 loops=1)
Sort Key: s.b
Sort Method: quicksort Memory: 63639kB
-> Seq Scan on s (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.129..42.435 rows=1000000 loops=1)
Planning Time: 8.273 ms
JIT:
Functions: 7
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.176 ms, Inlining 0.000 ms, Optimization 0.194 ms, Emission 2.616 ms, Total 2.986 ms
Execution Time: 1543.345 ms

1
2


SHOW work_mem;
SHOW effective_cache_size;

work_mem
2MB
effective_cache_size
2MB

Merge join (small table r)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


DROP TABLE IF EXISTS r;
CREATE TABLE r (a int, b int, c int, d int);

INSERT INTO r (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

DROP TABLE IF EXISTS s;
CREATE TABLE s (a int, b int, c int, d int);

INSERT INTO s (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

ANALYZE;

DROP TABLE
CREATE TABLE
INSERT 0 1000000
DROP TABLE
CREATE TABLE
INSERT 0 1000000
ANALYZE

1
2
3
4


EXPLAIN (ANALYZE)
SELECT r.*, s.*
FROM r, s
WHERE r.a = s.a;

QUERY PLAN
Hash Join (cost=27906.00..57062.00 rows=1000000 width=32) (actual time=151.138..503.430 rows=1000000 loops=1)
Hash Cond: (r.a = s.a)
-> Seq Scan on r (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.014..35.329 rows=1000000 loops=1)
-> Hash (cost=15406.00..15406.00 rows=1000000 width=16) (actual time=149.873..149.874 rows=1000000 loops=1)
Buckets: 1048576 Batches: 1 Memory Usage: 55067kB
-> Seq Scan on s (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.015..38.646 rows=1000000 loops=1)
Planning Time: 7.518 ms
Execution Time: 521.277 ms

1
2
3
4


EXPLAIN (ANALYZE)
SELECT r.*, s.*
FROM r, s
WHERE r.b = s.b;

QUERY PLAN
Hash Join (cost=27906.00..168891.05 rows=10932905 width=32) (actual time=174.887..2121.716 rows=9997000 loops=1)
Hash Cond: (r.b = s.b)
-> Seq Scan on r (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.024..41.629 rows=1000000 loops=1)
-> Hash (cost=15406.00..15406.00 rows=1000000 width=16) (actual time=174.398..174.399 rows=1000000 loops=1)
Buckets: 1048576 Batches: 1 Memory Usage: 55067kB
-> Seq Scan on s (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.004..34.842 rows=1000000 loops=1)
Planning Time: 2.192 ms
JIT:
Functions: 10
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.214 ms, Inlining 0.000 ms, Optimization 0.663 ms, Emission 11.283 ms, Total 12.161 ms
Execution Time: 2420.994 ms

less memory

5MB for sorting and hashing and 10MB for buffer pool size

1
2
3
4


EXPLAIN (ANALYZE, BUFFERS)
SELECT r.*, s.*
FROM r, s
WHERE r.a = s.a;

QUERY PLAN
Gather (cost=1028.50..12267.84 rows=1000 width=32) (actual time=0.869..50.570 rows=1000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5616
-> Hash Join (cost=28.50..11167.84 rows=417 width=32) (actual time=27.042..42.886 rows=333 loops=3)
Hash Cond: (s.a = r.a)
Buffers: shared hit=5616
-> Parallel Seq Scan on s (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.012..19.056 rows=333333 loops=3)
Buffers: shared hit=5406
-> Hash (cost=16.00..16.00 rows=1000 width=16) (actual time=0.359..0.359 rows=1000 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 55kB
Buffers: shared hit=18
-> Seq Scan on r (cost=0.00..16.00 rows=1000 width=16) (actual time=0.107..0.213 rows=1000 loops=3)
Buffers: shared hit=18
Planning:
Buffers: shared hit=141
Planning Time: 1.244 ms
Execution Time: 50.697 ms

index join

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


DROP TABLE IF EXISTS r;
CREATE TABLE r (a int PRIMARY KEY, b int, c int, d int);

INSERT INTO r (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(900000,1900000) g(a))
;

DROP TABLE IF EXISTS s;
CREATE TABLE s (a int PRIMARY KEY, b int, c int, d int);

INSERT INTO s (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

ANALYZE;

DROP TABLE
CREATE TABLE
INSERT 0 1000001
DROP TABLE
CREATE TABLE
INSERT 0 1000000
ANALYZE

1
2
3
4


EXPLAIN
SELECT r1.*, s.*
FROM r r1, s
WHERE r1.a = s.a

QUERY PLAN
Merge Join (cost=30539.45..47292.07 rows=1000000 width=32)
Merge Cond: (r1.a = s.a)
-> Index Scan using r_pkey on r r1 (cost=0.42..31389.44 rows=1000001 width=16)
-> Index Scan using s_pkey on s (cost=0.42..31389.42 rows=1000000 width=16)

1
2
3
4


EXPLAIN
SELECT r1.a, s.a
FROM r r1, s
WHERE r1.a = s.a

QUERY PLAN
Merge Join (cost=25665.20..41340.04 rows=1000000 width=8)
Merge Cond: (r1.a = s.a)
-> Index Only Scan using r_pkey on r r1 (cost=0.42..25980.44 rows=1000001 width=4)
-> Index Only Scan using s_pkey on s (cost=0.42..25980.42 rows=1000000 width=4)

1
2


SELECT correlation FROM pg_stats
WHERE tablename = 'r' AND attname = 'a';

correlation
1

only one table clustered

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


DROP TABLE IF EXISTS r;
CREATE TABLE r (a int PRIMARY KEY, b int, c int, d int);

INSERT INTO r (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(900000,1900000) g(a)
ORDER BY random()
)
;

DROP TABLE IF EXISTS s;
CREATE TABLE s (a int PRIMARY KEY, b int, c int, d int);

INSERT INTO s (
SELECT a, round(random() * 100000) AS b, round(random() * 100000) AS c, round(random() * 100) AS d
FROM generate_series(1,1000000) g(a))
;

ANALYZE;

DROP TABLE
CREATE TABLE
INSERT 0 1000001
DROP TABLE
CREATE TABLE
INSERT 0 1000000
ANALYZE

1
2


SELECT correlation FROM pg_stats
WHERE tablename = 'r' AND attname = 'a';

correlation
0.012891567

1
2
3
4


EXPLAIN
SELECT r1.*, s.*
FROM r r1, s
WHERE r1.a = s.a

QUERY PLAN
Hash Join (cost=32789.00..65469.02 rows=1000000 width=32)
Hash Cond: (r1.a = s.a)
-> Seq Scan on r r1 (cost=0.00..15406.01 rows=1000001 width=16)
-> Hash (cost=15406.00..15406.00 rows=1000000 width=16)
-> Seq Scan on s (cost=0.00..15406.00 rows=1000000 width=16)

1
2
3
4


EXPLAIN
SELECT r1.*, s.*
FROM r r1, s
WHERE r1.a = s.a AND r1.B < s.b

QUERY PLAN
Hash Join (cost=32789.00..65469.01 rows=333333 width=32)
Hash Cond: (r1.a = s.a)
Join Filter: (r1.b < s.b)
-> Seq Scan on r r1 (cost=0.00..15406.01 rows=1000001 width=16)
-> Hash (cost=15406.00..15406.00 rows=1000000 width=16)
-> Seq Scan on s (cost=0.00..15406.00 rows=1000000 width=16)

Lecture <2024-04-23 Tue>

distributed databases (5)
NoSQL (3)
query compilation + main memory databases (vector-databases) 3)
statistics - cost estimate (2)
query optimization - join reordering (1)
disk-resident hash index (1)
MVCC
Aries (1)
light-weight indexes (zone maps) + partitioning

Lecture <2024-04-25 Thu>

relational algebra -> SQL transaction

$\Pi_{a_1, \ldots, a_n}(Q)$

1
2


SELECT a1, ..., an
FROM R

$\sigma_C(R)$

1
2
3


SELECT *
FROM R
WHERE C

$R \cup S$

1
2
3


SELECT * FROM R
UNION
SELECT * FROM S

$R \bowtie S$

1
2


SELECT *
FROM R NATURAL JOIN S

$\gamma_{f(a); G}(R)$

1
2
3


SELECT f(A)
FROM R
[ GROUP BY G ]

R(A,B,C), S(D,E,F), T(G,H)
$(\gamma_{sum(A), E}(R \bowtie_{C=D} S)) \bowtie_{E = G} T$

1
2
3
4
5
6
7
8


SELECT *
FROM (
 SELECT sum(A)
 FROM (SELECT *
       FROM R JOIN S ON (C = D)
) sub2
 GROUP BY E
) l JOIN  T ON (E = G);

ER to SQL

1
2
3
4
5


CREATE TABLE course (
courseid INT PRIMARY KEY,
title TEXT,
credits INT
);

time_slot_id	day	start_hr	start_min	end_hr	end_min
A	M	8	0	8	50
A	W	8	0	8	50
A	F	8	0	8	50
B	M	9	0	9	50
B	W	9	0	9	50
B	F	9	0	9	50
C	M	11	0	11	50
C	W	11	0	11	50
C	F	11	0	11	50
D	M	13	0	13	50
D	W	13	0	13	50
D	F	13	0	13	50
E	T	10	30	11	45
E	R	10	30	11	45
F	T	14	30	15	45
F	R	14	30	15	45
G	M	16	0	16	50
G	W	16	0	16	50
G	F	16	0	16	50
H	W	10	0	12	30

f	t	cnt
a	b	1
b	c	1
b	d	1
d	e	1
a	a	1
a	b	2
a	a	2
a	c	2
a	d	2
b	e	2
a	b	3
a	a	3
a	c	3
a	d	3
a	e	3
a	b	4
a	a	4
a	c	4
a	d	4
a	e	4
a	b	5
a	a	5
a	c	5
a	d	5
a	e	5

time_slot_id	day	start_hr	start_min	end_hr	end_min
A	M	8	0	8	50
A	W	8	0	8	50
A	F	8	0	8	50
B	M	9	0	9	50
B	W	9	0	9	50
B	F	9	0	9	50
C	M	11	0	11	50
C	W	11	0	11	50
C	F	11	0	11	50
D	M	13	0	13	50
D	W	13	0	13	50
D	F	13	0	13	50
E	T	10	30	11	45
E	R	10	30	11	45
F	T	14	30	15	45
F	R	14	30	15	45
G	M	16	0	16	50
G	W	16	0	16	50
G	F	16	0	16	50
H	W	10	0	12	30

f	t	cnt
a	b	1
b	c	1
b	d	1
d	e	1
a	a	1
a	b	2
a	a	2
a	c	2
a	d	2
b	e	2
a	b	3
a	a	3
a	c	3
a	d	3
a	e	3
a	b	4
a	a	4
a	c	4
a	d	4
a	e	4
a	b	5
a	a	5
a	c	5
a	d	5
a	e	5

Lecture Notes for CS480

Overview

Lecture <2024-01-09 Tue>

Lecture <2024-01-11 Thu>

Crossproduct

Relations

Candidate Keys

Lecture <2024-01-20 Sat>

Relational algebra

Selection example

Lecture <2024-01-23 Tue>

Lecture <2024-01-25 Thu>

Project Use Case

Ideas

relational algebra

Lecture <2024-01-30 Tue>

Lecture <2024-02-01 Thu>

Lecture <2024-02-06 Tue>

Relational Algebra

Lecture <2024-02-08 Thu>

Lecture <2024-02-13 Tue>

Lecture <2024-02-15 Thu>

Lecture <2024-02-20 Tue>

Cursor

Lecture <2024-02-22 Thu>

Music collection - music player

Lecture <2024-02-27 Tue>

Lecture <2024-02-29 Thu>

Lecture <2024-03-05 Tue>

Relational algebra to SQL

Testing FDs

Lecture <2024-03-14 Thu>

Lecture <2024-03-26 Tue>

Lecture <2024-03-28 Thu>

Lecture <2024-04-04 Thu>

Postgres page layout

Lecture <2024-04-11 Thu>

Lecture <2024-04-18 Thu>

Join algorithms

Hash join and nested loop (small table r)

less memory

Merge join (small table r)

less memory

index join

only one table clustered

Lecture <2024-04-23 Tue>

Lecture <2024-04-25 Thu>

relational algebra -> SQL transaction

ER to SQL

ER-diagram (create one from requirements)

time_slot_id	day	start_hr	start_min	end_hr	end_min
A	M	8	0	8	50
A	W	8	0	8	50
A	F	8	0	8	50
B	M	9	0	9	50
B	W	9	0	9	50
B	F	9	0	9	50
C	M	11	0	11	50
C	W	11	0	11	50
C	F	11	0	11	50
D	M	13	0	13	50
D	W	13	0	13	50
D	F	13	0	13	50
E	T	10	30	11	45
E	R	10	30	11	45
F	T	14	30	15	45
F	R	14	30	15	45
G	M	16	0	16	50
G	W	16	0	16	50
G	F	16	0	16	50
H	W	10	0	12	30

f	t	cnt
a	b	1
b	c	1
b	d	1
d	e	1
a	a	1
a	b	2
a	a	2
a	c	2
a	d	2
b	e	2
a	b	3
a	a	3
a	c	3
a	d	3
a	e	3
a	b	4
a	a	4
a	c	4
a	d	4
a	e	4
a	b	5
a	a	5
a	c	5
a	d	5
a	e	5