CS 505 - Computability and Complexity Theory
Welcome to the class! I’m excited to have you. Throughout this website, you’ll find all the relevant information needed for the course.
On this page, I’ll post important announcements, as well as a changelog. If you have any questions, please feel free to reach out to me.
Announcements
- [May 8, 2025] All grades except for the Final Project write-up have been posted to Blackboard. You also should have received feedback in Blackboard for your project presentations (peer feedback and my own feedback). If you have any questions, contact me as soon as possible. Sample solutions for Homework 5 have been posted. Lecture 20 typed notes posted.
- [April 28, 2025] Final Project write-ups are now required to be submitted via Blackboard. This has been noted on the Final Project page, as well as the PDF included there. Typed notes for Lecture 19 posted.
- [April 26, 2025] Handwritten notes for Lecture 23 and Lecture 24 posted. Additional resources for the PCP theorem posted; see Resources.
- [April 21, 2025] Homework 5 posted. Sample solutions for Homework 3 and Homework 4 posted. Make-up office hours will be held this week: April 23, 2025, at 2:00pm (in-person or via Zoom).
- [April 11, 2025] Lecture 21 and Lecture 22 handwritten notes posted. Recorded lectures for Crypto and Complexity Theory, along with handwritten notes for those lectures, posted. Office hours next week are cancelled due to travel on my end; I will still be available via email or Piazza.
- [April 7, 2025] Lecture 19 and Lecture 20 handwritten notes posted. Schedule updated. Group orderings for the Final Project presentations posted. Please see the webpage for the schedule (i.e., when your group is presenting), along with how the ordering was decided.
- [March 31, 2025] Homework 4 posted. Grades for Homework 3 posted on Gradescope. Schedule updated; no class the week of April 14, instead there will be recorded lectures.
- [March 30, 2025] Typed notes for Lecture 17 and Lecture 18 posted.
- [March 26, 2025] Handwritten notes for Lecture 17 and Lecture 18 posted. Typed notes for Lecture 16 posted.
- [March 17, 2025] Typed notes for Lecture 15 posted.
- [March 16, 2025] Handwritten notes for Lecture 15 and Lecture 16 posted. Typed notes for Lecture 14 posted. Midterm grades posted to Gradescope.
- [March 11, 2025] Final Project information posted. Midterm sample solutions posted to Piazza. Schedule updated with Final Project information.
- [March 3, 2025] Homework 3 posted. Homework 2 sample solutions posted.
- [March 1, 2025] Typed notes for Lecture 13 posted.
- [February 28, 2025] Lecture 11 and Lecture 12 typed notes posted. Handwritten notes for Lecture 13 and Lecture 14 posted.
- [February 23, 2025] Lecture 9 and Lecture 10 typed notes posted. Handwritten notes for Lecture 11 and Lecture 12 posted.
- [February 18, 2025] Schedule updated.
- [February 17, 2025] Lecture 9 and Lecture 10 handwritten notes posted. Homework 1 grades released on Gradescope. Sample solutions for Homework 1 posted.
- [February 13, 2025] Homework 2 posted.
- [February 10, 2025] Lecture 7 and Lecture 8 handwritten notes posted.
- [February 04, 2025] Lecture 5 and Lecture 6 posted.
- [February 03, 2025] Handwritten notes for Lectures 2-6 posted (there are no notes for Lecture 1 since it was written on the board). Lecture 5 and Lecture 6 typed notes will be posted tonight or tomorrow. Zoom link added to office hours (see Important Info below and the Syllabus).
- [January 31, 2025] Homework 1 updated to reflect schedule changes. Problem 5 has been changed. If you do not see this change, you may need to force refresh the course website in your browser, or open it in an incognito window. Schedule updated.
- [January 27, 2025] Lecture 3 and Lecture 4 posted.
- [January 21, 2025] Homework 1 posted. Homework collaboration policy posted.
- [January 20, 2025] In-class lecture on January 21, 2025, is cancelled due to the weather. Class will be held on Zoom. Please check your email for the Zoom link.
- [January 20, 2025] Since today is a holiday and the weather is very cold, in-person office hours are optional. A Zoom link will be sent for office hours today (check your email).
- [January 19, 2025] Lecture 1 and Lecture 2 have been posted. Office hours have been updated (see above or the syllabus).
Important Info
Instructor: Alexander R. Block
Email: arblock [at] uic [dot] edu
Drop-in Office Hours
- Time: Mondays, 2–3pm (or by appointment)
- Location: SEO 1216 or Zoom
- Zoom link (UIC login required): https://uic.zoom.us/j/81305503904?pwd=w4N9MLSrL4M7sny3zQmYyWT6MQgkiG.1
Course Modality and Schedule: In-person only, BSB 289, 2:00 pm - 3:15 pm, Tuesday & Thursday.
Important Links
Changelog
- [January 20, 2025] Announcements moved to the top of this page. Links to lecture notes added to the schedule and resources.
- [January 19, 2025] Added important info to this page. Announcement on this day posted.
- [January 13, 2025] Syllabus updated.
- [January 06, 2025] Website available.
Syllabus
Instructor and Course Details
Instructor: Alexander R. Block
Email: arblock [at] uic [dot] edu
Drop-in Office Hours
- Time: Mondays, 2–3pm (or by appointment)
- Location: SEO 1216 or Zoom
- Zoom link (UIC login required): https://uic.zoom.us/j/81305503904?pwd=w4N9MLSrL4M7sny3zQmYyWT6MQgkiG.1
Course Modality and Schedule: In-person only, BSB 289, 2:00 pm - 3:15 pm, Tuesday & Thursday.
Blackboard: https://uic.blackboard.com/ultra/courses/_279721_1/cl/outline
Gradescope: https://www.gradescope.com/courses/942742
Piazza: https://piazza.com/uic/spring2025/cs505
Course Announcements
Course information will primarily be conveyed using this website (see here). Course discussion will happen on Piazza. All course assignments and grades will be collected and returned through Gradescope. I will also send email notifications to the class with announcements.
You are responsible for checking this website and emails for any and all updates and information regarding the course, including homework assignments and schedule changes. You are also responsible for keeping up to date on Piazza for any corrections and/or clarifications regarding assignments, or other important information.
Blackboard will be used sparingly in this course, primarily for Homework, Midterm, Final Project, and Final Grades. For all technical questions about Blackboard, email the Learning Technology Solutions team at LTS@uic.edu.
Communication Expectations
Students are responsible for all information instructors send to their UIC email. Faculty messages should be monitored regularly and read in a timely fashion.
Please use Piazza private messages shared with the instructors (not just the professor or TA by name) if you wish to communicate with us directly. Please only use email for something that explicitly should be kept private only to that person.
Please email me if you face an unexpected situation that may impede your attendance, participation in class and exam sessions, or timely completion of assignments.
Course Information
CS 505 is a graduate-level introductory course on Computability and Complexity Theory. You will be expected to read, understand, and write formal (i.e., mathematical) proofs.
Prerequisites: For UIC students, CS 305 is listed as a prerequisite. However, most (if not all) topics covered will be self-contained in this course. As stated above (but put another way), you will need mathematical maturity to succeed in this course. That is, you should be comfortable answering questions of the form “prove or disprove the following statement.” If you are able to do this, then this course is for you; if you struggle with these types of problems, then this course may not be for you.
Brief list of topics to be covered (subject to change)
- Turing machines and their equivalent computational models
- Languages and Decidability
- The class $\mathsf{P}$
- The class $\mathsf{NP}$, and $\mathsf{NP}$-Completeness
- Randomized Computations
- Space Complexity
- Interactive proofs; the PCP Theorem
- (Time permitting) Cryptography and Complexity Theory
Required and Recommended Course Material: No textbook is required for this course. Lectures will have all relevant information.
I will be closely following the book Computational Complexity: A Modern Approach by Arora and Barak. I will additionally use material from Introduction to the Theory of Computation by Michael Sipser, Mathematics and Computation by Avi Wigderson, and Proofs, Arguments, and Zero-Knowledge by Justin Thaler.
I will give suggested additional readings for each lecture from relevant material freely available online.
Course Copyright: Please protect the copyright integrity of all course materials and content. Please do not upload course materials not created by you onto third-party websites or share content with anyone not enrolled in our course.
The purpose of this syllabus is to give students guidance on what may be covered during the semester. I intend to follow the syllabus as closely as possible; however, I also reserve the right to modify, supplement, and/or make changes to the course as needs arise. All such changes will be communicated in advance through in-class announcements and in writing via this website and email.
Course Policies and Classroom Expectations
Grading Policies & Point Breakdown
Grades will be curved based on an aggregate course score and are not defined ahead of time. The score cut-offs for A, B, C, etc., will be set after the end of the course.
The course will have the following grade breakdown:
| Task | % of total grade |
|---|---|
| Homework | 40% |
| Midterm Exam | 25% |
| Final Project | 35% |
Final Grade Assignments
My goal is to ensure that the assessment of your learning in this course is comprehensive, fair, and equitable. Your grade in the class will be based on the number of points you earn out of the total number of points possible, and is not based on your rank relative to other students. There are no set limits to the number of grades given (e.g., everyone can get an A if everyone does well). If the class average is at least 75%, then assigned letter grades will be based on a straight scale with the following thresholds:
| Grade | Threshold |
|---|---|
| A | 90% |
| B | 80–89.9% |
| C | 70–79.9% |
| D | 60–69.9% |
| F | Below 60% |
If the class mean is less than 75%, then this scale will be adjusted to compensate.
Under no circumstances will grades be adjusted down (except in cases of course policy violation). You can use this straight grading scale as an indicator of your minimum grade in the course at any time during the course. You should keep track of your own points so that at any time during the semester you may calculate your minimum grade based on the total number of points possible at that particular time. If and when, for any reason, you have concerns about your grade in the course, please email me to schedule a time for you to speak with me so that we can discuss study techniques or alternative strategies to help you.
Regrade Policy
You are allowed to request one single regrade per homework assignment/non-final exam. Moreover, with every regrade request, you must submit the following information:
- Which problems you are requesting a regrade for; and
- The exact reason you are requesting a regrade.
I will be strict with this policy to ensure there are no frivolous regrade requests; i.e., do not request a regrade simply to argue for more points. You must have a specific and articulate reason for why you believe something was graded incorrectly. Finally, note that any regrade request can result in a score reduction if additional errors are discovered.
Homework Late Policy
All homework assignments are due by the beginning of class (2:00pm Central Time) on their due date. You may submit homework late, with a 25% point reduction per day late. On the fourth day late, you will receive zero points on the assignment, but it will still be graded in order to give you feedback.
In-class Participation
All course material will be given primarily through in-class lectures. However, as this is a graduate course, I will not require you to attend lecture, though attendance is highly encouraged. You are responsible for submitting all assignments, taking the midterm exam, and completing the final project.
Note that though you are not required to attend lecture, I will not answer questions in office hours of the form “can you explain X?” when X was explained in class. By not attending lecture, you do not then get to ask me to teach you the lecture material in office hours.
Evaluation
Homework
There will be 4-5 homework assignments in this course, depending on our progress through the semester. Each homework assignment will be weighted equally when calculating your final grade. See here for more information about completing homework assignments.
Note that your lowest homework score will be automatically dropped from your final grade.
Midterm Exam
There will be an in-class midterm exam, covering topics from (roughly) the first half of the course. The tentative date for the midterm exam is Thursday, March 6, 2025, with an in-class review planned for the previous lecture on Tuesday, March 4, 2025. Please plan to attend class on this day for the midterm exam. I will notify all students once the midterm exam date is finalized, and will do so as soon as possible.
Final Project
In place of a final exam, there will be a final project. It will consist of both a written portion and an in-class presentation. Your grade will be assigned based on the written portion, the in-class presentation, and peer evaluations. I will give more details about the final project as we get closer to the midpoint of the semester.
Academic Integrity
Consulting with your classmates on assignments is encouraged, except where noted. However, turn-ins are individual, and copying proofs from your classmates is considered plagiarism. You should never look at someone else’s writing, or show someone else your writing. Either of these actions is considered academic dishonesty (cheating) and will be prosecuted as such.
To avoid suspicion of plagiarism, you must specify your sources together with all turned-in materials. List classmates you discussed your homework with and webpages/resources from which you got inspiration and help. Plagiarism and cheating, as in copying the work of others, paying others to do your work, etc., is obviously prohibited, and will be reported (this includes asking questions and copying answers from forums such as Stack Overflow and Reddit).
I report all suspected academic integrity violations to the dean of students. If it is your first time, the dean of students may provide the option to informally resolve the case – this means the student agrees that my description of what happened is accurate, and the only repercussions on an institutional level are that it is noted that this happened in your internal, UIC files (i.e., the dean of students can see that this happened, but no professors or other people can, and it is not in your transcript). If this is not your first academic integrity violation in any of your classes, a formal hearing is held and the dean of students decides on the institutional consequences. After multiple instances of academic integrity violations, students may be suspended or expelled. For all cases, the student has the option to go through a formal hearing if they believe that they did not actually violate the academic integrity policy. If the dean of students agrees that they did not, then I revert their grade back to the original grade, and the matter is resolved.
If you are found responsible for violating the academic integrity policy, the penalty can range from receiving a zero on the assignment in question, receiving a grade deduction, or receiving an F in the class, depending on the severity of the violation.
As a student and member of the UIC community, you are expected to adhere to the Community Standards of academic integrity, accountability, and respect. Please review the UIC Student Disciplinary Policy for additional information.
GenAI
Since this course will be rigorous in formal (i.e., mathematical) proofs, the use of GenAI is NOT allowed. As we will see, there are computational tasks that are impossible in any computational model, thus it is highly likely (and, indeed, expected) that GenAI would answer any such questions incorrectly.
Failure to adhere to this policy will result in the following consequences:
- First use: You will lose 50% of the points available on the assignment.
- Second use: You will fail the assignment.
- Third use: You will fail the course.
Accommodations
Disability Accommodation Procedures
UIC is committed to full inclusion and participation of people with disabilities in all aspects of university life. If you face or anticipate disability-related barriers while at UIC, please connect with the Disability Resource Center (DRC) at drc.uic.edu, via email at drc@uic.edu, or call (312) 413-2183 to create a plan for reasonable accommodations. To receive accommodations, you will need to disclose the disability to the DRC, complete an interactive registration process with the DRC, and provide me with a Letter of Accommodation (LOA). Upon receipt of an LOA, I will gladly work with you and the DRC to implement approved accommodations.
Religious Accommodations
Following campus policy, if you wish to observe religious holidays, you must notify me by the tenth day of the semester. If the religious holiday is observed on or before the tenth day of the semester, you must notify me at least five days before you will be absent. Please submit this form by email with the subject heading: “[CS 505] YOUR NAME: Requesting Religious Accommodation.”
Classroom Environment
Inclusive Community
UIC values diversity and inclusion. Regardless of age, disability, ethnicity, race, gender, gender identity, sexual orientation, socioeconomic status, geographic background, religion, political ideology, language, or culture, we expect all members of this class to contribute to a respectful, welcoming, and inclusive environment for every other member of our class. If aspects of this course result in barriers to your inclusion, engagement, accurate assessment, or achievement, please notify me as soon as possible.
Name and Pronoun Use
If your name does not match the name on my class roster, please let me know as soon as possible. My pronouns are [she/her; he/him; they/them]. I welcome your pronouns if you would like to share them with me. For more information about pronouns, see this page: https://www.mypronouns.org/what-and-why.
Community Agreement/Classroom Conduct Policy
- Be present by removing yourself from distractions, whether they be phone notifications, entire devices, conversations, or anything else.
- Be respectful of the learning space and community. For example, no side conversations or unnecessary disruptions.
- Use preferred names and gender pronouns.
- Assume goodwill in all interactions, even in disagreement.
- Facilitate dialogue and value the free and safe exchange of ideas.
- Try not to make assumptions, have an open mind, seek to understand, and not judge.
- Approach discussion, challenges, and different perspectives as an opportunity to “think out loud,” learn something new, and understand the concepts or experiences that guide other people’s thinking.
- Debate the concepts, not the person.
- Be gracious and open to change when your ideas, arguments, or positions do not work or are proven wrong.
- Be willing to work together and share helpful study strategies.
- Be mindful of one another’s privacy, and do not invite outsiders into our classroom.
Furthermore, our class (in person and online) will follow the CS Code of Conduct. If you do not adhere to our course norms, a case of behavior misconduct will be submitted to the Dean of Students and to the Director of Undergraduate Studies in the Department of Computer Science, and you will not receive full credit for your work in this class. For extreme violations of the course norms, no credit for the course will be given.
Student Parents
I know well how exhausting balancing school, childcare, and work can be. I would like to help support you and accommodate your family’s needs, so please don’t keep me in the dark. I hope you will feel safe disclosing your student-parent status to me so that I can help you anticipate and solve problems in a way that makes you feel supported. Unforeseen disruptions in childcare often put parents in the position of having to choose between missing classes to stay home with a child or leaving them with a less desirable backup arrangement. While this is not meant to be a long-term childcare solution, occasionally bringing a child to class in order to cover gaps in care is perfectly acceptable. If your baby or young child comes to class with you, please plan to sit close to the door so that you can step outside without disrupting learning for other students if your child needs special attention. Non-parents in the class, please reserve seats near the door for your parenting classmates or others who may need to step out briefly.
Academic Success, Wellness, and Safety
We all need the help and the support of our UIC community. Please visit my drop-in hours for course consultation and other academic or research topics. For additional assistance, please contact your assigned college advisor and visit the support services available to all UIC students.
Academic Success
- UIC Tutoring Resources
- College of Engineering tutoring program
- Equity and Inclusion in Engineering Program
- UIC Library and UIC Library Research Guides.
- Offices supporting the UIC Undergraduate Experience and Academic Programs.
- Student Guide for Information Technology
- First-at-LAS Academic Success Program, focusing on LAS first-generation students.
Wellness
- Counseling Services: You may seek free and confidential services from the Counseling Center at https://counseling.uic.edu/.
- Access the U&I Care Program for assistance with personal hardships.
- Campus Advocacy Network: Under Title IX, you have the right to an education that is free from any form of gender-based violence or discrimination. To make a report, email TitleIX@uic.edu. For more information or confidential victim services and advocacy, visit UIC’s Campus Advocacy Network at http://can.uic.edu/.
Safety
- UIC Safe App—PLEASE DOWNLOAD FOR YOUR SAFETY!
- UIC Safety Tips and Resources
- Night Ride
- Emergency Communications: By dialing 5-5555 from a campus phone, you can summon the Police or Fire for any on-campus emergency. You may also set up the complete number, (312) 355-5555, on speed dial on your cell phone.
Schedule
This schedule is tentative and subject to change. Any changes will be announced.
| Lecture Number (Date) | Topics | Announcements | Additional Resources |
|---|---|---|---|
| Lecture 1 (Jan 14) | | | |
| Lecture 2 (Jan 16) | | Correction from in-class lecture; see Lecture 2. | |
| Lecture 3 (Jan 21) | | Homework 1 assigned. | |
| Lecture 4 (Jan 23) | | | |
| Lecture 5 (Jan 28) | | | |
| Lecture 6 (Jan 30) | | Problem 5 of Homework 1 will be changed to reflect change in schedule. | |
| Lecture 7 (Feb 4) | | | |
| Lecture 8 (Feb 6) | | Homework 1 due. | |
| Lecture 9 (Feb 11) | | | |
| Lecture 10 (Feb 13) | | Homework 2 assigned. | |
| Lecture 11 (Feb 18) | | | |
| Lecture 12 (Feb 20) | | | |
| Lecture 13 (Feb 25) | | | |
| Lecture 14 (Feb 27) | | Homework 2 due. | |
| Midterm Review (Mar 4) | Covers Lectures 1-13 (up to and including Alternation) | Homework 3 assigned. | |
| Midterm Exam (Mar 6) | Covers Lectures 1-13 | | |
| Lecture 15 (Mar 11) | | Final Project information posted. | |
| Lecture 16 (Mar 13) | | | |
| Lecture 17 (Mar 18) | | | |
| Lecture 18 (Mar 20) | | Homework 3 due. | |
| Mar 22, 11:59pm CDT | | Final Project proposals due. | |
| NO CLASS; SPRING BREAK (Mar 25) | | | |
| NO CLASS; SPRING BREAK (Mar 27) | | | |
| Lecture 19 (Apr 1) | | Homework 4 assigned. | |
| Lecture 20 (Apr 3) | | | |
| Lecture 21 (Apr 8) | | | |
| Lecture 22 (Apr 10) | | | |
| NO CLASS; Recorded Lecture (Apr 15) | | Homework 4 due. | |
| NO CLASS; Recorded Lecture (Apr 17) | | | |
| Lecture 23 (Apr 22) | | Homework 5 assigned. | |
| Lecture 24 (Apr 24) | | | |
| Final Project Presentations (Apr 29) | | | |
| Final Project Presentations (May 1) | | | |
| May 6, 2:00pm CDT | | Homework 5 due. | |
| May 10, 11:59pm CDT | | Final Project written reports due. | |
Resources
This page will be updated throughout the semester with new resources as I find them. If you have found a particularly useful resource, feel free to let me know and I will gladly add it to the resources below.
Lecture Notes
- Lecture 1
- Lecture 2
- Lecture 3
- Lecture 4
- Lecture 5
- Lecture 6
- Lecture 7
- Lecture 8
- Lecture 9
- Lecture 10
- Lecture 11
- Lecture 12
- Lecture 13
- Lecture 14
- Lecture 15
- Lecture 16
- Lecture 17
- Lecture 18
- Lecture 19
- Lecture 20
- Lecture 21
- Lecture 22
- Lecture 23
- Lecture 24
- Crypto and Complexity Theory
Books
- Computational Complexity: A Modern Approach (Draft) by Sanjeev Arora and Boaz Barak
- Introduction to the Theory of Computation¹ by Michael Sipser
- Proofs, Arguments, and Zero-knowledge by Justin Thaler
- Mathematics and Computation by Avi Wigderson
Other Lecture Notes and Videos
- Michael Sipser’s Theory of Computation Course at MIT
PCP Theorem
- Notes from Jon Katz: first lecture and second lecture. All his other notes from this course: website.
- Dana Moshkovitz’s Tale of the PCP Theorem.
- Irit Dinur’s alternative proof of the PCP Theorem: The PCP Theorem by Gap Amplification.
- PCP Theorem Course by Irit Dinur and Dana Moshkovitz: Probabilistically Checkable Proofs.
¹ You may be able to find the book online for free. I cannot confirm or deny the availability. Supplemental readings may be given from this book and will be paired with the appropriate video lecture from Michael Sipser’s course below.
Homework
This page will be updated with homework assignments as they become available throughout the semester. All homework assignments will be due before the start of class. That is, by no later than 2:00pm Central US Time.
All homework is required to be typeset in LaTeX.
Included with each assignment is a .pdf file of the assignment, along with a .zip folder containing the LaTeX source for you to use to complete the assignment.
All homework is required to be submitted through Gradescope.
For submission, you will simply need to upload the .pdf file of your assignment.
You will likely not be able to answer all the questions in a given homework when it is released. My intention is to spread out the material you need to complete each homework over several lectures. This will allow you to either (a) do the homework problems incrementally as we cover the relevant material in class; (b) wait until all lectures are completed then do the homework all at once; or (c) finish the homework as soon as it’s assigned by reading ahead in other resources.
The schedule below is tentative and is subject to change based on how we progress through the semester.
Collaboration between students is encouraged. However, all collaborations need to be acknowledged (whether they are in this class or outside of it). You MUST list all collaborators for homework assignments. Moreover, collaborating does not mean you can copy-paste work from each other. Each submission needs to be in your own words; otherwise, it will be considered plagiarism.
You are allowed to look to other resources for help with the homework, but you MUST properly cite these sources.
Please use the \cite command and add your citations in the proper format to the included local.bib in your homework assignments.
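For example, a citation might look like the following; the entry key `doe2024notes` and its fields are placeholders for whatever entries you add to local.bib:

```latex
% A placeholder entry in local.bib:
@misc{doe2024notes,
  author = {Jane Doe},
  title  = {Some Helpful Lecture Notes},
  year   = {2024},
}
```

Then, in the body of your solution, you would write something like `This step adapts an argument from~\cite{doe2024notes}.`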
Finally, please acknowledge any other discussions that helped you complete this assignment. This can include “office hours,” “Piazza,” or other discussions where direct collaboration did not happen.
Failing to adhere to the collaboration policy outlined above will result in various penalties.
First violation: You will lose 50% of the points available on the assignment.
Two or more violations: You will fail the assignment.
Homework 1
Date Assigned: January 21, 2025.
Due Date: February 6, 2025, no later than 2:00pm Central Time.
Updated: January 31, 2025 (see Announcements).
PDF file: homework-1.pdf (SHA256: bfcb1bb09bcb433c2e22b462bd1a9ea62d227767936db3099261afcfa3124a37)
Source files: homework-1.zip (SHA256: 73fedc2f3141d71314fe451f2732bf6a7c1dcbf14df8d14c0de05d24289d69d4)
Sample solutions: homework-1-solutions.pdf
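If you would like to check that a download matches the posted SHA256 hash, a small script along these lines should work (it assumes the file is saved under its posted name in the current directory):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "bfcb1bb09bcb433c2e22b462bd1a9ea62d227767936db3099261afcfa3124a37"
print(sha256_of("homework-1.pdf") == expected)  # True if the download is intact
```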
Homework 2
Date Assigned: February 13, 2025.
Due Date: February 27, 2025, no later than 2:00pm Central Time.
PDF file: homework-2.pdf (SHA256: 38ade5b4a4b5abd0070c33953861e201253efbcd5fc32dc904abc1314a4a1e54)
Source files: homework-2.zip (SHA256: 5e55f481dcc61aa5489875fc1e00084b1b02081c98fb6657d74551c543276b8b)
Sample solutions: homework-2-solutions.pdf
Homework 3
Date Assigned: March 4, 2025.
Due Date: March 20, 2025, no later than 2:00pm Central Time.
PDF file: homework-3.pdf (SHA256: 276a2dea7f34c264b355968bc90c7052889816082c7c1c179566c9b25537d980)
Source files: homework-3.zip (SHA256: 6e9e6b22715ba055dd1a884bdc18078851629992db7a628051c5ccc41d263635)
Sample solutions: homework-3-solutions.pdf
Homework 4
Date Assigned: April 1, 2025
Due Date: April 15, 2025, no later than 2:00pm Central Time.
PDF file: homework-4.pdf (SHA256: c1d1204ea77d579a379c6d25efb81fffcfc769317e754046a9b2e49ab69a1f31)
Source files: homework-4.zip (SHA256: 75ad92909df75544821d88a3a529bf07d6c7748c31b0114c9ba2df1104bcf7ed)
Sample solutions: homework-4-solutions.pdf
Homework 5
Date Assigned: April 22, 2025
Due Date: May 6, 2025, no later than 2:00pm Central Time.
PDF file: homework-5.pdf (SHA256: 01a81895717cab5d20fe83c7506f7293a02586af0fa53084c2865c5c6d1cf40c)
Source files: homework-5.zip (SHA256: 19aeb9c23d39e0827c41b72213d6f64851bfa3ee5644e3477a50800565e00b9e)
Sample solutions: homework-5-solutions.pdf
Final Project
Final Project Group Presentation Schedule
I have randomly generated the Final Project Presentation schedule, as outlined below. The methodology is as follows.
- List all groups in alphabetical order by first name, with each group being sorted alphabetically by first name internally.
- Randomly generate a permutation to shuffle this alphabetical ordering.
- Output the shuffled ordering. First 3 groups present on Tuesday, April 29, 2025; last 2 groups present on Thursday, May 1, 2025.
Groups
- Ali
- Brian & Victoria
- Cameron
- Javed & Nathan
- Mohsen
Random permutation sampled: [4, 1, 3, 5, 2].
Final Ordering
Tuesday, April 29, 2025
- Javed & Nathan
- Ali
- Cameron
Thursday, May 1, 2025
- Mohsen
- Brian & Victoria
SageMath code used to generate this result:

```python
groups = ["Ali", "Brian & Victoria", "Cameron", "Javed & Nathan", "Mohsen"]
P = Permutations(5)     # the set of all permutations on [1,2,3,4,5]
p = P.random_element()  # samples a random permutation of the list [1,2,3,4,5]
p                       # prints the sampled permutation
shuffled_groups = [groups[i-1] for i in p]
shuffled_groups         # prints the new ordering for 'groups'
```
Project Description
The project you choose can be related to your research area, or completely unrelated. You may work in teams of up to 3 people. However, whatever your project is and however many people you have on your team, the project will consist of two major components—an In-Class Presentation and a Written Report—as well as two minor components—Project Proposals and Peer Evaluations. There are no rigid page limits for the report; anything between 4–20 pages can work. For the In-Class Presentation, you are expected to give a 15–18 minute presentation using the presentation materials of your choice (e.g., PowerPoint, Keynote, LaTeX Beamer, Google Slides, a board talk, etc.).
The project can be a survey of a problem or topic of your choice, or a novel analysis of a problem that you like. In the first case, if you read some papers and summarize them in a survey, give the reader the required background (which may be covered only briefly in some conference papers) together with the main results, their proofs, and open questions. In the second case, if you try to solve a problem that you are interested in, explain the connections with previous work; in case you don’t arrive at a solution by the end of the term, show what approaches you tried and what didn’t work.
The goal of the project is to understand a problem as much as possible, and to give you experience with complexity theory research. Give the reader the background and necessary explanations to make the problem very clear, understand the contribution of the paper, and the approaches used. You do not have to summarize every theorem in the paper (in either write-up or presentation). Pick instead one or a couple of results in the paper(s) you are reading and focus on those, while trying to answer questions such as: What is the idea of the proof? What techniques are the authors using? Where are the difficulties? What are the remaining open questions?
Recent proceedings of good conferences that publish theoretical work are a possible starting point. Some examples are STOC/FOCS/ITCS/SODA/APPROX-RANDOM, CRYPTO/TCC, EC (Economics and Computation), COLT, PODC.
Project Timeline
- Saturday, March 22, 2025, by 11:59pm CDT. Submit your project proposals to me. This includes at least one paragraph about your project, along with at least one (or, ideally, a few) papers you plan to read for the project. The easiest way to do this is to email me, cc your team members, and include the relevant information in the email.
- Tuesday, April 29, 2025, and Thursday, May 1, 2025. During the last week (our last two classes), we will hold the in-class presentations. Attendance will be required except for special cases. Part of this project is to give you experience presenting material you may not be very familiar with (or an expert in) to an audience who will be even less familiar with your topic. You will also be giving peer evaluations of your fellow students, and everyone will be required to submit these peer evaluations.
- Saturday, May 10, 2025, by 11:59pm CDT. Your written reports are due the day after finals. I want to give you as much time as possible for the written reports, but I will still need to read and grade them before grades are due. Early submissions (e.g., before finals) are also fine. You will submit your written reports via ~~Gradescope~~ Blackboard.
Grade Breakdown
- (5%) Project Proposals
- (10%) Peer Evaluations
- (40%) In-Class Presentation
- (45%) Written Report
Lecture 1
Review
Math Notation
Here are some common math notations we will be using throughout the semester. Any other notation introduced in lecture outside what is presented here will be explained. Please do not be afraid to ask questions about notation in class.
We let $\mathbb{Z}$ denote the set of all integers and let $\mathbb{N}$ denote the set of natural numbers (i.e., non-negative integers). The set $\mathbb{Z}^+$ denotes the set of all positive integers. We let $\mathbb{R}$ denote the set of all real numbers, with $\lceil x \rceil$ and $\lfloor x \rfloor$ defined to be the ceiling and floor of $x$, respectively (i.e., round up or round down to the nearest integer). All logarithms are base 2 unless otherwise stated; i.e., $\log x = \log_2 x$. When needed, we let $\ln$ denote the natural logarithm.
Sets and Strings
Let $\Sigma$ be a finite set. We say that $x$ is a string with alphabet $\Sigma$ if it is a finite ordered tuple with elements in $\Sigma$. That is, if $x$ has length $n$, then $x = (x_1, \dots, x_n) \in \Sigma^n$ (i.e., it is a vector). We let $\Sigma^*$ denote the set of all finite-length strings with alphabet $\Sigma$ (i.e., the set of all finite-length vectors with elements in $\Sigma$). Given two strings $x$ and $y$, we let $x \circ y$ denote their concatenation; sometimes, we also use $x \| y$ or $xy$ to denote their concatenation as well. Finally, given a string/vector $x$, we let $|x|$ denote the length of the string (i.e., number of elements in $x$). When needed, we let $|n|$ denote the bit-length of $n$; that is, the number of bits needed to represent $n$. If $n \in \mathbb{Z}^+$, then $|n| = \lfloor \log n \rfloor + 1$.
Languages
Much of this course will be concerned with the idea of a language. A language is simply a set $L \subseteq \Sigma^*$. A language can be finite or infinite. While the above definition doesn’t really mean much, we’ll see later in the course how we define languages in more meaningful ways.
Function Notation
Let $A$ and $B$ be arbitrary sets (not necessarily finite). Then we let $f : A \to B$ denote a function from $A$ to $B$.
Big-Oh Notation
Let $f, g : \mathbb{N} \to \mathbb{N}$ be two functions. We say that $f$ is big-oh of $g$ if there exist $n_0 \in \mathbb{N}$ and a constant $c > 0$ such that for all $n \geq n_0$, we have $f(n) \leq c \cdot g(n)$. We denote this as $f = O(g)$ or $f(n) = O(g(n))$. Similarly, we say that $f$ is big-omega of $g$ if $g = O(f)$; we denote this as $f = \Omega(g)$. We say that $f$ is theta of $g$ if $f = O(g)$ and $f = \Omega(g)$; we denote this as $f = \Theta(g)$.
We say that $f$ is little-oh of $g$ if for all constants $c > 0$, there exists $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, it holds that $f(n) < c \cdot g(n)$. We denote this as $f = o(g)$. Similarly, $f$ is little-omega of $g$ if $g = o(f)$; equivalently, for all constants $c > 0$, there exists $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, it holds that $f(n) > c \cdot g(n)$. We denote this as $f = \omega(g)$.
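As a quick worked example of the big-oh definition: take $f(n) = 3n^2 + 5n$ and $g(n) = n^2$. Choosing $c = 8$ and $n_0 = 1$ witnesses $f = O(g)$, since
$$3n^2 + 5n \leq 3n^2 + 5n^2 = 8n^2 = c \cdot g(n) \quad \text{for all } n \geq 1.$$
On the other hand, $f \neq o(g)$: for the constant $c = 1$, there is no $n_0$ such that $3n^2 + 5n < n^2$ for all $n \geq n_0$.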
Turing Machines
The goal of complexity theory is to quantify and measure computational efficiency. How can we do this? It is first necessary to establish a concrete model of computation. But this seems impossible—surely, there are infinitely many computational models one can cook up to get the job done, right?
Fortunately, we can focus on a single model of computation: the Turing Machine. For (nearly)¹ every physically realizable system we have been able to come up with, the Turing machine can efficiently simulate that model. This gives us a single model with which we can try to understand and quantify computational efficiency.
Turing Machines, Informally
Turing machines, introduced by Alan Turing in 1936, are an attempt to formalize the idea of computation as people have understood it for centuries. Intuitively, when someone asks you to compute the answer to a problem, we as people seem to follow a basic formula for realizing this:
- Get the problem, along with the inputs to the problem.
- Get a piece of scratch paper to work on.
- Apply a set of rules to the inputs and make decisions according to those rules.
- Arrive at an answer to the problem, state the answer, and stop working.
As a concrete example, consider the problem of multiplying two integers. Take integers $x$ and $y$ and suppose we want to compute their product $x \cdot y$. There is a simple (so-called “grade-school”) algorithm for computing $x \cdot y$, which we show in the figure below.

Turing machines attempt to formalize the above intuitive process, but in a very restricted capacity (i.e., Turing machines are incredibly simple and, in a word, stupid).
Turing Machines, Formally
A $k$-tape Turing machine, which we denote by $M$, is a machine with $k$ tapes that are infinitely long in one direction (i.e., the cells of each tape are indexed by the set $\mathbb{N}$). The machine has a single read-only input tape, a single read-write output tape, and $k-2$ read-write work tapes. $M$ also contains a register which tracks the machine’s state. Each head can be moved independently. More formally, a $k$-tape Turing machine is described by a tuple $M = (\Sigma, \Gamma, Q, \delta)$ with the following properties:
- $\Sigma$ is a finite set, which we call the input alphabet;
- $\Gamma$ is called the tape alphabet, and it contains $\Sigma$ along with two special symbols $\triangleright$ and $\square$ that are not in $\Sigma$. $\triangleright$ is the start symbol and $\square$ is the blank symbol.
- $Q$ is a finite set of states which can be held in the register. $Q$ always contains two special states: $q_{\mathsf{start}}$ (the start state) and $q_{\mathsf{halt}}$ (the halt state).
- A transition function $\delta : Q \times \Gamma^k \to Q \times \Gamma^{k-1} \times \{\mathsf{L}, \mathsf{R}, \mathsf{S}\}^k$ describing the rules $M$ follows during a computation.
We say that a tuple $(q, \sigma_1, \dots, \sigma_k)$ is a configuration of the Turing machine $M$. In this light, the transition function $\delta$ maps configurations of the Turing machine to new configurations as follows:
- $q \in Q$ represents the current state of the machine stored in its register;
- $\sigma_i \in \Gamma$ represents the current symbol under tape $i$’s head, for $i \in \{1, \dots, k\}$;
- $\delta$ reads the current state and the contents under the tape heads.
It then outputs a new configuration $(q', \sigma'_2, \dots, \sigma'_k, D_1, \dots, D_k)$, where
- $q' \in Q$ is the new state stored in the register;
- $\sigma'_i$ is a new symbol that is written under the $i$th tape head for $i \in \{2, \dots, k\}$ (i.e., it excludes the input tape); and
- $D_i \in \{\mathsf{L}, \mathsf{R}, \mathsf{S}\}$ specifies moving tape head $i$ one space Left, one space Right, or telling the tape head to Stay.²
- If the Turing machine is ever in the state $q_{\mathsf{halt}}$, it stops executing the transition function and halts.
Additionally, all Turing machines satisfy the following.
- All tapes are initially set with $\square$ in every location.
- The first index of every tape is then initialized to $\triangleright$. All tape heads begin here.
- On input $x$, all Turing machines will:
  - Move the input head right, then write $x$ to the input tape.
  - Move the input head left until it reaches $\triangleright$.
  - Set the initial state in the register to $q_{\mathsf{start}}$. The Turing machine is now ready to begin its computation.
We call this the initial configuration of the Turing machine.
A graphical example of a -tape Turing machine is presented below.

Turing machine example: Palindromes
Let’s see an example of a Turing machine in action. We will be using Turing machines to compute functions. Let $f : \{0,1\}^* \to \{0,1\}$ be a function such that $f(x) = 1$ if and only if $x$ is a palindrome. That is, $x$ reads the same forwards as backwards; equivalently, if $x \in \{0,1\}^n$, then $x_i = x_{n-i+1}$ for all $i \in \{1, \dots, n\}$. Let’s design a Turing machine for computing $f$.
High-level TM Specification. Let $M$ be a $3$-tape Turing machine (1 input, 1 output, 1 work tape). On input $x \in \{0,1\}^n$ for any $n \in \mathbb{N}$, $M$ will do the following.
1. Copy the input $x$ to the work tape.
2. Move the input head to the start of the input tape (with $\triangleright$ under the head); leave the work head at the last position (with $\square$ under the head).
3. Move the input head one position right (with $x_1$ under the head) and move the work head one position left (with $x_n$ under the head).
4. Read the symbols under the input head and the work head.
   (a) If the symbol under the input head is $\square$ and the symbol under the work head is $\triangleright$, then write $1$ to the output tape and halt.
   (b) Else if the symbols under the input head and work head are not equal, write $0$ to the output tape and halt.
   (c) Else (i.e., the symbols are equal) move the input head one step right, move the work head one step left, and repeat from Step 4.
Formal TM Specification of the above. We now formalize the above process. To do so, we specify (1) the input alphabet; (2) the set of states for $M$; and (3) the transition function for $M$.
- The input alphabet is simply $\Sigma = \{0, 1\}$. This tells us our tape alphabet is $\Gamma = \{0, 1, \triangleright, \square\}$.
- The set of states will be $Q = \{q_{\mathsf{start}}, q_{\mathsf{copy}}, q_{\mathsf{return}}, q_{\mathsf{compare}}, q_{\mathsf{halt}}\}$.
- The transition function $\delta$ is defined as follows.
Let $(q, \sigma_{\mathsf{in}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{work}})$ be a configuration given as input to $\delta$; $\delta$ outputs a tuple of the form $(q', \sigma'_{\mathsf{out}}, \sigma'_{\mathsf{work}}, D_{\mathsf{in}}, D_{\mathsf{out}}, D_{\mathsf{work}})$.
- If $q = q_{\mathsf{start}}$, move both the input head and work head right and change the state to $q_{\mathsf{copy}}$.
- If $q = q_{\mathsf{copy}}$, then read the symbol under the input head (i.e., read $\sigma_{\mathsf{in}}$).
  - If $\sigma_{\mathsf{in}} \in \{0, 1\}$, write $\sigma_{\mathsf{in}}$ to the current position of the work tape. Then move both the input head and work head one step right. Keep the state as $q_{\mathsf{copy}}$. In this case, the output of the transition function is $(q_{\mathsf{copy}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{in}}, \mathsf{R}, \mathsf{S}, \mathsf{R})$.
  - If $\sigma_{\mathsf{in}} = \square$, then move the input head left and change the state to $q_{\mathsf{return}}$. In this case, the output of the transition function is $(q_{\mathsf{return}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{work}}, \mathsf{L}, \mathsf{S}, \mathsf{S})$.
- If $q = q_{\mathsf{return}}$, read the symbol under the input head (i.e., $\sigma_{\mathsf{in}}$).
  - If $\sigma_{\mathsf{in}} \in \{0, 1\}$, then move the input head left and keep the state as $q_{\mathsf{return}}$. In this case, the output of the transition function is $(q_{\mathsf{return}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{work}}, \mathsf{L}, \mathsf{S}, \mathsf{S})$.
  - If $\sigma_{\mathsf{in}} = \triangleright$, then move the input head right, move the work head left, and change the state to $q_{\mathsf{compare}}$. In this case, the output of the transition function is $(q_{\mathsf{compare}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{work}}, \mathsf{R}, \mathsf{S}, \mathsf{L})$.
- If $q = q_{\mathsf{compare}}$, read what’s under the input and work heads (i.e., $\sigma_{\mathsf{in}}$ and $\sigma_{\mathsf{work}}$).
  - If $\sigma_{\mathsf{in}} = \square$ and $\sigma_{\mathsf{work}} = \triangleright$, then write $1$ on the output tape and change the state to $q_{\mathsf{halt}}$. In this case, the transition function outputs $(q_{\mathsf{halt}}, 1, \sigma_{\mathsf{work}}, \mathsf{S}, \mathsf{S}, \mathsf{S})$.
  - Else if $\sigma_{\mathsf{in}} \neq \sigma_{\mathsf{work}}$, then write $0$ on the output tape and change the state to $q_{\mathsf{halt}}$. In this case, the transition function outputs $(q_{\mathsf{halt}}, 0, \sigma_{\mathsf{work}}, \mathsf{S}, \mathsf{S}, \mathsf{S})$.
  - Else (i.e., $\sigma_{\mathsf{in}} = \sigma_{\mathsf{work}}$), then move the input head right and the work head left, keeping the state as $q_{\mathsf{compare}}$. In this case, the transition function outputs $(q_{\mathsf{compare}}, \sigma_{\mathsf{out}}, \sigma_{\mathsf{work}}, \mathsf{R}, \mathsf{S}, \mathsf{L})$.
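As a sanity check, here is a minimal Python sketch that mirrors the three phases above (copy, return, compare) with explicit head positions; the characters `>` and `_` stand in for the start and blank symbols:

```python
START, BLANK = ">", "_"  # stand-ins for the start and blank symbols

def palindrome_tm(x: str) -> int:
    """Simulate the 3-tape palindrome machine on input x over {0,1}."""
    inp = [START] + list(x) + [BLANK]        # read-only input tape
    work = [START] + [BLANK] * (len(x) + 1)  # read-write work tape
    i, w = 1, 1  # heads start just right of the start symbol

    # Phase 1 (state q_copy): copy the input onto the work tape.
    while inp[i] != BLANK:
        work[w] = inp[i]
        i, w = i + 1, w + 1

    # Phase 2 (state q_return): rewind the input head; step the work head
    # back onto the last copied symbol.
    i, w = 1, w - 1

    # Phase 3 (state q_compare): walk the heads toward opposite ends.
    while True:
        if inp[i] == BLANK and work[w] == START:
            return 1  # every pair matched: x is a palindrome
        if inp[i] != work[w]:
            return 0  # mismatch found
        i, w = i + 1, w - 1

assert palindrome_tm("0110") == 1 and palindrome_tm("01") == 0
```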
Turing Machine Equivalences
At first, the Turing machine model seems like a very restrictive model that cannot compute many things that real-life computers can. However, as we will see, this restrictive computational model is (roughly) equivalent to nearly every other computational model people have thought of over the years. This includes:
- Random access machines;
- Turing machines with write-only output tapes;
- $\lambda$-calculus;
- Single-tape Turing machines;
- Turing machines with bidirectional infinite tapes;
- Pointer and Counter machines;
- Turing machines with only binary input alphabets;
- Oblivious Turing machines.
In the next lecture or two, we will quantify what we mean by Turing machines being equivalent to the above notions.
¹ Quantum computers are the one (almost) physically realizable computational model that we have which does not seem to admit efficient simulation on Turing machines.
² If the Turing machine specifies that a tape head move left, but it’s at the start of the tape, the head simply stays in the same place.
Lecture 2
In-class notes: CS 505 Spring 2025 Lecture 2
Measuring Runtime of Turing Machines
With the definition of Turing machines established, we can turn towards quantifying the run-time of Turing machines. Informally, the run-time of a Turing machine $M$ computing some function $f$ is the maximum amount of time needed to compute $f$ on all inputs (of a fixed length), where our measure of time corresponds to how many executions of the transition function $M$ needs to utilize to compute $f$. First, we need to actually define what it means for a Turing machine to compute a function $f$.
Definition (Turing Machine Computation). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a function and let $M$ be a Turing machine. We say that $M$ computes $f$ if for all $x \in \{0,1\}^*$, when $M$ is initialized with input $x$ it halts with $f(x)$ on the output tape. We denote this as $M(x) = f(x)$.
Now we can define the run-time of a Turing machine computing a function $f$.
Definition (Turing Machine Run-time). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a function and let $M$ be a Turing machine which computes $f$. Furthermore, let $T : \mathbb{N} \to \mathbb{N}$ be a function. We say that $M$ computes $f$ in time $T(n)$ if $M(x) = f(x)$ in at most $T(|x|)$ steps for all $x \in \{0,1\}^*$ (i.e., $T(n)$ steps on inputs of length $n$). Here, a step of the Turing machine is a single execution of its transition function $\delta$.
In essence, executing one step of the transition function of a Turing machine is the atomic Turing machine operation.
Time Constructible Functions
An important concept is the idea of time constructible functions, which we will use to quantify and show equivalences among different Turing machine models. It will also be used in later topics (e.g., time hierarchy theorems).
Definition (Time Constructible Function). Let $T : \mathbb{N} \to \mathbb{N}$ be a function. Then we say that $T$ is time constructible if and only if (1) $T(n) \geq n$; and (2) there exists a Turing machine $M$ such that for all $x \in \{0,1\}^*$, we have $M(x) = \langle T(|x|) \rangle$ in time $T(|x|)$. Here, $\langle T(|x|) \rangle$ denotes the binary representation of $T(|x|)$.
Examples of time constructible functions include $n$, $n \log n$, $n^2$, and $2^n$. Notably, constant functions $T(n) = c$, or any sublinear $T(n)$ (e.g., $T(n) = \log n$), are not time constructible.
The above examples of non-time constructible functions highlight a key idea behind time constructibility: a Turing machine (usually) needs to read its entire input in order to compute a function. The stipulation $T(n) \geq n$ allows for a Turing machine to at least read the entire input before computing $T$. If this restriction is removed, then a constant function $T(n) = c$ is still time constructible (you simply ignore all inputs and write the constant $c$ to the output tape), but sublinear functions like $T(n) = \log n$ remain non-time constructible, since the Turing machine is expected to compute $T(|x|)$ in less time than it takes to read $x$!
Turing Machine Equivalences
A function being time constructible turns out to be a key factor in how we define equivalences among Turing machines (and other models as well).¹ Informally, we say that a computational model $A$ is equivalent to a computational model $B$ if any computation capable of being performed in $A$ can be performed in $B$ (with at most polynomial time overhead), and vice versa. In the context of Turing machines, we say that a computational model $A$ is equivalent to the Turing machine model if any problem solvable in time $T(n)$ in model $A$ can be solved by a Turing machine running in time $O(T(n)^c)$ for some constant $c \geq 1$. Intuitively, if $P$ is a program in computational model $A$ running in time $T(n)$, then a Turing machine will simulate model $A$ in order to run program $P$ (e.g., similar to modern interpreted programming languages like Python). If $P$ runs in time $T(n)$, and $T$ is not time constructible, then this simulation will not meet our requirements; i.e., it will not be an efficient simulation.
As mentioned in Lecture 1, the $k$-tape Turing machine model we have been working with is equivalent to many other Turing machine (and non-Turing machine) models. We state these relations formally below.
First, recall that it is sufficient to consider a Turing machine which only uses a binary alphabet.
Lemma 2.1. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a Turing machine $M$ with tape alphabet $\Gamma$, then it is computable in time $O(\log|\Gamma| \cdot T(n))$ by a Turing machine $\tilde{M}$ with tape alphabet $\{0, 1, \triangleright, \square\}$.
Proof Sketch. The main idea is to encode the (non-start and non-blank) symbols of $\Gamma$ using bits. This requires roughly $\log|\Gamma|$ bits to uniquely encode $\Gamma$ in binary. Then the new Turing machine $\tilde{M}$ simply encodes each symbol from $\Gamma$ on its tapes in binary. To simulate a single step of $M$, the machine $\tilde{M}$ must read $\log|\Gamma|$ bits from each tape, translate the symbols read into its current state, then execute $M$’s transition function.
Next, it turns out that any $k$-tape Turing machine can be readily simulated by a single-tape Turing machine (which many of you may have seen before).
Lemma 2.2. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a $k$-tape Turing machine, then it is computable in time $O(k \cdot T(n)^2)$ by a single-tape Turing machine.
Proof Idea. The proof idea is to stagger the tapes onto the single-tape machine. Notably, since each of the $k$ tapes is infinite, if you try to write them side-by-side on a single-tape machine, you would inevitably run into a situation where you reach the end of an allocation for a work tape, so you’d have to shift the entire contents of the remaining tapes right one space. This would blow up the time to simulate. So instead, you stagger the tapes. Consider tape $i \in \{1, \dots, k\}$. Then position $j$ of tape $i$ would be written to position $k(j-1) + i$ on the single-tape machine.
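A minimal sketch of this index map (using the 1-indexed convention and the $k(j-1)+i$ formula above):

```python
def staggered_position(k: int, tape: int, pos: int) -> int:
    """Map position `pos` (1-indexed) of tape `tape` in a k-tape machine
    to a cell of the single-tape machine, interleaving the k tapes."""
    return k * (pos - 1) + tape

# With k = 3, tape 2's cells land at positions 2, 5, 8, ...
print([staggered_position(3, 2, j) for j in (1, 2, 3)])  # [2, 5, 8]
```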
It also turns out that having tapes which are infinite in both directions does not buy you much in terms of computational efficiency.
Lemma 2.3. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a $k$-bidirectional-tape Turing machine (i.e., every tape is infinite in both directions), then $f$ is computable in time $O(T(n))$ by a standard $k$-tape Turing machine (i.e., tapes that are infinite in one direction).
Proof Idea. You can approach this in two different ways.
- Cut each bidirectional tape in half, then stagger the two halves onto a single one-directional tape (similar to Lemma 2.2 above).
- If the bidirectional Turing machine has tape alphabet $\Gamma$, let the standard Turing machine have tape alphabet $\Gamma \times \Gamma$. Then you can encode the bidirectional tape onto the single one-directional tape by folding it at position zero, using one coordinate of $\Gamma \times \Gamma$ for each direction.
Universal Turing Machines
We’ve discussed how our $k$-tape Turing machine is equivalent to many other Turing machine models. Next, we will see that we can simulate any Turing machine (in any equivalent model). Much like how the modern computer can run any computation you give it, we will see there is a universal Turing machine which can simulate any Turing machine you give it as input.
Turing Machines are (Binary) Strings
We’ve focused our attention on Turing machines which compute some function $f$, and we haven’t given much thought to how we write down the machine $M$. It turns out that we can conveniently describe Turing machines simply as binary strings. We’ll let $\langle M \rangle$ denote the binary string which represents the Turing machine $M$. Note: there are an infinite number of strings which represent a single Turing machine $M$.
For any $\alpha \in \{0,1\}^*$, we will let $M_\alpha$ denote the Turing machine specified by the string $\alpha$. In this light, notice that
- We’ve always talked about Turing machines computing some function $f : \{0,1\}^* \to \{0,1\}^*$;
- Turing machines themselves compute such a function; and
- Turing machines can also be inputs to these functions!
So there must be a Turing machine which can take a Turing machine as input and compute the function that this Turing machine would have computed! This is the universal Turing machine.
Theorem 2.4 (Hennie & Stearns, 1966). There exists a Turing machine $\mathcal{U}$ such that for all $x, \alpha \in \{0,1\}^*$, $\mathcal{U}(x, \alpha) = M_\alpha(x)$. That is, $\mathcal{U}$ computes the output of $M_\alpha$ when run with input $x$. Moreover, if $M_\alpha$ halts within $T(|x|)$ steps on any input $x$ for time constructible $T$, then $\mathcal{U}(x, \alpha)$ halts in $O(T(|x|) \log T(|x|))$ steps, where the hidden constant only depends on $M_\alpha$’s alphabet size, number of states, and number of tapes.
The proof of the above theorem can be found below (see Proof of Theorem 2.4). Here, we’ll give the proof of the above with the time bound $T \log T$ replaced with $T^2$.
Proof with time bound $T^2$. Suppose that $M = M_\alpha$. Without loss of generality, we can assume that the Turing machine $M$ has tape alphabet $\{0, 1, \triangleright, \square\}$ and has a single work tape (i.e., it is a $3$-tape Turing machine). If not, then $\mathcal{U}$ can transform $M$ into an equivalent Turing machine, denoted as $\tilde{M}$, with these properties by Lemmas 2.1 and 2.2.² In this case, if $M$ runs in time $T(n)$, then the resulting equivalent Turing machine $\tilde{M}$ runs in time $O(T(n)^2)$ (ignoring the $\log|\Gamma|$ factors since $M$ is fixed).
The universal machine $\mathcal{U}$ will be a $5$-tape Turing machine; i.e., one input tape, one output tape, and 3 work tapes. $\mathcal{U}$ has alphabet $\{0, 1, \triangleright, \square\}$. Now $\mathcal{U}$ will simulate $\tilde{M}$ as follows.
- $\mathcal{U}$ uses its input, output, and first work tape to identically copy the operations $\tilde{M}$ performs on these tapes (recall $\tilde{M}$ has 3 tapes).
- $\mathcal{U}$ encodes the current state of $\tilde{M}$ on its second work tape.
- $\mathcal{U}$ encodes the transition function of $\tilde{M}$ on its third work tape. The transition function is simply encoded as a table of key-value pairs.
In order to simulate a single step of $\tilde{M}$’s computation, the machine $\mathcal{U}$ does the following.
1. Read the current symbols under the input tape, output tape, and first work tape. This identically matches what $\tilde{M}$ does and takes constant time.
2. Read the current state of $\tilde{M}$ from the second work tape. Since the tape alphabet is binary, the states of $\tilde{M}$ take $O(\log|Q|)$ bits to encode, so reading the current state takes $O(\log|Q|)$ time steps (i.e., move to the end of the current state, go back to the start of the work tape).
3. Let $q$ be the current state, and let $\sigma_1, \sigma_2, \sigma_3$ be the symbols read from the input, output, and first work tapes, respectively. Scan the third work tape for the key $(q, \sigma_1, \sigma_2, \sigma_3)$.
4. Once this key is found, read the value from the corresponding table entry. The value is exactly $\delta(q, \sigma_1, \sigma_2, \sigma_3) = (q', \sigma'_2, \sigma'_3, D_1, D_2, D_3)$, where $D_i \in \{\mathsf{L}, \mathsf{R}, \mathsf{S}\}$ for $i \in \{1, 2, 3\}$.
5. Execute the transition function of $\tilde{M}$:
   1. Write $\sigma'_2$ to the output head and $\sigma'_3$ to the head of work tape 1. This takes constant time.
   2. Write the new state $q'$ to the second work tape and reset the tape head after. This takes $O(\log|Q|)$ time.
   3. Move tape head $i$ in direction $D_i$ for $i \in \{1, 2, 3\}$. This takes constant time.
   4. Move the head of the third work tape back to the start.
Now, the time complexities of (3) and (5.4) above are the same. In particular, in the worst case, $\mathcal{U}$ must scan to the end of the table representing $\delta$ to find the correct key. There are $|Q| \cdot 4^3$ keys in this table, and each key has an entry in $Q \times \Gamma^2 \times \{\mathsf{L}, \mathsf{R}, \mathsf{S}\}^3$. Since $|\Gamma| = 4$ and because we can encode each direction $D_i$ with only two more bits, we can conclude that each table entry (i.e., each key-value pair) has length $O(\log|Q|)$. This means to write down a single entry, we need $O(\log|Q|)$ bits. Moreover, there are a total of $O(|Q|)$ entries in the table, so the total time of executing (3) or (5.4) is at most $O(|Q| \log|Q|)$ time.
Since $M$ is fixed, to simulate a single step of $\tilde{M}$, $\mathcal{U}$ requires $O(|Q| \log|Q|) = O(1)$ time with respect to the input length. So if $\tilde{M}$ runs in time $\tilde{T}(n)$, then $\mathcal{U}$ runs in time $O(\tilde{T}(n))$. Now $\tilde{T}(n) = O(T(n)^2)$ by the transformation we performed on $M$ to obtain $\tilde{M}$. Thus, $\mathcal{U}$ simulates $M$ in at most $O(T(n)^2)$ time.
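To picture steps (3) and (5.4), here is a toy Python sketch of the linear table scan $\mathcal{U}$ performs; a native dictionary would give constant-time lookup, but the list is deliberately scanned linearly to mirror walking along a tape:

```python
# Keys: (state, sym_input, sym_output, sym_work).
# Values: (state', write_output, write_work, D_input, D_output, D_work).
# A toy two-entry fragment; a full machine description would list every key.
delta_table = [
    (("q_copy", "0", "_", "_"), ("q_copy", "_", "0", "R", "S", "R")),
    (("q_copy", "1", "_", "_"), ("q_copy", "_", "1", "R", "S", "R")),
]

def scan_table(key):
    """Linear scan, as the universal machine does on its third work tape."""
    for k, v in delta_table:
        if k == key:
            return v
    raise KeyError(key)

print(scan_table(("q_copy", "1", "_", "_")))
```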
Turing Machines and Languages
We’ve spent most of our time discussing Turing machines and how they compute functions. We’ll now shift to mostly talking about Turing machines in the context of deciding languages.
Recall that a language $L$ is simply a subset of $\{0,1\}^*$. Notably, we can define a function $f_L : \{0,1\}^* \to \{0,1\}$ as $f_L(x) = 1$ if and only if $x \in L$; this immediately implies that $f_L(x) = 0$ if and only if $x \notin L$. So there is a natural correspondence between computing functions and deciding set membership in a language $L$.
Key to our later dealings with complexity classes will be the idea of Turing decidability. We’ll build up to this idea by first introducing Turing recognizability.
Definition (Turing Recognizable Language). A language $L$ is said to be Turing recognizable if there exists a Turing machine $M$ such that for all $x \in L$, $M(x) = 1$. In particular, $M$ always halts and outputs $1$ if $x \in L$.
Recognizability only requires that the Turing machine halt on any valid member of the language. If, however, one hands this Turing machine some $x \notin L$, its behavior is undefined and not guaranteed! We’d like to strengthen this to make sure our Turing machine always halts, whether or not its input is in the language. This gives us decidability.
Definition (Turing Decidable Language). A language $L$ is said to be Turing decidable if there exists a Turing machine $M$ such that the following hold for any $x \in \{0,1\}^*$:
- $M(x) = 1$ if and only if $x \in L$; and
- $M(x) = 0$ if and only if $x \notin L$.
Notice that the above definition immediately means that $M$ halts on all possible inputs. This is because, equivalently stated, if $x \in \overline{L}$ then $M(x) = 0$, where $\overline{L}$ is the complement of $L$, which is defined as $\overline{L} = \{0,1\}^* \setminus L$ (i.e., everything in $\{0,1\}^*$ but not in $L$).
An equivalent definition of decidability states that both the language and its complement are recognizable.
Lemma 2.5. A language $L$ is Turing decidable if and only if both $L$ and $\overline{L}$ are Turing recognizable.
Undecidability
Unfortunately, there are many (interesting) languages that are undecidable; that is, there does not exist any Turing machine which decides the language. We’ll begin by showing the existence of at least one undecidable language.
Theorem 2.6. There exists a language that is not Turing decidable (i.e., it is undecidable).
Proof. First define the language $L_D = \{\alpha \in \{0,1\}^* : M_\alpha(\alpha) = 1\}$; i.e., $L_D$ is the set of all strings $\alpha$ such that the Turing machine $M_\alpha$, when given as input its own description $\alpha$, halts and outputs $1$. Now define the complement language $\overline{L_D} = \{\alpha : M_\alpha(\alpha) \neq 1\}$. We claim that $\overline{L_D}$ is undecidable.

We show this via a proof by contradiction. So towards contradiction, assume that $\overline{L_D}$ is decidable. Then there exists a Turing machine $D$ which decides this language. This implies that for any $\alpha$, $D(\alpha) = 1$ if and only if $\alpha \in \overline{L_D}$ and $D(\alpha) = 0$ if and only if $\alpha \notin \overline{L_D}$.

Consider running $D$ on its own description $\beta$ (so that $M_\beta = D$). We have that $D(\beta) = 1 \iff \beta \in \overline{L_D} \iff M_\beta(\beta) \neq 1 \iff D(\beta) \neq 1$.

Thus, we have a contradiction, as $D(\beta) = 1$ if and only if $D(\beta) \neq 1$. This implies that $\overline{L_D}$ is undecidable.
Notably, the above proof technique is known as diagonalization. We’ll use it later when we discuss time hierarchy theorems.
I incorrectly stated in class that $\overline{L_D}$ was recognizable. However, $\overline{L_D}$ is, in fact, not recognizable. This is because $L_D$ from the above proof is recognizable. By Lemma 2.5, if $\overline{L_D}$ were Turing recognizable, then $L_D$ would be decidable, but clearly it is not!

One may argue that the language $\overline{L_D}$ is not a very interesting language, and may not come up in the real world. However, we’ll take one step up and consider a more interesting language that would be great for us if it were decidable! Unfortunately, it is not decidable.
The Halting Problem
The Halting problem asks the following simple question: given a Turing machine $M$ and a string $x$, does $M$ halt on input $x$? More formally, it is specified by the following language: $\mathsf{HALT} = \{(\alpha, x) : M_\alpha \text{ halts on input } x\}$.

Theorem 2.7. $\mathsf{HALT}$ is undecidable.
We’ll give the proof of this theorem in Lecture 3.
Proof of Theorem 2.4
This proof is taken directly from Arora & Barak’s book with the following notes:
- Theorem 1.9 in the proof corresponds to Theorem 2.3 in these lecture notes;
- Claim 1.6 in the proof corresponds to Lemma 2.2 in these lecture notes; and
- Claim 1.5 in the proof corresponds to Lemma 2.1 in these lecture notes.
The proof can be found in the following pdf: Proof of Theorem 2.4
-
“Key” here meaning it makes proofs much simpler. ↩
-
Lemma 2.2 tells us that a $k$-tape Turing machine can be simulated by a one-tape Turing machine with quadratic overhead. The same proof can be applied to reduce $k$ tapes to $3$ tapes, with a single input, output, and work tape (i.e., transform the work tapes into a single work tape, keeping the input/output tapes the same). ↩
Lecture 3
In-class notes: CS 505 Spring 2025 Lecture 3
Undecidability Wrap-up
We begin by wrapping up our discussion of undecidability.
The Halting Problem
From last time, we’ll finish proving that the halting problem is undecidable. First, recall the definition of the halting problem.
Theorem 2.7. $\mathsf{HALT}$ is undecidable.

Proof. We’ll prove this via a reduction from the language $\overline{L_D}$ from last lecture, defined as $\overline{L_D} = \{\alpha : M_\alpha(\alpha) \neq 1\}$.

Our proof will be by contradiction. In particular, this means we’ll assume that $\mathsf{HALT}$ is decidable, then derive our contradiction by giving a decider for $\overline{L_D}$.

Thus assume that $\mathsf{HALT}$ is decidable. This means there is a Turing machine $M_{\mathsf{HALT}}$ which decides $\mathsf{HALT}$. This tells us that for every pair $(\alpha, x)$, we have $M_{\mathsf{HALT}}(\alpha, x) = 1$ if and only if $M_\alpha(x)$ halts, and $M_{\mathsf{HALT}}(\alpha, x) = 0$ if and only if $M_\alpha(x)$ does not halt.

We’ll use $M_{\mathsf{HALT}}$ to build a Turing machine $D$ which decides $\overline{L_D}$. For any $\alpha$, define $D(\alpha)$ as follows.
- Set $b = M_{\mathsf{HALT}}(\alpha, \alpha)$.
- If $b = 0$, then output 1.
- If $b = 1$, then set $y = M_\alpha(\alpha)$.
- If $y \neq 1$, output 1.
- If $y = 1$, output 0.
Since $M_{\mathsf{HALT}}$ is a decider, it halts on all possible inputs $(\alpha, \alpha)$. Now, if $b = 0$, we know that $M_\alpha(\alpha)$ does not halt, which implies that $\alpha \in \overline{L_D}$. So we set $D(\alpha) = 1$ in this case. Next, if $b = 1$, we know that $M_\alpha(\alpha)$ does halt. We then test the output of $M_\alpha(\alpha)$ by running it (which is safe, since we know it halts). If $y \neq 1$, then again we know $\alpha \in \overline{L_D}$, so we set $D(\alpha) = 1$. Otherwise, $y = 1$, and thus $\alpha \notin \overline{L_D}$, so we set $D(\alpha) = 0$.

Thus, $D$ halts on all possible inputs, and clearly decides $\overline{L_D}$. This contradicts our previous result that $\overline{L_D}$ is undecidable. Therefore, $\mathsf{HALT}$ is undecidable.
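The structure of the reduction is easy to see in code. Below is a purely illustrative Python sketch: `halts` stands in for the assumed decider $M_{\mathsf{HALT}}$ (which, as the proof shows, cannot actually exist), and `run` stands in for universal simulation.

```python
def D(alpha, halts, run):
    """Decider for the diagonal complement language, built from a
    hypothetical halting oracle. `halts(m, x)` is assumed to return True
    iff machine m halts on input x; `run(m, x)` simulates m on x."""
    b = halts(alpha, alpha)
    if not b:                 # M_alpha(alpha) never halts,
        return 1              # so alpha is in the complement language
    y = run(alpha, alpha)     # safe to run: we know it halts
    return 1 if y != 1 else 0
```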
Final Remarks on Undecidability
Rice’s Theorem
It would be great if the halting problem were decidable, as it would give us an algorithmic way to check whether programs halt. However, one may be wondering if there are other properties of programs we can decide/test. For example, “does this program output 1 on some input?” or “does this program compute the same function as some other program?” Unfortunately, these are also undecidable problems.

This is a result known as Rice’s Theorem, which informally states that it is impossible to determine if a computer program has any non-trivial semantic property $P$ (a property of the function the program computes, rather than of its source code). I.e., the language $L_P = \{\alpha : M_\alpha \text{ satisfies } P\}$ is undecidable. Here, a non-trivial property is a property which is not true or false for every program (i.e., there are some programs that satisfy $P$, and some which do not).
Mathematical Incompleteness
The idea of undecidability (and uncomputability) is closely related to (and inspired by) Gödel’s incompleteness theorem. In the early 1900’s, there was a large push to establish a set of mathematical axioms from which you can prove or disprove any mathematical statement. However, Gödel proved this is impossible. He showed that no matter what (consistent, effectively describable) set of axioms you choose, there will always be statements you can neither prove nor disprove. This inspired the results on undecidability/uncomputability, and is closely related to these ideas.
Time-Efficient Computations
We’ll now turn our focus to a central topic in complexity theory: defining classes of efficient computations. This leads us to defining and discussing various complexity classes. Informally, a complexity class is simply a set of languages which are decidable (resp., computable) within some resource bound. Example resource bounds include running in linear time, running in logarithmic space, etc.
Deterministic Time
Building towards what we as computer scientists consider efficient, we turn to time bounds. We’ll define the notion of deterministic time.
Definition. Let $T : \mathbb{N} \to \mathbb{N}$ be a function. A language $L$ is in the class $\mathsf{DTIME}(T(n))$ if and only if $L$ is decidable by a (deterministic) Turing machine in time $O(T(n))$.
All Turing machines we’ve discussed and defined so far have been deterministic. These machines all have straight-line computations: they execute their transition function, which simply outputs the next state. Later, we’ll see non-deterministic Turing machines, where the transition function can output a set of possible states and the Turing machine non-deterministically decides which state to pick next.
The Complexity Class P
Given the definition above, we can now define the set of (what we consider to be) all efficient computations. This is the complexity class P (which stands for polynomial).
Definition (P). $\mathsf{P} = \bigcup_{c \geq 1} \mathsf{DTIME}(n^c)$.
We consider anything computed in polynomial time (with respect to the input length) to be efficient. Examples of problems/languages in P include:
- Graph connectivity
- Digraph path exists
- Checking if a graph is a tree
- Integer multiplication: does $x \cdot y = z$?
- Are the integers $x$ and $y$ relatively prime?
- Gaussian elimination over rational numbers: For a matrix $A$ and vector $b$, does there exist $x$ such that $Ax = b$?
Discussions on P
Does the computational model matter?
We’ve defined P with respect to multi-tape Turing machines. But, as we’ve seen, multi-tape Turing machines are equivalent to all other Turing machine models we’ve seen, including RAM Turing machines which reasonably emulate real-life computers. Moreover, the “equivalence” here is that all machines can simulate all other ones with at most polynomial overhead in the runtime. This means all of these computations still fall within the class P.
In fact, many people believe that Turing machines can simulate any physically realizable computational model or system. This is known as the Church-Turing thesis.1 Some people also believe in the strong Church-Turing thesis, which states that this simulation can be done with only polynomial overhead in the runtime. However, as we get closer to quantum computing being physically realizable, people may stop believing in this since, for now, we do not know of a way to simulate quantum computations on standard Turing machines with only polynomial overhead.
Why polynomial time?
It is certainly true that an algorithm running in time $n^{100}$ is impractical even for small inputs; yet this is a polynomial. Why do we consider all polynomial-time algorithms to be “efficient?”

One reason is above: the Turing machine is polynomially-equivalent to pretty much every model we have thought of, so it makes sense that polynomial time should appear somewhere in what we consider to be an efficient computation. Polynomials also compose well, which emulates how we compose computer programs. Often, computer programs will run sub-routines, and will run routines one after another. If all these runtimes are polynomial, then the final runtime remains polynomial as well. This is since for two polynomials $p$ and $q$, the functions $p + q$, $p \cdot q$, and $p \circ q$ are all still polynomials.

Another reason is historic and heuristic. Often in history, someone is able to solve a problem in polynomial time, but only with some large polynomial runtime. Such algorithms are frequently improved later to a more reasonable polynomial, such as quadratic or cubic time.
Finally, polynomial-time problems are roughly equivalent to most (if not all) problems that we can efficiently solve on modern computers.
Worst-case time complexity is too restrictive
If you have a problem where for most inputs you have a fast algorithm, but for a few inputs you only have a slow algorithm, then we’d say the algorithm runs in the slower, worst-case time. In particular, we keep P as a worst-case class. Some argue that this is too restrictive, which is valid. However, often it is much simpler to construct an algorithm that can solve all problem inputs in some amount of time, rather than trying to enumerate the (possibly infinitely many) inputs which have better algorithms.

This criticism of P is also addressed in complexity theory itself via the introduction of alternative models and classes, including approximation algorithms and average-case complexity.
Decision problems are too limited
We’ve framed P as a class of decision problems, but often we actually want to find solutions to these problems. This is known as a search problem, where you are asked to find an answer rather than decide if something is true or false. An example of this is: instead of deciding if there exists an $x$ such that $Ax = b$, you just compute the solution $x$. It can also be difficult to frame search problems as decision problems in the first place.
However, most often it is the case that the difference between search and decision problems is, again, only polynomial. That is, we often can solve a search problem when given an algorithm that decides the equivalent decision problem, only costing us polynomial overhead in the runtime; the reverse is often true as well.
Time-Efficient Verification of Problems
Sometimes, we don’t want to solve problems, but would like to verify solutions when given an answer. Moreover, this verification should be at least as efficient as solving the problem itself.

Suppose we are given a large integer $N$ and would like to find the prime factors of $N$, which we denote as $p_1, \ldots, p_k$. We believe it to be difficult to find $p_1, \ldots, p_k$ given just $N$. However, if someone gives you some numbers $q_1, \ldots, q_k$ which are claimed to be the prime factors of $N$, there is a simple and efficient algorithm to verify this is true.

- Check that each $q_i$ is prime.
- Check that $N = q_1 \cdot q_2 \cdots q_k$.

Clearly (2) is efficient, only requiring $k - 1$ integer multiplications. A relatively recent result (the AKS primality test) showed that (1) is also efficient and doable in polynomial time. So verifying that $N$ is the product of $q_1, \ldots, q_k$ is also efficient.
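This verification procedure is short enough to write out. Below is a small Python sketch; trial division stands in for a true polynomial-time primality test like AKS, just to keep the example self-contained.

```python
from math import isqrt

def is_prime(q):
    """Trial division; a stand-in for a polynomial-time primality test
    such as AKS (trial division is exponential in the bit-length of q,
    but it keeps this sketch self-contained)."""
    return q >= 2 and all(q % d != 0 for d in range(2, isqrt(q) + 1))

def verify_factorization(N, factors):
    """Verify that `factors` is a list of primes whose product is N."""
    product = 1
    for q in factors:
        if not is_prime(q):    # step (1): each claimed factor is prime
            return False
        product *= q
    return product == N        # step (2): the product equals N

print(verify_factorization(15, [3, 5]))  # True
print(verify_factorization(15, [3, 4]))  # False: 4 is not prime
```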
Efficiently Verifiable Languages
This gives us a new way to define languages: efficiently verifiable languages.
Definition. Let $L \subseteq \{0,1\}^*$ be a language. We say that $L$ is efficiently verifiable if there exist polynomials $p$ and $q$, and a Turing machine $V$ running in time $q(|x|)$, such that $x \in L$ if and only if there exists $w \in \{0,1\}^{p(|x|)}$ with $V(x, w) = 1$.

In the above definition, we call $V$ a verifier, $x$ the instance, and $w$ the certificate or witness.
The Class NP
The above new notion of languages gives us a new complexity class: NP.
Definition (NP). NP is the set of all efficiently verifiable languages.
P vs. NP
We widely believe that $\mathsf{P} \neq \mathsf{NP}$. In fact, we build many systems (e.g., cryptography) based on the above assumption. Resolving this either way is one of the [Millennium Prize Problems](https://www.claymath.org/millennium-problems/). However, we do know one thing for certain.

Theorem 3.1. $\mathsf{P} \subseteq \mathsf{NP}$.

This is true since every problem in P can be decided in polynomial time with no witness/certificate (take the empty witness), so it meets the definition of efficiently verifiable.
Non-deterministic Turing Machines and NP
There is an alternative definition of the class NP, which utilizes non-deterministic Turing machines.
Definition. A non-deterministic $k$-tape Turing machine is identical to a (deterministic) $k$-tape Turing machine, except for the following modifications.

- The transition function of the non-deterministic Turing machine is defined as $\delta : Q \times \Gamma^k \to \mathcal{P}(Q \times \Gamma^k \times \{L, S, R\}^k)$, where $\mathcal{P}$ denotes the power set operation.2 During any step of the computation, the transition function outputs a (possibly empty) list of next possible Turing machine configurations.
- Given a list of next possible configurations from the transition function, the non-deterministic Turing machine non-deterministically chooses the next configuration to execute from this list.3
Intuitively, deterministic Turing machines (the ones we defined in Lecture 1) are “straight-line”: every step of the computation proceeds directly from the previous one. For non-deterministic Turing machines (which we’ll denote as NTMs), they look more like “branching” programs: at every step of the computation, the Turing machine has a set of possible computational paths to head down, and non-deterministically chooses the path to proceed down.
How do we define decidability of a language with respect to NTMs? At first, it may seem difficult since there are many possible paths an NTM can go down during its computation. But the answer turns out to be simple: we require all computational paths to halt, and there to be at least one accepting path (out of possibly exponentially many) which correctly outputs the decision.

Definition. A language $L$ is decidable in time $T(n)$ by a non-deterministic Turing machine $N$ if

- $x \in L$ if and only if there exists at least one execution path such that $N(x) = 1$.
- All execution branches halt in time at most $T(|x|)$ for any $x$.
We can use this above definition to expand DTIME to NTIME.
Definition. Let $T : \mathbb{N} \to \mathbb{N}$ be a function. Then we define $\mathsf{NTIME}(T(n))$ to be the set of all languages decidable by an NTM running in time $O(T(n))$.
Alternative Definition of NP
Given NTMs and NTIME, we can now see the original formulation of the class NP.
Theorem 3.2. $\mathsf{NP} = \bigcup_{c \geq 1} \mathsf{NTIME}(n^c)$.
Note that this definition is equivalent to the efficiently verifiable language definition. At a high level, this is because of the following reduction.
- Let $w$ be a witness to the fact that $x \in L$ (i.e., $V(x, w) = 1$ for efficient verifier $V$). Then, intuitively, $w$ corresponds to some correct computational path on an NTM which decides $L$.
- Let $N$ be an NTM which decides $L$. Then we can specify a witness $w$ which is the computational path that takes $N(x)$ to an accepting state. The deterministic verifier takes this $w$ as input and simulates the NTM by following the computational path specified by $w$.

Recall our prime factor problem from before. Let $N$ be a large integer, and suppose we wish to find the prime factors of $N$. Then there is an extremely simple NTM which finds these prime factors. It does the following.

- Non-deterministically choose prime numbers $q_1, \ldots, q_k$.
- Check if $N = q_1 \cdots q_k$. If yes, output $1$; else output $0$.
Solving NTIME in DTIME
Currently, until P vs. NP is resolved, the most efficient ways that we know of to solve problems in NTIME using only DTIME computations require exponential time. Let EXP denote the class $\mathsf{EXP} = \bigcup_{c \geq 1} \mathsf{DTIME}(2^{n^c})$.

Lemma 3.3. $\mathsf{NP} \subseteq \mathsf{EXP}$.

Proof. Enumerate all possible branches of the NTM deciding the language (equivalently, enumerate all certificates/witnesses in the verifier definition). Then, run through this list until finding an accepting branch of the computation. If the original machine ran in time $T(n)$, then this procedure runs in time $2^{O(T(n))}$. By assumption, $T$ is a polynomial, so we are done.
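The proof is just brute-force enumeration of certificates, which is easy to make concrete. The following Python sketch decides a toy NP language (subset-sum) in exponential time by trying every witness; the verifier interface is an assumption for illustration.

```python
from itertools import product

def decide_by_enumeration(instance, verifier, witness_len):
    """Deterministic exponential-time decision procedure: enumerate all
    2^witness_len candidate certificates and accept iff the (polynomial
    time) verifier accepts one of them, as in the proof of Lemma 3.3."""
    return any(verifier(instance, bits)
               for bits in product((0, 1), repeat=witness_len))

# Toy NP language: SUBSET-SUM. The witness selects a subset of the
# numbers; the verifier checks in polynomial time that it hits the target.
def subset_sum_verifier(instance, bits):
    nums, target = instance
    return sum(v for v, b in zip(nums, bits) if b) == target

print(decide_by_enumeration(([3, 5, 7], 12), subset_sum_verifier, 3))  # True
print(decide_by_enumeration(([3, 5, 7], 4), subset_sum_verifier, 3))   # False
```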
-
Note this is just a belief and not a formal theorem or conjecture. ↩
-
Given a set $S$, the power set of $S$, denoted as $\mathcal{P}(S)$, is the set of all possible subsets of $S$. Notably, $|\mathcal{P}(S)| = 2^{|S|}$. ↩
-
Recall that non-determinism is not the same as behaving randomly. The choice of a non-deterministic machine is arbitrary and possibly not computable. ↩
Lecture 4
In-class notes: CS 505 Spring 2025 Lecture 4
Recall how we have the notion of a universal Turing machine: a machine that can simulate and solve any problem that any other Turing machine can solve. We’d like to now define a notion that is similar to this where, if you can solve one problem efficiently, then you can use that algorithm to solve a different problem (also efficiently). This leads us to the notion of reducibility.
Reducibility
As above, the idea of reducibility is that if I can solve problem $B$ (i.e., decide the language $B$), then I can use $B$ to solve (i.e., decide) a different language $A$. Moreover, this is efficient: there is only a polynomial overhead in using $B$ to solve $A$.

Definition (Polynomial-time Reducibility). Let $A$ and $B$ be languages. We say that $A$ is polynomial-time reducible to $B$, denoted as $A \leq_p B$, if there exists a function $f$ that is computable in polynomial time such that $x \in A$ if and only if $f(x) \in B$.

Note that in the above definition, $A \leq_p B$ is saying that if we can solve $B$, then we can use $B$ to efficiently solve $A$. This notation can be confusing to some people (I myself dislike it), so just be aware.
Lemma 4.1 (Reducibility is Transitive). Let $A$, $B$, and $C$ be languages. If $A \leq_p B$ and $B \leq_p C$, then $A \leq_p C$.

Proof. By definition, there exist polynomial-time computable functions $f, g$ such that $x \in A \iff f(x) \in B$ and $y \in B \iff g(y) \in C$.

This implies that $x \in A \iff g(f(x)) \in C$. Note that both $f$ and $g$ are polynomial-time computable, so the function $g \circ f$ is computable in polynomial time.
Since reducibility is efficient, it immediately tells us that if one of the problems is efficient, then the other is also efficient.
Theorem 4.2. If $A \leq_p B$ and $B \in \mathsf{P}$, then $A \in \mathsf{P}$.
NP-Completeness
NP-Completeness captures the ideas and goals we’ve been building so far: problems in NP that, if we can solve them, allow us to solve any other problem in NP.
Definition (NP-Completeness). Let $L$ be a language. We say that $L$ is NP-complete if

- The language is in NP: $L \in \mathsf{NP}$; and
- The language is NP-hard: for all $L' \in \mathsf{NP}$, we have $L' \leq_p L$.

Notice that there can be languages $L$ such that $L$ is NP-hard but $L \notin \mathsf{NP}$; such a language would not be NP-complete. NP-completeness captures the intuition that if we can use a language to efficiently verify every other language in NP, then this language itself should be efficiently verifiable (otherwise we just verify the other languages directly).
Unhelpful/Useless NP-Complete Language
We’ll now see an example of an NP-complete language which is not helpful for solving problems. This is because, as we’ll see, it is intimately tied to the Turing machine.
Denote by $\mathsf{TMSAT}$ the language of all satisfiable Turing machines, defined as $\mathsf{TMSAT} = \{(\alpha, x, 1^n, 1^t) : \exists u \in \{0,1\}^n \text{ such that } M_\alpha(x, u) = 1 \text{ within } t \text{ steps}\}$. Here, $1^n$ and $1^t$ denote a string of $n$ (resp., $t$) 1’s. This is a syntactic convention we use to ensure that any machine deciding $\mathsf{TMSAT}$ runs in time that is polynomial in $n$ and $t$; whereas if we specified $n$ and $t$ in binary, then the machine would only run in polynomial time with respect to the bit-length of these numbers.

Lemma 4.3. $\mathsf{TMSAT}$ is NP-complete.

Proof. Clearly $\mathsf{TMSAT} \in \mathsf{NP}$ by definition. The NTM deciding $\mathsf{TMSAT}$ takes the input $(\alpha, x, 1^n, 1^t)$, guesses the string $u \in \{0,1\}^n$, and runs $M_\alpha(x, u)$. If $M_\alpha$ exceeds $t$ computational steps, output $0$; otherwise, output according to $M_\alpha(x, u)$ ($1$ if $M_\alpha(x, u) = 1$, $0$ if not).

We now show that $\mathsf{TMSAT}$ is NP-hard. That is, for any $L \in \mathsf{NP}$, we show $L \leq_p \mathsf{TMSAT}$. To do so, we define a function $f$ satisfying: $x \in L$ if and only if $f(x) \in \mathsf{TMSAT}$. To begin, let $p$ and $q$ be polynomials related to the Turing machine $V$ which verifies the language $L$. That is, $V$ on any input $x$ and witness $u \in \{0,1\}^{p(|x|)}$ runs in time at most $q(|x|)$.

Now we define $f$ as follows for any $x$: $f(x) = (\lfloor V \rfloor, x, 1^{p(|x|)}, 1^{q(|x|)})$, where $\lfloor V \rfloor$ denotes the description of $V$.

The tuple $f(x)$ is in $\mathsf{TMSAT}$ if there exists $u \in \{0,1\}^{p(|x|)}$ such that $V(x, u) = 1$ in at most $q(|x|)$ steps.

Notice that this is trivially true by definition of the NP language $L$.

Therefore we have $x \in L \iff f(x) \in \mathsf{TMSAT}$.

This NP-complete language isn’t useful because its very definition makes it trivially NP-complete. Moreover, it is inherently tied to the definition of a Turing machine. Intuitively, this says that: if you can decide satisfiability for Turing machines, then you can decide any language verified by a Turing machine — which is true essentially by definition.
Ideally, we’d like a language that is NP-complete irrespective of the computational model we use. Intuitively, we want to show that the problem itself that is captured by the language is NP-complete, which would tell us that as long as we can solve this problem (and not the Turing machine tied to the problem), then we can solve other problems in NP.
Boolean Satisfiability
The problem we will examine as a candidate for NP-completeness in this light is Boolean Satisfiability. Recall the notion of Boolean variables or Boolean literals $x_1, \ldots, x_n$, which take on True/False values, where we use $1/0$ to denote these values, respectively. Similarly, recall Boolean operations: for example, $\vee$ (logical OR), $\wedge$ (logical AND), $\oplus$ (logical XOR), $\neg$ (logical NOT, also denoted as $\bar{x}$), etc. Then, a Boolean expression or Boolean formula is an expression involving Boolean variables and operations (e.g., $(x_1 \vee x_2) \wedge \bar{x_3}$). We define the length or size of a Boolean formula to be the number of non-NOT operations in the formula.

For our purposes, we will only consider Boolean formulas which consist of AND, OR, and NOT. It is a well-known fact that these three operations are universal: any Boolean formula can be rewritten as an equivalent formula using only AND, OR, and NOT. Finally, we say that a Boolean formula $\varphi$ is satisfiable if there exists an assignment of the variables $x_1, \ldots, x_n$ such that $\varphi(x_1, \ldots, x_n) = 1$.

Now, the language of Boolean Satisfiability is defined as follows: $\mathsf{SAT} = \{\varphi : \varphi \text{ is a satisfiable Boolean formula}\}$.

How powerful is $\mathsf{SAT}$? One measure of its power is the collapse of P vs. NP if we find a polynomial-time algorithm for deciding $\mathsf{SAT}$.

Theorem 4.4. If $\mathsf{SAT} \in \mathsf{P}$, then $\mathsf{P} = \mathsf{NP}$.
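No polynomial-time algorithm for SAT is known; the obvious approach is to try all assignments, which takes exponential time. A minimal Python sketch (representing a formula as a Python predicate, an assumption made for illustration):

```python
from itertools import product

def brute_force_sat(phi, n):
    """Decide satisfiability of a formula over n variables by trying all
    2^n assignments -- exponential time in n."""
    return any(phi(a) for a in product((False, True), repeat=n))

# Example: (x1 OR x2) AND (NOT x1 OR x3)
phi = lambda a: (a[0] or a[1]) and ((not a[0]) or a[2])
print(brute_force_sat(phi, 3))  # True
```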
Cook-Levin Theorem: SAT is NP-complete
In the 1970’s, Cook and Levin independently showed that $\mathsf{SAT}$ is NP-complete. This means that if we can find a satisfying assignment for Boolean formulas, we can solve any problem in NP. We’ll begin proving this theorem, then wrap up the proof in the next lecture.

Theorem 4.5 (Cook-Levin). $\mathsf{SAT}$ is NP-complete.

Proof. We must show that $\mathsf{SAT} \in \mathsf{NP}$ and that $\mathsf{SAT}$ is NP-hard. The first task is straightforward. For the second task, at a high level, we must construct a polynomial-time reduction from any language $L \in \mathsf{NP}$ to an instance of $\mathsf{SAT}$. This reduction must have the property that the $\mathsf{SAT}$ instance is satisfiable if and only if membership in $L$ is true. Conceptually, we’ll construct a Boolean formula which encodes the correctness of the Turing machine deciding the language $L$. At a high level, this is a simple task, but the devil is in the details with this reduction.

To begin, we show that $\mathsf{SAT} \in \mathsf{NP}$. We give a simple NTM deciding $\mathsf{SAT}$. Let $\varphi$ be a Boolean formula and suppose $\varphi$ has literals $x_1, \ldots, x_n$. Then, the machine on input $\varphi$ simply guesses a satisfying assignment for $\varphi$, checks if $\varphi$ evaluates to $1$ under this assignment, then accepts or rejects accordingly. Clearly, this is an NTM which decides $\mathsf{SAT}$, and its running time is clearly polynomial in the length of $\varphi$.
We now turn to showing that $\mathsf{SAT}$ is NP-hard. Before doing this, we switch to the convention of single-tape non-deterministic Turing machines. That is, we’ll use the definition of NP languages where $L \in \mathsf{NP}$ if and only if there is a single-tape NTM which decides $L$ in polynomial time. Since, like deterministic machines, many-tape NTMs are (polynomially) equivalent to single-tape NTMs, everything remains in NP.
The idea behind the reduction is the following. Let $L \in \mathsf{NP}$ with single-tape NTM $N$ deciding $L$, and consider any input $w$.1 The reduction (i.e., the function $f$) will first map the execution of $N(w)$ to a table representing this execution. Then, the reduction will specify a Boolean formula that is satisfiable if and only if this table representing the execution is correct and accepts the input $w$; otherwise the formula will be unsatisfiable.

Assume that on inputs of length $n$, the machine $N$ runs in time $n^c$ for some constant $c$ (for convenience in the proof, we actually assume the runtime is slightly smaller than $n^c$, to leave room for the boundary symbols and state in each row, but this is a minor detail). We’ll construct a table representing the computation of $N(w)$ of size $n^c \times n^c$. Every row of the table has the following properties:

- The start and end of every row is filled with a special symbol $\# \notin \Gamma$, where $\Gamma$ is the tape alphabet of $N$. We’ll index the start of the row by $1$.2
- For every row, the cells between the start and end symbols contain the contents of $N$’s single tape, plus its current state $q$.

The current state $q$ is used to represent the current position of $N$’s single tape head.

- If $q$ is at position $j$ in the row, then the tape head is reading from position $j + 1$ in the table (which corresponds to the tape head being above the cell immediately to the right of the state symbol on $N$’s tape).
- The first row of the table (row $1$) always has the starting configuration of $N(w)$. This corresponds to the row $(\#, q_0, w_1, \ldots, w_n, \square, \ldots, \square, \#)$, where $q_0$ is the start state and $\square$ is the blank symbol.

Since $N$ runs in time at most $n^c$, it can read/write to/from at most $n^c$ cells on its tape. This is exactly the number of slots in a row of the table which are dedicated to the tape configuration, plus 2 slots for $\#$, and one more slot for the current state.

[Figure: the $n^c \times n^c$ table representing the computation of $N$ on input $w$.]
Our goal is to define a Boolean formula capturing the correctness of the table representing $N(w)$. To do this, we first set up the alphabet of the table. Let $\Delta = Q \cup \Gamma \cup \{\#\}$. We call $\Delta$ the table alphabet. We let $T_{i,j}$ denote a cell of the table for all $i, j \in \{1, \ldots, n^c\}$.

For every cell $T_{i,j}$ and every $s \in \Delta$, we define a unique Boolean literal $x_{i,j,s}$. This literal represents the statement “$T_{i,j} = s$”. In particular, if $T_{i,j} = s$, then we would set $x_{i,j,s} = 1$, and if $T_{i,j} \neq s$, then we’d set $x_{i,j,s} = 0$. The reverse is also true; the literal being $1$ means the cell contains that element from $\Delta$, and being $0$ means it does not.

Using these literals, we’ll now encode the correctness of the table for $N(w)$ into a Boolean formula $\varphi$. This formula is going to be the conjunction (i.e., logical AND) of 4 sub-formulas: $\varphi = \varphi_{start} \wedge \varphi_{accept} \wedge \varphi_{cell} \wedge \varphi_{move}$.

The formula $\varphi_{start}$ is simple: it will represent the correct starting configuration of the machine. This is a straightforward AND of many literals, shown below: $\varphi_{start} = x_{1,1,\#} \wedge x_{1,2,q_0} \wedge x_{1,3,w_1} \wedge \cdots \wedge x_{1,n+2,w_n} \wedge x_{1,n+3,\square} \wedge \cdots \wedge x_{1,n^c,\#}$.

Next, the formula $\varphi_{accept}$ will check that the table is an accepting table. That is, it will check that there exists at least one accepting state somewhere in the table. Note that we do not care where this accepting state is, nor if there is also a rejecting state located in the table; we will handle these consistency checks with $\varphi_{move}$. Since all we care about is that there is at least one accepting state, we can simply take a large OR over all the cells, yielding: $\varphi_{accept} = \bigvee_{i,j} x_{i,j,q_{accept}}$.

The formula $\varphi_{cell}$ is going to make sure that every cell of the table only contains a single element of $\Delta$. That is, we check to make sure that (1) every cell contains an element of $\Delta$, and (2) every cell only contains a single element of $\Delta$. For (1), we can check this with a simple OR: we can check if $T_{i,j}$ contains an element of $\Delta$ using the expression $\bigvee_{s \in \Delta} x_{i,j,s}$. If this is true, we know that $T_{i,j}$ contains at least one element of $\Delta$.

Now we ensure that $T_{i,j}$ only contains a single value from $\Delta$. This is done by making sure that for all $s, t \in \Delta$ such that $s \neq t$, the expression $\neg(x_{i,j,s} \wedge x_{i,j,t})$ is true. This expression evaluates to false when $T_{i,j}$ contains both $s$ and $t$. If it contains at most one of $s$ or $t$ (including neither of them), then this expression is satisfied. Then we check that this holds over all $s \neq t$. Thus, (2) is captured by the formula $\bigwedge_{s \neq t \in \Delta} \neg(x_{i,j,s} \wedge x_{i,j,t})$.

Therefore, a single cell is valid if both (1) and (2) hold. We then check that this condition holds for all possible cells, yielding our final expression $\varphi_{cell} = \bigwedge_{1 \leq i,j \leq n^c} \left[ \left( \bigvee_{s \in \Delta} x_{i,j,s} \right) \wedge \left( \bigwedge_{s \neq t \in \Delta} \neg(x_{i,j,s} \wedge x_{i,j,t}) \right) \right]$.
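The “exactly one symbol per cell” constraints are mechanical to generate, which is a good sanity check on their size. A small Python sketch (the variable encoding is hypothetical, chosen just for illustration):

```python
from itertools import combinations

def cell_constraints(cells, alphabet):
    """Generate phi_cell as CNF clauses: each cell holds at least one
    symbol, and no cell holds two. A literal is (cell, symbol) and a
    negated literal is ('not', (cell, symbol))."""
    clauses = []
    for cell in cells:
        # (1) at least one symbol: one big OR over the alphabet
        clauses.append([(cell, s) for s in alphabet])
        # (2) at most one symbol: NOT(x_s AND x_t) = (NOT x_s OR NOT x_t)
        for s, t in combinations(alphabet, 2):
            clauses.append([("not", (cell, s)), ("not", (cell, t))])
    return clauses

# One cell over a 3-symbol alphabet: 1 + C(3,2) = 4 clauses.
for clause in cell_constraints([(1, 1)], ["a", "b", "q0"]):
    print(clause)
```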
Finally, we turn to the formula $\varphi_{move}$. The goal of $\varphi_{move}$ is to ensure that the table we’ve constructed is a correct execution of the Turing machine $N$ on input $w$. Intuitively, this involves confirming that transitioning from configuration $i$ to configuration $i+1$ was valid (according to the transition function of $N$); i.e., that row $i+1$ in the table is consistent with row $i$. Unfortunately, trying to cook up a small (i.e., polynomial-sized) formula for checking row $i$ vs. row $i+1$ across the entire row at once seems to not be possible (e.g., this could take exponentially many logical ORs of large sub-formulas). Fortunately, it is enough for us to look at small windows of the table representing $N$’s computation.
This is (one of the many) beautiful parts of the Cook-Levin theorem. Intuitively, this “looking at windows” to check consistency showcases how highly local Turing machine computations are. As we will see, we will be able to completely verify the entire computation of the Turing machine by scanning over all windows in the given table.
For $1 \leq i < n^c$ and $1 < j < n^c$, define $W_{i,j}$ as the following $2 \times 3$ matrix with entries from $\Delta$: the top row is $(T_{i,j-1}, T_{i,j}, T_{i,j+1})$ and the bottom row is $(T_{i+1,j-1}, T_{i+1,j}, T_{i+1,j+1})$. We say that window $W_{i,j}$ is legal if this window does not violate the actions of the transition function $\delta$.

Rather than be super formal with this definition (which does not help with intuition), we’ll see some examples of legal windows. First suppose that $a, b, c \in \Gamma$ and let $q_1, q_2$ be states of $N$. Now suppose the transition function is defined as follows (for this limited example):

- $\delta(q_1, a) = \{(q_1, b, R)\}$; i.e., while in state $q_1$, if $a$ is read from under the tape head, write $b$ under the tape head, then move the tape head right and stay in state $q_1$.
- $\delta(q_1, b) = \{(q_2, c, L), (q_2, a, R)\}$; i.e., while in state $q_1$, if $b$ is read from under the tape head, non-deterministically choose whether to
  - write $c$ under the tape head, move the tape head left, then change to state $q_2$; or
  - write $a$ under the tape head, move the tape head right, then change to state $q_2$.
With respect to this transition function, the following windows would be considered legal.

[Figure: examples of legal windows (a)–(f).]

In this figure, windows (a) and (b) are legal because the transition function specifies these are legal actions (recall that the tape head reads the symbol next to the state in the table). Window (c) is legal because, with $q_1$ appearing on the top right, the symbol being read lies just outside the window; the head could have overwritten it and moved right, which is consistent with the bottom row. Window (d) is legal because the top and bottom are identical, indicating that the tape head is nowhere near these positions and therefore could not have modified them. Also, it is legal for $\#$ to be in the left column (it can also appear in the right column, but never in the center column). Window (e) is legal because state $q_1$ might have been to the immediate right of the top row, a $b$ may have been read, then the tape head may have moved left and transitioned to state $q_2$, which is a valid transition under $\delta$. Finally, window (f) is legal because $q_1$ may have been to the immediate left of the first row, read a $b$, written a $c$, then moved left, which is valid under $\delta$.
Now, with respect to this transition function, here are examples of illegal windows.

[Figure: examples of illegal windows (a)–(c).]

In the above figure, window (a) is illegal since the tape head was not in a position to change the modified cell. Window (b) is illegal since, while in state $q_1$ and reading a $b$, the transition function does not allow the write/move/state combination shown in the bottom row. Window (c) is illegal because there are two states specified in the bottom row.
Now, intuitively, we want to specify $\varphi_{move}$ as the formula $\bigwedge_{i,j} (\text{window } W_{i,j} \text{ is legal})$. This says that all possible windows are legal. In the next lecture, we’ll see that this is enough to show the entire Turing machine computation is valid.
-
We switch to the variable $w$ here to not conflict with using $x$ for the literals of the Boolean formula. In class, I used $x$ but the pictures in this section use $w$, so I am re-writing with $w$ to keep things consistent. ↩
-
This is slightly different from what was presented in class to make things convenient. ↩
Lecture 5
In-class notes: CS 505 Spring 2025 Lecture 5
Cook-Levin Theorem Wrap-Up
Recall from last time, we are trying to prove that $\mathsf{SAT}$ is NP-complete. To do so, we considered the single-tape non-deterministic Turing machine definition of NP. Our goal is to show that for any language $L \in \mathsf{NP}$, we have $L \leq_p \mathsf{SAT}$. That is, $L$ is poly-time reducible to $\mathsf{SAT}$.

From last time, we were able to construct an $n^c \times n^c$ table which encoded the execution of an NTM $N$ deciding $L$ on input $w$. From this table, we constructed the Boolean formula $\varphi = \varphi_{start} \wedge \varphi_{accept} \wedge \varphi_{cell} \wedge \varphi_{move}$.

The last thing to show is that our definition of $\varphi_{move}$ correctly captures the correctness of the NTM deciding $L$. Recall that $\varphi_{move}$ was defined with respect to $2 \times 3$ windows in the table, and it tried to capture the notion of a legal window. That is,

Claim. If the table has a correct starting configuration, and all windows are legal, then row $i+1$ is a correct transition from row $i$ for all $i$.

Proof. To prove the claim, first consider any such $i$. Let $C_i$ and $C_{i+1}$ be the $i$-th and $(i+1)$-st rows of the table. Call $C_i$ the upper configuration and $C_{i+1}$ the lower configuration.

Consider all windows $W_{i,j}$ for $1 < j < n^c$. That is, we look at all windows in the upper and lower configuration. We now define when window $W_{i,j}$ is legal. Legal windows fall into two categories: windows which contain a state and those which do not.
-
No state in the window. Suppose window $W_{i,j}$ contains no state. Then we say that $W_{i,j}$ is legal if and only if the two elements in the center column are equal. The window below is an example. Note that even if the first column has different symbols in the upper and lower rows, this can still be a legal window, because it is possible the tape head is just to the left of the window in the upper configuration, writes over that cell, then moves left.
-
State in the window. Suppose that window $W_{i,j}$ contains a state. Then window $W_{i,j}$ is legal if and only if the upper and lower configuration in this window is consistent with the transition function of the Turing machine. In particular, by our construction of the table and since the NTM is a single-tape NTM, a state in the window represents the current position of the tape head. First, we know that when transitioning from the upper configuration to the lower configuration, the state can move at most one position (left, right, or stay). This is easy to check for. Then we know that in the table, the tape head only touches the cell immediately to its right. That is, if the state is in cell $T_{i,j}$, then the tape head is reading from/writing to $T_{i,j+1}$. In a nutshell, the computation of a Turing machine is highly local: it can’t jump large distances in a single time-step. Examples of legal windows are given below.
-
Special windows. There are two special windows in any pair of upper and lower configurations: the leftmost and the rightmost. These represent the edges of the table. These windows are legal if and only if: (1) they satisfy both of the above constraints; and (2) they have the fixed symbol $\#$ on the edges. See the examples below.

By the above notion of legal windows, if all windows in the upper and lower configuration are legal, then the lower configuration represents a correct transition from the upper configuration. Inductively, this means that if we start with a correct starting configuration, and every window in the table is legal, then each pair of upper and lower configurations represents a valid transition from $C_i$ to $C_{i+1}$, and hence the table correctly captures the computation of the decider for language $L$.

We conclude by giving the Boolean formula for $\varphi_{move}$. To do so, we simply need to give a Boolean formula for the statement “window $W_{i,j}$ is legal.” Define the set $S$ as follows: $S = \{(s_1, \ldots, s_6) \in \Delta^6 : \text{the window with top row } (s_1, s_2, s_3) \text{ and bottom row } (s_4, s_5, s_6) \text{ is legal}\}$. Here, recall that $\Delta$ is the cell alphabet of our table.

Given this set $S$, the Boolean formula for the statement “window $W_{i,j}$ is legal” is expressed as $\bigvee_{(s_1, \ldots, s_6) \in S} (x_{i,j-1,s_1} \wedge x_{i,j,s_2} \wedge x_{i,j+1,s_3} \wedge x_{i+1,j-1,s_4} \wedge x_{i+1,j,s_5} \wedge x_{i+1,j+1,s_6})$. What is this formula saying? Given a tuple $(s_1, \ldots, s_6) \in S$, which we know by the definition of $S$ represents some legal window, it asks whether the current window equals this legal window. We take a big OR over all legal windows to make sure that window $W_{i,j}$ matches some legal window.

If this big OR is true, then we know $W_{i,j}$ is some legal window. This gives us the final expression for $\varphi_{move}$: $\varphi_{move} = \bigwedge_{i,j} \bigvee_{(s_1, \ldots, s_6) \in S} (x_{i,j-1,s_1} \wedge x_{i,j,s_2} \wedge x_{i,j+1,s_3} \wedge x_{i+1,j-1,s_4} \wedge x_{i+1,j,s_5} \wedge x_{i+1,j+1,s_6})$. All together, we have that $\varphi$ is satisfiable if and only if the NTM we are encoding in the table accepts.
The final piece of the puzzle is arguing that we can construct $\varphi$ in polynomial time. Note that the cell alphabet $\Delta$ is of constant size with respect to the input length, by definition of Turing machines.

- For $\varphi_{start}$, given an input $w$ to the NTM for deciding the language $L$, the starting configuration of the machine is fixed. Thus, the starting row of the table is fixed as well. The starting row of the table contains $n^c$ cells, which corresponds to $n^c$ literals in $\varphi_{start}$. This can clearly be constructed in $O(n^c)$ time.
- For $\varphi_{accept}$, recall that we are simply scanning the entire table for an accepting state. The table has total size $n^{2c}$, so this formula clearly has size $O(n^{2c})$ and can be constructed in $O(n^{2c})$ time.
- For $\varphi_{cell}$, it is a big AND over all pairs $(i, j)$. Within this big AND, we have two constant-sized subformulas. First, the formula checking that cell $T_{i,j}$ contains a valid symbol from $\Delta$. Since $|\Delta|$ is constant, the size of this formula is constant. Then this subformula is AND’d with a big AND of ORs which checks that cell $T_{i,j}$ doesn’t contain both symbol $s$ and symbol $t$. Again, since $|\Delta|$ is constant, this subformula is constant. So the total size of $\varphi_{cell}$ is $O(n^{2c})$ and it can be constructed in this much time as well.
- Similarly, for $\varphi_{move}$, the size of the set $S$ is constant (something like $|\Delta|^6$) since $|Q|$ and $|\Gamma|$ are constants. So the inner formula is a constant size, whereas the whole formula is a big AND over all pairs $(i, j)$, of which there are at most $n^{2c}$. So $\varphi_{move}$ has size $O(n^{2c})$ and can be constructed in this much time.
This completes the proof of the Cook-Levin theorem.
Other NP-Complete Problems
SAT is a step up from the (useless) NP-complete problem TMSAT. However, a general Boolean formula (like those given in SAT) may be difficult to handle when trying to understand specific problems. Thus, we turn our attention to the wide variety of other NP-complete problems.

First, we show that given any NP-complete problem/language, if we want to show some other language is NP-complete, we only need to reduce our known NP-complete language to our new language.

Theorem 5.1. If $L$ is an NP-complete language, and $L' \in \mathsf{NP}$ such that $L \leq_p L'$, then $L'$ is NP-complete.

Proof. Recall the transitive property of polynomial-time reducibility. Let $A, B, C$ be languages such that $A \leq_p B$ and $B \leq_p C$. Then we know that $A \leq_p C$.

By our assumption, $L$ is NP-complete. This means that $L \in \mathsf{NP}$ and $L'' \leq_p L$ for all $L'' \in \mathsf{NP}$. By our other assumption, we know that $L \leq_p L'$. By the transitive property above, we now know that $L'' \leq_p L'$ for any $L'' \in \mathsf{NP}$. Thus, $L'$ is NP-complete.

Now, rather than having to do a complete Cook-Levin Theorem style proof for new languages we want to show are NP-complete, it suffices to just reduce from a language we know is NP-complete!
3SAT
We turn to our next (and possibly favorite) -complete language: 3SAT. First, we need to set up some terminology.
Let $\varphi$ be a Boolean formula. We say that $\varphi$ is in conjunctive normal form (or is a CNF formula) if $\varphi = C_1 \wedge C_2 \wedge \cdots \wedge C_m$ such that each $C_i$ only contains ORs of literals/variables (and their negations). We call each $C_i$ a clause of $\varphi$. One example of a CNF formula with three clauses is $(x_1 \vee \bar{x_2}) \wedge (x_2 \vee x_3 \vee \bar{x_4}) \wedge (x_1 \vee x_4)$.

We say that $\varphi$ is a $k$-CNF formula if each clause contains exactly $k$ literals. An example of a $3$-CNF formula with two clauses is $(x_1 \vee \bar{x_2} \vee x_3) \wedge (x_2 \vee x_4 \vee \bar{x_5})$.

Definition (3SAT). The language $3\mathsf{SAT}$ is the set of all satisfiable $3$-CNF formulas. That is, $3\mathsf{SAT} = \{\varphi : \varphi \text{ is a satisfiable } 3\text{-CNF formula}\}$.

Complexity theorists prefer $3\mathsf{SAT}$ over other NP-complete languages since it is simple, has very little combinatorial structure, and occurs in many different contexts such as constraint satisfaction problems.

The other part of the Cook-Levin Theorem (that I hid from you earlier) is that $3\mathsf{SAT}$ is NP-complete.

Theorem (Cook-Levin, Part 2). $3\mathsf{SAT}$ is NP-complete.

Proof. $3\mathsf{SAT} \in \mathsf{NP}$ is immediate. What remains to be shown is that $3\mathsf{SAT}$ is NP-hard. We could show that $\mathsf{SAT} \leq_p 3\mathsf{SAT}$ and apply our above theorem, but it is actually simpler just to modify the proof of the Cook-Levin theorem directly to give us a 3-CNF formula.
Recall $\varphi = \varphi_{start} \wedge \varphi_{accept} \wedge \varphi_{cell} \wedge \varphi_{move}$ from the proof of the Cook-Levin theorem. First, we will change $\varphi$ slightly so that it is a CNF formula (we are almost there already). Once we have put $\varphi$ in CNF form, we will then transform it into a $3$-CNF formula.

Since $\varphi$ is the AND of its sub-formulas, we just need to make sure that each of these sub-formulas is an AND of clauses, each an OR of 1 or more literals. First consider $\varphi_{start}$. Recall that it was simply ANDing literals together. So $\varphi_{start}$ is already a CNF formula with clauses of single literals (there are no ORs).

Now consider $\varphi_{accept}$. Remember that this is simply a big OR over the entire table, checking if there is at least one accepting state. So $\varphi_{accept}$ will be a single clause of our CNF formula.

Now consider $\varphi_{cell}$. Notice that $\varphi_{cell}$ is already in CNF form. The big OR over $s \in \Delta$ is a single clause, which gets AND’d with the formula $\bigwedge_{s \neq t} \neg(x_{i,j,s} \wedge x_{i,j,t})$, which is itself a CNF formula (each conjunct $\neg(x_{i,j,s} \wedge x_{i,j,t})$ equals the clause $\bar{x}_{i,j,s} \vee \bar{x}_{i,j,t}$ by De Morgan’s law). Then all of these formulas are AND’d together, meaning the final formula is in CNF form.

Finally, for $\varphi_{move}$, it is a big AND of a big OR of a constant number of ANDs (6 ANDs). Using Boolean equivalences, we can convert the inner formula1 into a new formula where it is a big AND of some (again constant) number of ORs. This conversion increases the formula size by at most a constant factor per window, so $\varphi_{move}$ is still of polynomial size. All together, this transforms $\varphi$ into CNF form in polynomial time.
This together establishes that we can convert our original formula $\varphi$ into an equivalent CNF formula, say $\psi = C_1 \wedge \cdots \wedge C_m$ for some $m$, such that each $C_i$ is the OR of one or more literals. We now convert $\psi$ into a $3$-CNF. This can be done as follows. For any $i$, consider the clause $C_i$.

- If $C_i$ has 3 literals, we are done and can move on to the next clause.
- If $C_i$ has fewer than 3 literals, we transform it into an equivalent clause with exactly 3 literals. For example, if $C_i$ has one literal, say $y$, we simply write $y \vee y \vee y$. If $C_i$ has two literals, say $y$ and $z$, we pick one of the literals arbitrarily (e.g., always pick the first one) and repeat it, giving $y \vee y \vee z$. Clearly these clauses are equivalent to the originals.
- If $C_i$ has more than 3 literals, we will split $C_i$ into a $3$-CNF formula using extra variables.

For example, if $C_i = y_1 \vee y_2 \vee y_3 \vee y_4$, we introduce the variable $z$ and convert $C_i$ to the formula $(y_1 \vee y_2 \vee z) \wedge (\bar{z} \vee y_3 \vee y_4)$.

This conversion has the property that if $C_i$ has a satisfying assignment, then there exists an assignment to the variable $z$ such that the new formula is also satisfied by the original assignment plus the assignment for $z$.

In our above example, if an assignment satisfies $C_i$ by setting $y_3 = 1$, then a satisfying assignment for the new formula is obtained by additionally setting $z = 1$ (the first clause is satisfied by $z$, and the second by $y_3$).

In general, if $C_i$ has $\ell > 3$ literals, we introduce $\ell - 3$ new variables and transform $C_i$ into a 3CNF formula with $\ell - 2$ clauses. If $C_i$ has literals $y_1, \ldots, y_\ell$, then we construct the new formula as $(y_1 \vee y_2 \vee z_1) \wedge (\bar{z_1} \vee y_3 \vee z_2) \wedge (\bar{z_2} \vee y_4 \vee z_3) \wedge \cdots \wedge (\bar{z}_{\ell-3} \vee y_{\ell-1} \vee y_\ell)$. Clearly, this formula can be constructed in polynomial time from $C_i$.

All together, this gives us our new 3CNF formula which is satisfiable if and only if the formula $\varphi$ we constructed in the Cook-Levin Theorem is satisfiable. A code sketch of the clause-splitting step is given below.
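Since the clause-splitting rules are purely syntactic, they are easy to implement. A minimal Python sketch (the string-based literal encoding, with `~` for negation, is an assumption for illustration):

```python
from itertools import count

_fresh = count(1)  # source of fresh variable names z1, z2, ...

def to_three_literals(clause):
    """Split one clause (a list of literal strings, where "~x" denotes
    NOT x) into an equisatisfiable list of 3-literal clauses, following
    the padding/splitting rules above."""
    if len(clause) == 1:
        return [clause * 3]                 # y     ->  y OR y OR y
    if len(clause) == 2:
        return [[clause[0]] + clause]       # y, z  ->  y OR y OR z
    if len(clause) == 3:
        return [clause]
    z = f"z{next(_fresh)}"                  # fresh chaining variable
    return [[clause[0], clause[1], z]] + to_three_literals([f"~{z}"] + clause[2:])

print(to_three_literals(["y1", "y2", "y3", "y4", "y5"]))
# [['y1', 'y2', 'z1'], ['~z1', 'y3', 'z2'], ['~z2', 'y4', 'y5']]
```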
Independent Set
The independent set problem on an undirected graph $G = (V, E)$ asks if there exists a set $S$ of nodes/vertices of size at least $k$ such that they are pairwise disconnected. That is, for every $u, v \in S$, $(u, v) \notin E$. As a set, this is written as $\mathsf{INDSET} = \{(G, k) : G \text{ has an independent set of size at least } k\}$.

Theorem. $\mathsf{INDSET}$ is NP-complete.

Proof. Clearly $\mathsf{INDSET} \in \mathsf{NP}$. To see this, one can simply specify a set $S$ of size at least $k$ as the witness. Then, verifying that $S$ is an independent set takes at most $O(|E|)$ time per pair $u, v \in S$, since in the worst case you must scan the entire edge set per check. So the total time is at most $O(k^2 |E|)$ in the worst case, which is polynomial in the size of the instance.

Now we show that $\mathsf{INDSET}$ is NP-hard. We do this by giving a reduction from $3\mathsf{SAT}$. That is, $3\mathsf{SAT} \leq_p \mathsf{INDSET}$.
Suppose that $\varphi$ is a 3CNF formula with $m$ clauses: $\varphi = C_1 \wedge \cdots \wedge C_m$. Assume that $\varphi$ has $n$ variables (note that their negations are also literals). For each clause $C_j$, write $C_j = y_{j,1} \vee y_{j,2} \vee y_{j,3}$, where $y_{j,1}, y_{j,2}, y_{j,3}$ are the literals of clause $C_j$. (For example, if $C_1 = x_1 \vee \bar{x_2} \vee x_3$, then $y_{1,1} = x_1$, $y_{1,2} = \bar{x_2}$, and $y_{1,3} = x_3$.)

For each clause $C_j$, we create a cluster of $3$ nodes/vertices in a graph $G$. Label each vertex in this cluster with $y_{j,1}$, $y_{j,2}$, $y_{j,3}$. This gives us $m$ clusters of $3$ nodes each, where each cluster of nodes is associated with $C_j$ and labeled $y_{j,1}, y_{j,2}, y_{j,3}$.

Now we connect nodes in this graph with edges. First, create a triangle in each cluster. That is, for each $j$, connect $(y_{j,1}, y_{j,2})$, $(y_{j,2}, y_{j,3})$, and $(y_{j,1}, y_{j,3})$ (note the graph is undirected, so the reversed pairs are also edges). Next, we connect each node with its negation. For example, if $y_{1,1} = x_1$ and $y_{2,3} = \bar{x_1}$, then we add the edge $(y_{1,1}, y_{2,3})$ to the graph. We claim that the given 3CNF is satisfiable if and only if our constructed graph above has an independent set of size $m$. (A code sketch of this construction is given after the proof.)
First, suppose that $\varphi$ has a satisfying assignment $a$. From $a$, we build an $m$-independent set $S$ in the graph $G$. Now since $\varphi$ is satisfied, the assignment satisfies every clause $C_j$ of $\varphi$. Since every clause is satisfied, at least one of its literals $y_{j,1}$, $y_{j,2}$, or $y_{j,3}$ is equal to $1$. For each clause $C_j$, choose only one of its satisfied literals and add the corresponding node to the set $S$. For example, if $y_{j,1} = 1$, $y_{j,2} = 0$, and $y_{j,3} = 1$ under assignment $a$, then add $y_{j,1}$ or $y_{j,3}$ to $S$ (but not both; simply choose one of them).

We claim the set $S$ constructed in this manner is an independent set of size $m$. First, consider nodes of $S$ within a single cluster. By construction of the graph $G$, the three nodes of a cluster are pairwise connected. But by our selection of the set $S$, we only choose a single node from each cluster, so no edge within a cluster has both endpoints in $S$. Now what if $u, v \in S$ come from different clusters? By construction of $G$, we know that $u$ and $v$ are connected if and only if one is the negation of the literal represented by the other. For example, $x_1$ and $\bar{x_1}$ would be connected in the graph. However, by assumption, the assignment $a$ is satisfying, and every node in $S$ corresponds to a literal satisfied by $a$. A literal and its negation cannot both be satisfied, so no two nodes of $S$ are a literal and its negation. Thus, $S$ is an independent set of size $m$.

Now suppose that our constructed graph has an independent set of size $m$. We reconstruct a satisfying assignment for the formula $\varphi$. Let $S$ be the $m$-independent set. Then for every $u, v \in S$, we know that $(u, v) \notin E$. We construct our satisfying assignment directly from the set $S$. Suppose $y_{j,k} \in S$. Then set literal $y_{j,k}$ in $\varphi$ to be $1$. For example, if $y_{j,k} = \bar{x_2}$, then we set $x_2 = 0$ in the satisfying assignment (that is, $\bar{x_2} = 1$). We claim this is a satisfying assignment. This is by construction of the graph $G$.

- Suppose $u \in S$ and $u$ is in cluster $j$. Then we know $u$ is connected to every other node in its cluster ($y_{j,1}, y_{j,2}, y_{j,3}$ are all connected in cluster $j$). So we know that the other nodes in this cluster are not in $S$ (otherwise it would not be an independent set).
- Let $u, v \in S$ such that $u$ is in cluster $j$ and $v$ is in cluster $j' \neq j$. We know that $u, v$ are not connected. This implies that $u$ and $v$ are not a literal and its negation (e.g., $u = x_1$ and $v = \bar{x_1}$ is not possible). This means that we don’t obtain an assignment which sets both a literal and its negation to $1$.

Since $|S| = m$ and $S$ contains at most one node per cluster, it contains exactly one node from each of the $m$ clusters, so every clause has a satisfied literal. Thus, our constructed assignment is satisfying. This completes the proof.
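The graph construction itself is only a few lines of code. Below is a minimal Python sketch; the clause encoding (triples of literal strings, `~` for negation) is an assumption made for illustration.

```python
from itertools import combinations

def threesat_to_indset(clauses):
    """Build the INDSET instance from a 3-CNF formula, following the
    reduction above. Clauses are triples of literal strings ("x1" or
    "~x1"). Returns (vertices, edges, k): the graph has an independent
    set of size k = len(clauses) iff the formula is satisfiable."""
    vertices = [(j, k) for j in range(len(clauses)) for k in range(3)]
    edges = set()
    # Triangle inside each cluster.
    for j in range(len(clauses)):
        edges.update(combinations([(j, 0), (j, 1), (j, 2)], 2))
    # Edge between every literal and its negation across clusters.
    negate = lambda lit: lit[1:] if lit.startswith("~") else "~" + lit
    for (j, k), (j2, k2) in combinations(vertices, 2):
        if j != j2 and clauses[j][k] == negate(clauses[j2][k2]):
            edges.add(((j, k), (j2, k2)))
    return vertices, edges, len(clauses)

# phi = (x1 OR x2 OR x3) AND (~x1 OR x2 OR ~x3)
V, E, k = threesat_to_indset([("x1", "x2", "x3"), ("~x1", "x2", "~x3")])
print(k, len(V), len(E))  # 2 clusters, 6 vertices, 8 edges
```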
-
Please see the actual definition of $\varphi_{move}$ for the correct formula here. I am just using a shorthand for demonstration. ↩
Lecture 6
In-class notes: CS 505 Spring 2025 Lecture 6
Brief Aside on Reductions
When we say that a language $A$ is polynomial-time reducible to a language $B$, denoted as $A \leq_p B$, we are saying the following.

Given ANY string $x$, I can transform $x$ into SOME string $f(x)$ in polynomial time such that $x \in A$ if and only if $f(x) \in B$.

Looking back at our proof that $\mathsf{INDSET}$ is NP-complete, we showed how to transform ANY 3CNF formula $\varphi$ into SOME PARTICULAR graph $G$ such that $\varphi$ is satisfiable if and only if $G$ has an independent set of size $m$. This is NOT saying that given ANY graph $G$, I can solve ANY 3CNF formula.
More NP-Complete Problems
We’ll show one more problem is NP-complete. Given an undirected graph $G = (V, E)$, a $k$-clique is a set of $k$ vertices $S \subseteq V$ such that for all $u, v \in S$ with $u \neq v$, we have $(u, v) \in E$.1 Let $\mathsf{CLIQUE}$ be the set of all pairs $(G, k)$ such that the graph $G$ has a $k$-clique.

Theorem. $\mathsf{CLIQUE}$ is NP-complete.

Proof. Again, $\mathsf{CLIQUE} \in \mathsf{NP}$ is immediate.

To show that $\mathsf{CLIQUE}$ is NP-hard, we reduce 3SAT to $\mathsf{CLIQUE}$. That is, we show $3\mathsf{SAT} \leq_p \mathsf{CLIQUE}$. As with $\mathsf{INDSET}$, let $\varphi = C_1 \wedge \cdots \wedge C_m$ be a 3CNF formula. Given $\varphi$, we construct a graph $G$ with $3m$ vertices as follows. Label each literal of each clause as $y_{j,k}$, so $C_j = y_{j,1} \vee y_{j,2} \vee y_{j,3}$. For each clause $C_j$, add a cluster of $3$ nodes to the graph with labels $y_{j,1}, y_{j,2}, y_{j,3}$. So we have $m$ clusters of $3$ nodes, where cluster $j$ has nodes $y_{j,1}, y_{j,2}, y_{j,3}$.
We now add edges to the graph $G$. The graph will have every pair of nodes connected by an edge, except for the following.

- Nodes $y_{j,1}$, $y_{j,2}$, and $y_{j,3}$ in cluster $j$ (corresponding to $C_j$) will not be connected to each other.
- If node $u$ is in cluster $j$ and node $v$ is in cluster $j' \neq j$, we do not connect $u$ and $v$ if they are a literal and its negation. For example, if $u = x_1$ and $v = \bar{x_1}$, then we do not connect the nodes.

We argue this graph has an $m$-clique if and only if $\varphi$ is satisfiable.
First assume that $\varphi$ has a satisfying assignment $a$. Now, we construct an $m$-clique $S$ in $G$ using the satisfying assignment $a$. Since $a$ is satisfying, there is at least one literal in each $C_j$ that is satisfied.

For each clause $C_j$, pick one of the literals that is satisfied by $a$. Suppose this is $y_{j,k}$. Then, add the node $y_{j,k}$ in the graph to the set $S$. Repeat this process for all $C_j$ for $j \in \{1, \ldots, m\}$. Clearly $|S| = m$. Now, we claim that $S$ is an $m$-clique. By construction of the graph $G$, we know that

- Nodes within the same cluster are not connected to each other; and
- Nodes corresponding to a literal and its negation are not connected to each other; and
- All other nodes are connected to each other.

Since $S$ contains one node from each cluster, and two literals satisfied by the same assignment can never be a literal and its negation, every node $u \in S$ is connected to every other node $v \in S$, so it is an $m$-clique.
Now, assume that $G$ has an $m$-clique, denoted by $S$. We construct a satisfying assignment for $\varphi$ using $S$. Again, by our construction, we know that:

- Nodes within the same cluster are not connected to each other; and
- Nodes corresponding to a literal and its negation are not connected to each other; and
- All other nodes are connected to each other.

Since $S$ is an $m$-clique, we know that $|S| = m$ and for all distinct $u, v \in S$, we have $(u, v) \in E$. We know that $u$ and $v$ cannot correspond to conflicting literals (i.e., $u = x_1$ and $v = \bar{x_1}$ is not possible if both are in $S$). So to construct the satisfying assignment for $\varphi$, for any $y_{j,k} \in S$, we assign the literal corresponding to $y_{j,k}$ a $1$. Again by our construction of the graph $G$, we know that the nodes within each cluster (corresponding to the clause $C_j$) are not connected to each other, but they are connected to everything else (conflicting literals aside). So we know $S$ contains exactly one node from each cluster. Thus, we have created an assignment which satisfies all $m$ clauses. This completes the proof.
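Notice that this graph is exactly the edge-complement of the INDSET graph from the previous lecture: a set $S$ is a clique in $G$ precisely when $S$ is an independent set in the complement of $G$. A small, purely illustrative Python sketch of that relationship:

```python
from itertools import combinations

def complement_graph(vertices, edges):
    """Return the complement graph: same vertex set, with an edge exactly
    where the original graph had none. A k-clique in the complement is
    exactly a k-independent set in the original, which is why the CLIQUE
    and INDSET reductions mirror each other."""
    normalized = {tuple(sorted(e)) for e in edges}
    all_pairs = {tuple(sorted(p)) for p in combinations(vertices, 2)}
    return vertices, all_pairs - normalized

# Triangle on {1,2,3}: its complement has no edges, so {1,2,3} is a
# 3-clique in the original and a 3-independent set in the complement.
print(complement_graph([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
```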
Search vs. Decision Problems
We’ve stated complexity classes as decision problems, but one may naturally consider search variants where we are asked to find solutions. For the class NP, these two notions are polynomially equivalent: if I can decide a problem, then I can search for an actual solution (and vice versa).

If $\mathsf{P} \neq \mathsf{NP}$, then we cannot efficiently search for solutions (i.e., certificates) for decision problems in NP. On the other hand, if $\mathsf{P} = \mathsf{NP}$, then we can.

Theorem. If $\mathsf{P} = \mathsf{NP}$, then for every $L \in \mathsf{NP}$ with efficient deterministic verifier $V$, there exists a polynomial-time deterministic machine $B$ such that for all $x \in L$, $B(x)$ outputs a witness $u$ with $V(x, u) = 1$.

Proof. We show this is true for SAT, which implies it holds for all of NP since SAT is NP-complete. Suppose that $M$ is a decider for $\mathsf{SAT}$. That is, if $\varphi$ is a Boolean formula, then $M(\varphi) = 1$ if and only if $\varphi$ is satisfiable. Since $\mathsf{P} = \mathsf{NP}$, we know that $M$ is deterministic and runs in polynomial time.

Now we build a new machine $B$ which outputs a satisfying assignment for $\varphi$ if $M(\varphi) = 1$. Suppose that $\varphi$ has variables $x_1, \ldots, x_n$. The algorithm operates as follows (a runnable sketch is given after the analysis).
- On input $\varphi$, the machine $B$:
- Runs $M(\varphi)$.
- If $M(\varphi) = 0$, output reject.
- Else if $M(\varphi) = 1$, we know $\varphi$ has a satisfying assignment.
- Set $\varphi_1 = \varphi$ and set $a = \varepsilon$ (the empty string).
- For $i = 1, \ldots, n$:
- Let $\varphi_i^0$ and $\varphi_i^1$ be the Boolean formulas obtained by setting variable $x_i$ in $\varphi_i$ to $0$ and $1$, respectively.
- Compute $M(\varphi_i^0)$ and $M(\varphi_i^1)$.
- If $M(\varphi_i^0) = 1$ then set $a = a \| 0$ and $\varphi_{i+1} = \varphi_i^0$.
- Else if $M(\varphi_i^1) = 1$ then set $a = a \| 1$ and $\varphi_{i+1} = \varphi_i^1$.
- Output $a$.
Clearly, since $M$ runs in polynomial time, $B$ runs in polynomial time (in the length of $\varphi$). By our construction, if $\varphi$ is satisfiable, then the machine correctly reconstructs a satisfying assignment. In particular, during every step of the for loop, at least one of $M(\varphi_i^0)$ or $M(\varphi_i^1)$ will be equal to $1$. If not, then the original formula would not be satisfiable, which means we would have already rejected.
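Here is a minimal Python sketch of this search-to-decision self-reduction. A brute-force decider stands in for the assumed polynomial-time machine $M$, and formulas are Python predicates; both are assumptions made for illustration.

```python
from itertools import product

def sat_decider(phi, n):
    """Brute-force stand-in for the assumed polynomial-time decider M."""
    return any(phi(a) for a in product((False, True), repeat=n))

def restrict(phi, fixed):
    """The formula with its first len(fixed) variables substituted."""
    return lambda a: phi(tuple(fixed) + tuple(a[len(fixed):]))

def find_assignment(phi, n):
    """Fix variables one at a time, keeping the restriction satisfiable."""
    if not sat_decider(phi, n):
        return None  # reject: phi is unsatisfiable
    fixed = []
    for _ in range(n):
        # Try x_i = 0 first; keep whichever value stays satisfiable.
        for bit in (False, True):
            if sat_decider(restrict(phi, fixed + [bit]), n):
                fixed.append(bit)
                break
    return fixed

phi = lambda a: (a[0] or a[1]) and ((not a[0]) or a[2])
print(find_assignment(phi, 3))  # [False, True, False]
```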
The Complexity Class coNP
We’ll now discuss the co-class to NP, which we call coNP. This is defined by the set of languages with complements in NP. That is, $\mathsf{coNP} = \{L : \overline{L} \in \mathsf{NP}\}$.

In fact, we know that $\mathsf{P} \subseteq \mathsf{coNP}$.

Theorem. $\mathsf{P} \subseteq \mathsf{coNP}$.

At a high level, this theorem follows since if $L \in \mathsf{P}$, then $\overline{L} \in \mathsf{P} \subseteq \mathsf{NP}$ (just flip the output of the decider).
Alternative view of coNP
The above definition of coNP isn’t very useful for understanding what languages in the class look like. So we consider the following equivalent definition.
Definition (coNP). We say a language $L \in \mathsf{coNP}$ if there exists a polynomial $p$ and a polynomial-time deterministic Turing machine $V$ such that $x \in L$ if and only if for all $u \in \{0,1\}^{p(|x|)}$, $V(x, u) = 1$.

Notice this is the opposite of NP! In the similar definition of NP, we have a “there exists” ($\exists$) rather than the “for all” ($\forall$) in the above definition.

Theorem. If $\mathsf{P} = \mathsf{NP}$, then $\mathsf{NP} = \mathsf{coNP}$. Or, equivalently stated, if $\mathsf{NP} \neq \mathsf{coNP}$, then $\mathsf{P} \neq \mathsf{NP}$.

In general, we do not believe that $\mathsf{NP} = \mathsf{coNP}$.
coNP-complete Problems
Just like with NP, we can equivalently define coNP-complete problems. A language $L$ is coNP-complete if $L \in \mathsf{coNP}$ and $L' \leq_p L$ for all $L' \in \mathsf{coNP}$.

We’ll look at the following problem: deciding if a formula $\varphi$ is a tautology; that is, whether every assignment of the variables in $\varphi$ is a satisfying assignment. Formally, $\mathsf{TAUTOLOGY} = \{\varphi : \varphi(u) = 1 \text{ for all assignments } u\}$.

Theorem. $\mathsf{TAUTOLOGY}$ is coNP-complete.

Proof. Clearly $\mathsf{TAUTOLOGY} \in \mathsf{coNP}$, since we can build a machine $V$ such that for any formula $\varphi$ and any assignment of variables $u$, $V(\varphi, u) = 1$ if and only if $\varphi(u) = 1$. By the definition of coNP, this machine must output $1$ for all assignments $u$, which is true if and only if $\varphi$ is a tautology.

Now we show that $\mathsf{TAUTOLOGY}$ is coNP-hard. Let $L \in \mathsf{coNP}$ be any language. Consider $\overline{L}$, its complement language that is in NP (which follows by definition of coNP).

By the Cook-Levin theorem, for any $x$ let $\varphi_x$ be the formula such that $x \in \overline{L}$ if and only if $\varphi_x$ is satisfiable. Then, we know $x \in L$ if and only if $\varphi_x$ is unsatisfiable, which holds if and only if $\neg \varphi_x$ is a tautology. Thus, every $L \in \mathsf{coNP}$ is polynomial-time reducible to $\mathsf{TAUTOLOGY}$.
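The duality used in this reduction (a formula is a tautology iff its negation is unsatisfiable) is easy to see concretely. A brute-force Python sketch, with formulas as predicates (an assumption for illustration):

```python
from itertools import product

def is_tautology(phi, n):
    """phi is a tautology iff NOT(phi) is unsatisfiable -- the same
    duality used in the coNP-hardness reduction. Brute force over all
    2^n assignments, hence exponential time."""
    neg_phi = lambda a: not phi(a)
    return not any(neg_phi(a) for a in product((False, True), repeat=n))

print(is_tautology(lambda a: a[0] or not a[0], 1))  # True: x OR NOT x
print(is_tautology(lambda a: a[0] or a[1], 2))      # False
```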
The Complexity Class NEXP
Recall the definition of EXP: $\mathsf{EXP} = \bigcup_{c \geq 1} \mathsf{DTIME}(2^{n^c})$.

We can equivalently define the class of languages decidable in non-deterministic exponential time: $\mathsf{NEXP} = \bigcup_{c \geq 1} \mathsf{NTIME}(2^{n^c})$.

Given this, we have a DTIME/NTIME hierarchy of classes: $\mathsf{P} \subseteq \mathsf{NP} \subseteq \mathsf{EXP} \subseteq \mathsf{NEXP}$.

Perhaps surprisingly, we get a “collapse” if $\mathsf{P} = \mathsf{NP}$.

Theorem. If $\mathsf{P} = \mathsf{NP}$, then $\mathsf{EXP} = \mathsf{NEXP}$. Or, equivalently stated, if $\mathsf{EXP} \neq \mathsf{NEXP}$, then $\mathsf{P} \neq \mathsf{NP}$.
-
Note this is the opposite of a $k$-independent set. ↩
Lecture 7
In-class notes: CS 505 Spring 2025 Lecture 7
Diagonalization
Suppose we are given complexity classes $\mathcal{C}_1$ and $\mathcal{C}_2$. How can we show they are different? That is, how do we show $\mathcal{C}_1 \neq \mathcal{C}_2$?

We’ve seen the technique of diagonalization before when we showed undecidability of certain languages. Diagonalization is a general technique that gives us one way of showing the above result: differentiating between complexity classes. Intuitively, if we are given a language $L \in \mathcal{C}_1$ decided by a machine $M$, diagonalization allows us to differentiate between $\mathcal{C}_1$ and $\mathcal{C}_2$ as follows.

- If $M$ decides $L$, then we want to say $M$ is different from any decider for languages in $\mathcal{C}_2$.
- We do this by arguing that for any candidate decider $M'$ from $\mathcal{C}_2$, there is an input $x$ where, if $M(x) = 1$ then $M'(x) = 0$, and vice versa.
Origins of Diagonalization
Diagonalization was originally introduced by Georg Cantor. He used diagonalization to prove that $|\mathbb{N}| < |\mathbb{R}|$. That is, the set of all natural numbers is strictly smaller than the set of all real numbers. This result at the time was not well received: these are both infinite sets, how could you possibly reason about them being different sizes?

This proof first relies on defining when two infinite-sized sets are the same size. Briefly, two sets $A$ and $B$ of infinite size are said to have the same size if there exists a bijection $f$ from $A$ to $B$. That is, for every $b \in B$, there is a unique $a \in A$ such that $f(a) = b$.

For example, we know that the natural numbers and the even natural numbers have the same size via the bijection $f(n) = 2n$.

Under this definition of set equality, Cantor showed that $|\mathbb{N}| < |\mathbb{R}|$. We’ll do an easier proof by showing $\mathbb{N}$ is smaller than the interval of real numbers $[0, 1]$.
Theorem. $|\mathbb{N}| < |[0, 1]|$.

Proof. We do a proof by contradiction. Suppose that $|\mathbb{N}| = |[0, 1]|$. This means there is a bijection from $\mathbb{N}$ to $[0, 1]$. We can write the bijection as an infinite table, with $n$ in the left column and the decimal expansion of $f(n)$ in the right column.

From this table, we’ll construct a new real number $r \in [0, 1]$ that is not in the above bijection. We construct the real number digit by digit. We’ll index digits in the right column of the table starting with $0$ (i.e., the first digit after the decimal point is the 0-th digit). Let $f$ denote the bijection described by the table.

Then, for each $n \in \mathbb{N}$, we define the $n$-th digit of $r$ to be any digit different from the $n$-th digit of $f(n)$. That is, the $n$-th digit of $r$ will be explicitly different from the $n$-th digit of the real number $f(n)$. As a picture, we look at the digits in the table on the diagonal.

Taking the positions on the diagonal, we construct the new real number $r$. Now, there does not exist any $n \in \mathbb{N}$ such that $f(n) = r$. This is because for every $n$, $r$ differs from $f(n)$ in the $n$-th digit, which implies that $f(n) \neq r$. Thus, the mapping $f$ cannot exist.
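The digit-flipping construction can be demonstrated on any finite prefix of a claimed enumeration. A small Python sketch (flipping digits to 5 or 4 to sidestep the $0.999\ldots = 1.000\ldots$ representation issue):

```python
def diagonal_real(table):
    """Given rows of a claimed enumeration of [0,1] (each a string of
    decimal digits), build a number that differs from row n in digit n:
    Cantor's diagonal construction on a finite prefix."""
    new_digits = []
    for n, row in enumerate(table):
        d = row[n]
        new_digits.append("5" if d != "5" else "4")  # any digit != d works
    return "0." + "".join(new_digits)

table = ["1415926535", "7182818284", "4142135623"]
print(diagonal_real(table))  # "0.555": differs from every listed number
```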
Time Hierarchies
With diagonalization fleshed out more, we can now discuss time hierarchies.
Deterministic Time Hierarchy
First, we’ll show a time hierarchy theorem for deterministic computations.
Theorem. Let $f$ and $g$ be time constructible functions such that $f(n) \log f(n) = o(g(n))$. Then $\mathsf{DTIME}(f(n)) \subsetneq \mathsf{DTIME}(g(n))$.

As a corollary of the above theorem, we have:

Corollary. There exists a language decidable in time $O(g(n))$ but not decidable in time $O(f(n))$, for any time constructible functions $f, g$ with $f(n) \log f(n) = o(g(n))$.
We now prove the time hierarchy theorem.
Proof. As you might expect from the preceding discussion, we’ll have a diagonalization proof. First, we build a deterministic Turing machine $D$ as follows.

For any $\alpha \in \{0,1\}^*$, $D(\alpha)$ does the following.

- Compute $g(|\alpha|)$.
- Simulate $M_\alpha(\alpha)$ for at most $g(|\alpha|)$ steps.
- If $M_\alpha(\alpha)$ halts within these steps and outputs $b$, output $1 - b$.
- Else output $0$.

Note that step (1) can be done in $O(g(n))$ time since $g$ is time constructible. If $M_\alpha$ runs in time $t(n)$, then step (2) runs in time $O(t(n) \log t(n))$ since we can do universal simulation with only logarithmic overhead.

Now, let $L_D$ denote the language of $D$; i.e., $L_D = \{\alpha : D(\alpha) = 1\}$. By definition, $D$ decides $L_D$. Moreover, $L_D \in \mathsf{DTIME}(g(n))$ since $D$ only simulates $M_\alpha$ for at most $g(|\alpha|)$ steps.
Claim. $L_D \notin \mathsf{DTIME}(f(n))$.

This is where our diagonalization comes into play. Suppose this claim is not true. This implies $L_D \in \mathsf{DTIME}(f(n))$, and there is a decider $M$ deciding $L_D$ in time $O(f(n))$.

Now, consider running $D(\alpha)$ where $\alpha$ is a description of $M$. Then, $D$ simulates $M(\alpha)$ for at most $g(|\alpha|)$ steps. Notice that $M$ runs in time $O(f(n))$ on any input; in particular, $M(\alpha)$ runs in time $O(f(|\alpha|))$. By universal simulation, we know that $D$ simulates $M(\alpha)$ in time $O(f(|\alpha|) \log f(|\alpha|))$. Since $f(n) \log f(n) = o(g(n))$, for large enough $|\alpha|$,1 the simulation of $M(\alpha)$ completes in fewer than $g(|\alpha|)$ steps.

This implies that $D$ completes the simulation of $M(\alpha)$ and outputs $1 - M(\alpha)$, so $D(\alpha) \neq M(\alpha)$. We assumed that $M$ decides the language $L_D$, which $D$ also decides by definition, but these two machines differ on this input. This is a contradiction, so $M$ does not exist.
We have two important corollaries from the time hierarchy theorem.
Corollary. For all constants $1 \leq c_1 < c_2$, we have $\mathsf{DTIME}(n^{c_1}) \subsetneq \mathsf{DTIME}(n^{c_2})$.

Corollary. $\mathsf{P} \subsetneq \mathsf{EXP}$.
Non-deterministic Time Hierarchy
Now, we move on to show the non-deterministic time hierarchy theorem.
Theorem. Let $f, g$ be time constructible functions such that $f(n+1) = o(g(n))$. Then, $\mathsf{NTIME}(f(n)) \subsetneq \mathsf{NTIME}(g(n))$.

Proof. Unfortunately, we cannot do a standard diagonalization here. With the deterministic time hierarchy, we simulated the machine $M$ (which shouldn’t have existed), and were able to flip its output. The simulation was deterministic and always output the opposite of the machine $M$. However, with non-deterministic simulation, there could be exponentially many computation paths on a single input. Recall that a non-deterministic decider needs to output accept on at least one computation path (when accepting a string), and reject on all paths (when rejecting a string).

The idea behind the non-deterministic simulation will be to do a lazy simulation. In particular, our diagonalization will only differ on a single input; for all other inputs, we will output the correct bit (of the machine we are simulating). This will be enough to derive our contradiction.

We proceed with the proof. Let $\alpha \in \{0,1\}^*$ and let $M_\alpha$ denote the machine described by $\alpha$; write $M_i$ for the $i$-th machine in this enumeration. Now let $h$ be a rapidly growing, time-constructible function, where $h(i+1)$ is much larger than the time needed to deterministically simulate all computation paths of $M_i$ on inputs of length $h(i) + 1$; the values $h(1) < h(2) < \cdots$ partition input lengths into intervals.
Build a non-deterministic Turing machine which does the following. takes as inputs strings of the form for any , where denotes the string of 1’s.
- Compute such that .
- If :
- Non-deterministically simulate for at most steps.
- If halts within steps, output .
- Else output .
- If :
- Deterministically simulate by trying all computation paths.
- Output .
Now, we argue that runs in time . First, step (1) takes at most time. Second, all of step (2) only takes time. Third, step (3.1) takes at most time, which overall takes time. Therefore, runs in time .
Let . By the above discussion, we know that .
Claim. .
Again, suppose this is not the case. Then there is an NTM which decides in at most time. Let be large enough such that for satisfying , we have .
Now, run . If , then simulates for at most steps. By construction, , and runs in non-deterministic time . So the simulation halts before steps and .
This implies the following equalities.
Moreover, by assumption, and both decide the same language . This implies that for all , we have . This actually shows that for all ; similarly, the same is true for : for all .
Now suppose that . By construction, now simulates deterministically and outputs . Here, outputs if there exists an accepting path, and outputs otherwise. This implies that . But this is a contradiction since above we established that for all . Thus, cannot exist.
-
Recall that every Turing machine has an infinite number of equivalent strings which describe said machine. ↩
Lecture 8
In-class notes: CS 505 Spring 2025 Lecture 8
NP-Intermediate Languages
So far, we’ve looked at many NP problems that also happen to be NP-complete. Thus, it is a natural question to ask whether all languages in NP are NP-complete. It turns out, under the widely believed conjecture that .
Theorem. If , then there exists such that is not NP-complete.
In other words: if all languages are NP-complete, then .
Two examples of languages we believe to not be NP complete are factoring and graph isomorphism. Factoring asks if an integer has prime factors , and graph isomorphism asks if two graphs are isomorphic. This means that there exists a permutation such that for two graphs , if , then .
Oracle Machines
When we showed the deterministic time hierarchy theorem, we utilized diagonalization. In the proof, we had a decider $D$ for some language $L_D$, and by contradiction, we assumed we had a machine $M$ which decided $L_D$ in time $O(f(n))$. In the machine $D$, we received a description of $M$ as input and simulated it.
At a high-level, diagonalization is possible because of two key properties.
- Turing machines always have efficient representations as strings.
- The universal simulation of any Turing machine given its efficient representation as a bit string does not examine the inner workings of the machine.
In the machines and above, the machine simulates obliviously: it does not even need to look at what is doing, it does not care about the internal mechanisms of . Thus, is treating as a black-box: it gives some input and gives some output.
Oracle Turing Machines
We can abstract these two properties and define oracle Turing machines. These are special Turing machines with an additional oracle tape, and access to some oracle . We denote this as . The machine can query for any input in a single computation step, and writes the output to the special oracle tape in this single computation step.
For a language , we let denote an oracle Turing machine with an oracle to a decider for the language . This allows us to define oracle complexity classes.
- is the set of all languages decidable in deterministic polynomial time relative to the oracle/language .
- is the set of all languages decidable in non-deterministic polynomial time relative to the oracle/language .
As a concrete example, $\mathsf{P}^{\mathsf{SAT}}$ is the set of all languages decidable by a deterministic polynomial-time oracle Turing machine with oracle access to a decider for SAT. That is, the oracle answers $1$ on query $\varphi$ if and only if $\varphi$ is a satisfiable formula. Recall the complement language of SAT: $\overline{\mathsf{SAT}}$ is the set of all unsatisfiable formulas $\varphi$.
Lemma. .
Proof. We build a deterministic polynomial time Turing machine that is given oracle access to such that if and only if is not satisfiable. The machine is simple. On input , queries and outputs the opposite answer. Since if and only if is satisfiable, clearly decides . Moreover, this is polynomial time.
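A one-line sketch of this oracle machine, assuming a hypothetical helper `sat_oracle(phi)` that answers a SAT oracle query in a single step:

```python
# Sketch of the P^SAT machine for the complement of SAT; sat_oracle is a
# hypothetical stand-in for the single-step oracle query.
def decide_unsat(phi):
    return not sat_oracle(phi)  # the coSAT answer is the flipped SAT answer
```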
Actually, we can show a stronger result.
Lemma. for any NP-complete language .
At a high-level, given a formula as input, we simply perform a polynomial-time reduction from to the language . For example, we can reduce to a instance, then reduce the instance to (or simply do a direct reduction).
As another result, we know that oracles in do not grant us more power for languages in .
Theorem. For any , we have .
Proof. is immediate since every language in is decidable in polynomial time without an oracle. For , since , we can convert any polynomial time Turing machine with oracle access to to another Turing machine which decides the same language but simply simulates . This simulation is polynomial time, so the final time is polynomial.
Note there are powerful oracles for which and are equal relative to this oracle. One such oracle is for the language , which we define as the set of all tuples where within steps.
Lemma. .
Proof. First, is immediate since any machine with oracle access to can check if an exponential time Turing machine outputs in at most steps. So an exponential time computation can be done in constant time.
Second, is trivially true since .
Third, we show that . Suppose that and is decidable on NTM in time . We construct machine that decides in at most time. is given the description of and deterministically simulates on any input by simulating all possible computation paths. outputs accept if and only if there is at least one accepting computation path, and outputs reject otherwise. Since runs in non-deterministic time , this simulation takes at most time. Finally, whenever calls the oracle on input for some , the machine simply runs the machine for at most steps and returns the result.
Limits of Diagonalization
Oracle machines help us quantify the limits of diagonalization. Diagonalization is quite a powerful, and general technique, so naturally we’d like to resolve vs. using it. Unfortunately, this is impossible.
Theorem. There exists oracle and such that
- ; and
- .
Proof. Setting (the oracle), we have (1) of the theorem by our previous lemma.
Now we construct an oracle to prove (2). Interestingly enough, we will use diagonalization to construct this oracle. First, let be any language. Define a new language as
Notice that for any language . To see this, define machine which (1) guesses ; (2) outputs oracle query . By definition, if and only if ; otherwise it outputs . Clearly, this non-deterministic machine decides .
Now, we construct a new language such that . Then, we will set , completing the proof. We’ll define the language inductively, first by setting . We’ll also have two helper sets (i.e., helper variables) and .
- Step 1.
- Let be the deterministic oracle Turing machine defined by the bit string . For simplicity, we assume that runs in time .
- Choose such that .
- Run .
- Whenever queries oracle at string , reply with /reject. Update .
- If :
- Update . That is, add all -bit strings to the set . Here, is representing the set of all strings that are not in .
- If :
- Find such that .
- Update .
- Update .
- For :
- Assume machine runs in time for inputs of length .
- Choose such that .
- Run .
- Whenever queries oracle at string , update .
- If , reply /reject.
- If , reply /accept.
- If
- Update .
- If
- Find such that .
- Update .
- Update .
Now, we claim that for this language , we have . First, clearly since an NTM can simply guess a correct string in the language . Now, let be any deterministic oracle Turing machine and suppose by way of contradiction that decides . Notice that for each , there are an infinite number of equivalent descriptions for . In particular, there exists some such that . Now, consider . If , it is saying that there exists such that . However, in this case, by construction of (and, in particular, step 2.4.1), we know that all strings of length are in the set and are not in the set . So should output in this case. Similarly, if , then by construction of , we know there exists some length string , so should output in this case. Both cases lead to a contradiction, so .
Intuitively, in the above diagonalization proof, we are exploiting two key facts: (1) there are an infinite number of equivalent Turing machine descriptions; and (2) deterministic Turing machines cannot search an exponential space in polynomial time. (1) allows us to say that if we are given some decider, then there is some for which is the same machine, which means we have considered it in our construction of . (2) allows us to diagonalize. In particular, must produce an output by only making a polynomial number of queries (at most ). Since for large enough , we know that could not have possibly queried the entire set of length bit strings. So, intuitively, the deterministic machine is making a decision with incomplete information.
We exploit this. If , we declare all length strings to not be in the language. Since could not have queried all length strings before outputting its decision, the lack of information leads it to make a wrong decision. Similarly, if , then again there is no way could have queried all length bit strings. So we explicitly find one that was not queried and add it to the set . This again causes to output erroneously. Thus, we cannot decide the language in deterministic polynomial time when given an oracle to .
Lecture 9
In-class notes: CS 505 Spring 2025 Lecture 9
Space Complexity
So far, we have only focused on the time complexity of computations. However, an equally important metric to consider is space complexity. Intuitively, though we would like computations to be fast, time is an abundant resource. For example, we do not need new hardware to let computations run longer; we simply let our computers run longer. However, for space, we cannot simply “download more RAM” or disk space. So space is an important factor to consider in computational complexity.
Definition. Let $s \colon \mathbb{N} \to \mathbb{N}$. A language $L$ is in the class $\mathsf{SPACE}(s(n))$ if there exists a deterministic Turing machine $M$ which decides the language using at most $O(s(n))$ additional space. That is, for any $x \in \{0,1\}^*$, $M(x) = 1$ if and only if $x \in L$, and $M$ uses $O(s(|x|))$ space on its non-output work tapes.
Intuitively, when considering the space constraints of an algorithm, we do not count the input length against the space usage (since the machine must read all of the input to do a deterministic computation). We also do not count the output tape (and, in fact, sometimes we remove the output tape and simply encode accept/reject in the final halting state).
Similarly, we define the class $\mathsf{NSPACE}(s(n))$ as the set of all languages decidable by a non-deterministic Turing machine using at most $O(s(n))$ space on its work tapes for any non-deterministic computational path made by the NTM.
Power of Space-bounded Computations
Intuitively, we believe space-bounded computations to be more powerful than time-bounded ones. This is because we cannot reuse time, but we can continually reuse space. As an example, recall that we do not believe $\mathsf{SAT} \in \mathsf{P}$; that is, no polynomial-time algorithm solves/decides $\mathsf{SAT}$. However, $\mathsf{SAT} \in \mathsf{SPACE}(n)$: it is solvable using only linear space by a deterministic Turing machine.
To see this, let $M$ be a DTM which takes as input a SAT formula $\varphi$. To decide if $\varphi \in \mathsf{SAT}$ (i.e., $\varphi$ is satisfiable), we can use linear space and simply test all possible assignments. Suppose that $\varphi$ has $n$ variables $x_1, \ldots, x_n$.
$M(\varphi)$:
- For all $a \in \{0,1\}^n$:
- Test if $\varphi(a) = 1$.
- If yes, output $1$.
- Output $0$.
Note we are only keeping track of $n$ bits (i.e., we are counting from $0$ to $2^n - 1$ in binary) and testing if this produces a satisfying assignment. Testing if $\varphi$ is satisfiable under assignment $a$ requires $O(m)$ space, where $m = |\varphi|$, and clearly $n \le m$. Therefore, we only need linear space to decide if $\varphi$ is satisfiable. Notably, this is an exponential time algorithm, but it only uses linear space.
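A minimal Python sketch of this brute-force decider, modeling the formula as a Boolean function; only the current assignment ($n$ bits) is stored at any time.

```python
from itertools import product

def brute_force_sat(phi, n):
    """Decide satisfiability of phi (a function on n Boolean variables)
    by testing all 2^n assignments; only the current assignment is ever
    stored, so the space used is linear in n."""
    for assignment in product([False, True], repeat=n):
        if phi(*assignment):
            return True   # found a satisfying assignment
    return False          # no assignment satisfies phi

# Example: (x1 OR x2) AND (NOT x1 OR NOT x2) is satisfiable.
print(brute_force_sat(lambda x1, x2: (x1 or x2) and (not x1 or not x2), 2))
```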
Space Constructible Functions
Like with time complexity, for space complexity we care about the notion of space constructible functions. As you may guess, these are functions that are constructible in limited space.
Definition. A function $s \colon \mathbb{N} \to \mathbb{N}$ with $s(n) \ge \log n$ is space constructible if there exists a deterministic Turing machine $M$ such that on input $1^n$, $M$ outputs $s(n)$ using at most $O(s(n))$ space on any of its work tapes.
Theorem. For every space constructible function $s \colon \mathbb{N} \to \mathbb{N}$, we have $\mathsf{TIME}(s(n)) \subseteq \mathsf{SPACE}(s(n)) \subseteq \mathsf{NSPACE}(s(n)) \subseteq \mathsf{TIME}(2^{O(s(n))})$.
Proof. Clearly $\mathsf{TIME}(s(n)) \subseteq \mathsf{SPACE}(s(n))$ since any deterministic Turing machine running in time $s(n)$ can use at most $s(n)$ space on its work tapes. Similarly, $\mathsf{SPACE}(s(n)) \subseteq \mathsf{NSPACE}(s(n))$ since every deterministic computation is also a non-deterministic computation.
What remains to show is $\mathsf{NSPACE}(s(n)) \subseteq \mathsf{TIME}(2^{O(s(n))})$. To see this, we use the notion of a configuration graph. We let $G_{N,x}$ denote the configuration graph of $N(x)$ for any NTM $N$ and input $x$. The graph is a directed acyclic graph which consists of all possible configurations of the tape(s) of $N(x)$. There is a unique starting node, which corresponds to the unique starting configuration of every Turing machine. There is an edge from node $c$ to $c'$ in the graph if the configuration $c'$ can be reached by a valid transition from the configuration $c$ in a single step of the computation.
The graph is said to be accepting if there exists a path from the starting configuration to some halting configuration that is accepting. Moreover, if such a path exists, then accepts. One final thing we need is the following fact.
Fact. Any NTM $N$ running in time $t(n)$ can be converted to an equivalent NTM $N'$ where the transition function of $N'$ outputs at most two states per computation step and $N'$ runs in time $O(t(n))$.
By definition, we know that the non-deterministic Turing machine transition function is defined as By definition, for a fixed Turing machine , all of , , and are fixed. So . Given this, there is a way to convert the machine with transition function to an equivalent machine with transition function where outputs at most configurations per input.
All together, this allows us to notice the following properties about the configuration graph . If uses space on any input, then
- $G_{N,x}$ has at most $2^{O(s(n))}$ vertices. In particular, every vertex/configuration can be encoded with $O(s(n))$ space (or $O(k \cdot s(n))$ for a $k$-tape machine, but $k$ is considered constant).
- There exists a CNF $\varphi$ of size $O(s(n))$ such that $\varphi(c, c') = 1$ if and only if $c$ and $c'$ are valid configurations and $c'$ follows from $c$; that is, $c'$ is reachable from $c$ in a single step of the computation.
(1) above follows from the above Fact and subsequent discussion. (2) follows directly from the Cook-Levin theorem. In particular, the formula is simply the formula which is checking that all computation windows are valid, and all symbols are valid.
With all of this setup, let $N$ be an NTM which decides some language $L \in \mathsf{NSPACE}(s(n))$. We construct a machine $M$ which decides $L$ in $2^{O(s(n))}$ time. On input $x$, the machine constructs the graph $G_{N,x}$, which takes $2^{O(s(n))}$ time. Construction of this graph requires checking edges between nodes, which is done in $O(s(n))$ time per pair of vertices since $\varphi$ has size $O(s(n))$. This gives a total time of $2^{O(s(n))}$. Then $M$ simply searches for a path from the starting configuration to an accepting configuration, again taking at most $2^{O(s(n))}$ time.
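A sketch of this deterministic simulation as graph search; `start_config`, `successors`, and `is_accepting` are hypothetical stand-ins for the configuration-graph machinery described in the proof.

```python
from collections import deque

# Decide L(N) deterministically by searching N's configuration graph.
def decide_by_reachability(x):
    start = start_config(x)           # hypothetical: unique start configuration
    seen, frontier = {start}, deque([start])
    while frontier:
        c = frontier.popleft()
        if is_accepting(c):           # hypothetical: accepting halting config?
            return True               # some computation path accepts
        for nxt in successors(c):     # hypothetical: at most two successors
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False                      # no path reaches an accepting configuration
```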
Space Hierarchies
Unsurprisingly, just like with time constructible functions, we have similar space hierarchy theorems for space constructible functions.
Theorem. Let $f, g$ be space constructible functions such that $f(n) = o(g(n))$. Then $\mathsf{SPACE}(f(n)) \subsetneq \mathsf{SPACE}(g(n))$.
The proof is identical to the deterministic time hierarchy theorem, except we do not have the logarithmic loss. This is because of universal simulation: when simulating a Turing machine running in time $t$, we need $O(t \log t)$ time, but to simulate a Turing machine running in space $s$, we only need $O(s)$ space.
Space Complexity Classes
Just like with time complexity, we have complexity classes related to space bounded computations. Four of them will be of interest to us.
- $\mathsf{PSPACE} = \bigcup_{c \ge 1} \mathsf{SPACE}(n^c)$.
- This is the space analogue of $\mathsf{P}$.
- $\mathsf{NPSPACE} = \bigcup_{c \ge 1} \mathsf{NSPACE}(n^c)$.
- This is the space analogue of $\mathsf{NP}$.
- $\mathsf{L} = \mathsf{SPACE}(\log n)$.
- $\mathsf{NL} = \mathsf{NSPACE}(\log n)$.
Some facts about the above classes.
- It is an open question whether $\mathsf{L} = \mathsf{NL}$. We believe it to not be true; that is, $\mathsf{L} \subsetneq \mathsf{NL}$.
- In a somewhat surprising result, $\mathsf{PSPACE} = \mathsf{NPSPACE}$.
- In an even more surprising result, $\mathsf{NL} = \mathsf{coNL}$.
We’ll get to the bottom two points in our later discussions on space complexity. Let’s see some examples now.
Space Complexity Class Examples
First, $\mathsf{NP} \subseteq \mathsf{PSPACE}$. We can see this since for any language $L \in \mathsf{NP}$, under the certificate definition there exists a DTM $M$ and a polynomial $p$ such that for all $x$, we have $x \in L$ if and only if there exists a string $w \in \{0,1\}^{p(|x|)}$ such that $M(x, w) = 1$. We can easily construct a polynomial space DTM which on input $x$ simply iterates over all possible strings $w$ and checks if $M(x, w) = 1$. This only uses polynomial space since we can reuse the space for the string $w$.
Languages in
Checking if a string has an even number of $1$’s is in $\mathsf{L}$, and so is checking if the product of two integers is equal to a third integer. Formally, let $\mathsf{MULT} = \{\langle a, b, c \rangle : a \cdot b = c\}$.
The first language is decidable in only logarithmic space because if $n$ is the size of the input $x$, counting the number of $1$’s in $x$ only takes $O(\log n)$ bits, and checking if this count is even simply requires checking if its least significant bit is $0$. This only takes logarithmic space.
For $\mathsf{MULT}$, we are given the binary representations of three integers $a$, $b$, and $c$. Assume each integer is $n$ bits, so the input is of length $O(n)$. Then, to multiply $a$ and $b$, one can simply do the grade-school multiplication algorithm over the binary representations of $a$ and $b$. If we are not careful and try to write down the result of $a \cdot b$, we would require $2n$ bits since $a \cdot b$ is also up to $2n$ bits. So we can simply compute $a \cdot b$ bit by bit and compare to the bits of $c$. We will have to track carry bits, which will be at most $O(\log n)$ bits to track.
Languages in
Deciding if a directed graph $G$ has a path between two vertices $s$ and $t$ is in $\mathsf{NL}$. Formally, let $\mathsf{PATH} = \{\langle G, s, t \rangle : G \text{ is a directed graph with a path from } s \text{ to } t\}$. The number of vertices in $G$ is $n$ and the number of edges is at most $n^2$. To see that $\mathsf{PATH} \in \mathsf{NL}$, we give a non-deterministic Turing machine which decides $\mathsf{PATH}$ using $O(\log n)$ space. Notice that if there exists a path from $s$ to $t$ in $G$, then there is one of length at most $n$. Thus, a non-deterministic Turing machine simply performs a depth first search of depth at most $n$ starting at $s$, storing only the current vertex and a step counter. If it finds $t$, then it outputs accept; otherwise it outputs reject. If there is a path from $s$ to $t$, then there exists a series of non-deterministic choices to make for the depth first search that will end up at $t$.
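A Python sketch of this machine, modeling the non-deterministic choices as random guesses (so a `True` answer on some run corresponds to an accepting computation path); only the current vertex and a step counter are stored, which is $O(\log n)$ bits.

```python
import random

def guess_path(adj, s, t, n):
    """One nondeterministic branch of the NL machine for PATH.
    adj is an n x n adjacency matrix; random guesses model nondeterminism."""
    current = s
    for _ in range(n):                 # any s-t path has length at most n
        if current == t:
            return True                # reached t: accept on this branch
        nbrs = [v for v in range(n) if adj[current][v]]
        if not nbrs:
            return False               # dead end: reject on this branch
        current = random.choice(nbrs)  # guess the next vertex on the path
    return current == t
```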
-complete Languages
We continue our study of space complexity by examining -complete languages. Just like with -complete languages, we define -completeness with respect to polynomial time reductions.
Definition. A language $L$ is $\mathsf{PSPACE}$-complete if
- $L \in \mathsf{PSPACE}$; and
- for all $L' \in \mathsf{PSPACE}$, we have $L' \le_p L$.
True Quantified Boolean Formulas
Up to now, we have only seen “fixed” Boolean formulas. For example, $\varphi(x_1, \ldots, x_n)$. Then, we say that $\varphi \in \mathsf{SAT}$ if and only if there exists an assignment $a \in \{0,1\}^n$ such that $\varphi(a) = 1$.
We can generalize the above to include different quantifiers. For example:
- ;
- (notice this is the language , the -complete language);
- .
With this, we can define quantified Boolean formulas.
Definition. Let $\varphi$ be a Boolean formula with $n$ variables. Then, we say that $\psi = Q_1 x_1 Q_2 x_2 \cdots Q_n x_n \, \varphi(x_1, \ldots, x_n)$ is a quantified Boolean formula, where $Q_i \in \{\exists, \forall\}$ for all $i \in [n]$. We say that $\psi$ is a true quantified Boolean formula if $\psi$ evaluates to true.
Building on what we have, we let $\mathsf{TQBF}$ be the set of all true quantified Boolean formulas $\psi$.
Theorem. $\mathsf{TQBF}$ is $\mathsf{PSPACE}$-complete.
Aside: the Essence of $\mathsf{PSPACE}$
One can actually think of $\mathsf{PSPACE}$ as the set of two-player games with perfect information such that player 1 has a winning strategy. In particular, the essence of $\mathsf{TQBF}$ turns out to be finding optimal strategies for player 1 in a two-player game with perfect information (that is, no randomness and no hidden information such as a hand in card games). Three concrete examples: Chess, Go, and Tic-Tac-Toe (though for Tic-Tac-Toe you have to phrase it differently since drawing the game is the optimal strategy). Some other subtleties (which we will not get into in this class) arise here: for example, fixed finite boards are easy to deal with, which needs to be fixed by making the board a 2D infinite grid.
Proving $\mathsf{TQBF}$ is $\mathsf{PSPACE}$-complete
We’ll begin the proof of this theorem, and finish it in the next lecture. For now, we’ll show that $\mathsf{TQBF} \in \mathsf{PSPACE}$. Let $\psi = Q_1 x_1 \cdots Q_n x_n \, \varphi(x_1, \ldots, x_n)$ be a QBF on $n$ variables. We construct a decider to check if $\psi \in \mathsf{TQBF}$, with the goal being that this decider uses at most polynomial space.
Let $A$ be a simple recursive Turing machine which does the following.
$A(\psi)$:
- If $\psi$ has no quantifiers, output $1$ if $\varphi$ evaluates to true and output $0$ otherwise.
- Else if $Q_1 = \exists$:¹
- Run $A$ on $\psi$ with $x_1 = 0$ and with $x_1 = 1$.
- If $A$ returns $1$ on either of these inputs, return $1$. Else return $0$.
- Else if $Q_1 = \forall$:
- Run $A$ on $\psi$ with $x_1 = 0$ and with $x_1 = 1$.
- Return $1$ if and only if $A$ returns $1$ on both these inputs; otherwise return $0$.
First, clearly $A$ decides if $\psi \in \mathsf{TQBF}$. In the base case, if $\psi$ has all variables set to a value, then $A$ simply checks if $\varphi$ is satisfied by the generated assignment. Then, if $Q_1 = \exists$, $A$ returns $1$ if and only if assigning $x_1 = 0$ or $x_1 = 1$ returns true, which is enough since we only need at least one of these to return true. Finally, if $Q_1 = \forall$, then $A$ returns $1$ if and only if assigning $x_1 = 0$ and $x_1 = 1$ both result in a true QBF.
To finish the proof, note we already established we only need linear space in the base case of the recursion. The recursion depth is $n$, and at every level we are only storing a single variable’s worth of information, so the final space complexity is polynomial. This finishes the first part of the proof; namely, we have shown $\mathsf{TQBF} \in \mathsf{PSPACE}$.
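The recursion above is short enough to sketch directly. A minimal Python version, treating the formula as a callable on fully assigned variables (an illustrative interface, not the Turing-machine model); only the recursion stack, one assigned bit per level, is stored.

```python
def eval_qbf(quantifiers, phi, assignment=()):
    """Evaluate Q1 x1 ... Qn xn . phi in space linear in n.
    quantifiers: list of "exists"/"forall"; phi: function on n booleans."""
    if not quantifiers:
        return phi(*assignment)        # base case: all variables assigned
    q, rest = quantifiers[0], quantifiers[1:]
    branches = (eval_qbf(rest, phi, assignment + (b,)) for b in (False, True))
    return any(branches) if q == "exists" else all(branches)

# Example: exists x1 forall x2 . (x1 OR x2) is true (take x1 = True).
print(eval_qbf(["exists", "forall"], lambda x1, x2: x1 or x2))
```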
-
This is another QBF with 0 or more quantifiers. ↩
Lecture 10
In-class notes: CS 505 Spring 2025 Lecture 10
$\mathsf{TQBF}$ is $\mathsf{PSPACE}$-hard
Last time, we established that $\mathsf{TQBF} \in \mathsf{PSPACE}$. Now, to show $\mathsf{TQBF}$ is $\mathsf{PSPACE}$-complete, we show that it is $\mathsf{PSPACE}$-hard. That is, for all $L \in \mathsf{PSPACE}$, we show $L \le_p \mathsf{TQBF}$.
Let $L$ be any language in $\mathsf{PSPACE}$, and let $M$ be the decider for $L$. Suppose that for any input $x$ of length $n$, $M$ uses at most $s(n) = n^k$ space, where $k$ is a constant. Recall the configuration graph $G_{M,x}$ (see Lecture 9) of a Turing machine $M$. We know that the configuration graph has at most $2^{O(s(n))}$ nodes, and each configuration requires $O(s(n))$ bits. By the facts we established last lecture about the configuration graph, we know that $x \in L$ if and only if there is a path from the starting configuration $c_{\text{start}}$ to an accepting configuration $c_{\text{acc}}$ in the graph $G_{M,x}$. Moreover, there exists an $O(s(n))$-sized formula $\varphi$ such that $\varphi(c, c') = 1$ if and only if $c, c'$ are valid configurations of $M$ and $c'$ follows from $c$ under the transition function of $M$.
For our reduction, our goal will be to take $x$ and transform it into a QBF $\psi_x$ such that $\psi_x \in \mathsf{TQBF}$ if and only if $x \in L$. By our above discussion, we will utilize the configuration graph. The idea will be to construct a QBF that is true if and only if there exists a path from the starting configuration to the accepting configuration in the configuration graph $G_{M,x}$.
First Attempt. Let $G$ be any directed graph. Suppose we consider two vertices $a$ and $b$ in $G$ such that there is a path from $a$ to $b$ of length at most $2^i$ for some $i$. Then, there must exist another vertex $c$ such that there is a path from $a$ to $c$ of length at most $2^{i-1}$, and a path from $c$ to $b$ of length at most $2^{i-1}$. If this wasn’t true, that is, there did not exist such a vertex $c$, then any path from $a$ to $b$ would need to be of length greater than $2^i$.
Let’s try to build a QBF recursively to take advantage of the above ideas. Let $\psi_0(a, b) = \varphi(a, b)$; i.e., the formula for testing adjacent configurations. Our goal will be to construct $\psi_m$ (the final QBF, where $m = O(s(n))$; that is, the log of the number of nodes in the configuration graph $G_{M,x}$). In particular, we want $\psi_m$ to have the property that $\psi_m(c_{\text{start}}, c_{\text{acc}}) = 1$ if and only if there exists a path from the starting configuration $c_{\text{start}}$ to an accepting configuration $c_{\text{acc}}$.
There is actually a simple way to define $\psi_i$ using our fact about paths between vertices of length at most $2^i$. For any two configurations $a, b$, define the formula $\psi_i(a, b)$ so that $\psi_i(a, b) = 1$ if and only if there is a path from $a$ to $b$ of length at most $2^i$. We build this formula recursively by saying $\psi_i(a, b) = 1$ if and only if there exists a vertex/configuration $c$ such that there is a path from $a$ to $c$ of length at most $2^{i-1}$ and a path from $c$ to $b$ of length at most $2^{i-1}$. Recursively, the formula $\psi_{i-1}$ checks this statement. Finally, when the recursion bottoms out, it reduces to checking if two configurations are adjacent.
If we analyze the size of these formulas, notice that $|\psi_0| = O(s(n))$ by construction. Then, $|\psi_i| = 2|\psi_{i-1}| + O(s(n))$. Recursively, we have $|\psi_m| \ge 2^m |\psi_0|$. This gives our final formula size at least $2^{O(s(n))}$. So the formula is too big! It requires exponential space.
Insight: Define an Equivalent QBF. Our final formula had exponential size because we were recursively checking two sub-formulas. This doubled the formula length at each recursion. However, we can take advantage of Boolean logic to define a formula that is equivalent to $\psi_i$ but only requires a single recursive call to the formula $\psi_{i-1}$. We define the formula first, then explain what each component is doing.
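In the notation above, one standard way to write this formula is the following (the midpoint $c$ is existentially quantified, and the pair $(d, e)$ is universally quantified):

$$\psi_i(a, b) \;=\; \exists c \, \forall d \, \forall e \; \Big[ \big( (d, e) = (a, c) \,\lor\, (d, e) = (c, b) \big) \;\rightarrow\; \psi_{i-1}(d, e) \Big]$$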
First, $c$ is still the target midpoint vertex we want to check. That is, we still want to check if there is a path from $a$ to $c$ of length at most $2^{i-1}$ and a path from $c$ to $b$ of length at most $2^{i-1}$. Now, instead of calling $\psi_{i-1}$ twice and taking the AND of the results, we introduce the universally quantified pair $(d, e)$. What is this doing exactly?
Consider the expression in the square brackets. The premise of the implication evaluates to true if and only if $d = a$ and $e = c$, or $d = c$ and $e = b$. In the first case, when $d = a$ and $e = c$, we check if $\psi_{i-1}(a, c)$ is true. Great! This is exactly one of the checks we want to perform. Then, in the second case, when $d = c$ and $e = b$, we again perform the check we want: $\psi_{i-1}(c, b)$.
Now, what about for all other values of $(d, e)$? Well, recall that for the Boolean implication “$\Rightarrow$,” we know that $\text{False} \Rightarrow z$ always evaluates to True, no matter what $z$ is. So whenever $(d, e)$ is not one of the target pairs of vertices, the expression trivially evaluates to true. This is fine since we are not checking the distance between these arbitrary variables. However, for the pairs we explicitly want to check, the expression will be true if and only if $\psi_{i-1}(d, e)$ evaluates to true. This gives us the formula we want!
To wrap up the proof, we let $\psi_x = \psi_m(c_{\text{start}}, c_{\text{acc}})$, where $c_{\text{acc}}$ can be any accepting configuration and $c_{\text{start}}$ is the unique starting configuration. Then, $\psi_x$ is a QBF which evaluates to true if and only if there is a path from $c_{\text{start}}$ to some $c_{\text{acc}}$ of length at most $2^m$ (in particular, at least the number of vertices). This happens if and only if $x \in L$.
Finally, we analyze the size of the formula $\psi_m$. Notice that $|\psi_i| = |\psi_{i-1}| + O(s(n))$. Since $m = O(s(n))$, we have that $|\psi_m| = O(s(n)^2)$, which is polynomial in $n$ since $s(n)$ is a polynomial in $n$.
Notice in the proof we actually didn’t use the fact that $M$ was a deterministic Turing machine. In fact, the above proof holds even if $M$ is a non-deterministic Turing machine. Thus, we have actually shown that $\mathsf{TQBF}$ is $\mathsf{NPSPACE}$-hard. And since $\mathsf{TQBF}$ is also a language in $\mathsf{PSPACE}$, we actually showed that $\mathsf{TQBF}$ is $\mathsf{NPSPACE}$-complete. This shows the two classes are equal.
Theorem. $\mathsf{PSPACE} = \mathsf{NPSPACE}$.
This is a somewhat surprising result since we do not believe the same is true for polynomial-time computations; i.e., we do not believe that $\mathsf{P}$ equals $\mathsf{NP}$.
Savitch’s Theorem
We can actually show something more fine-grained about deterministic space versus non-deterministic space. The following result would equally show that $\mathsf{PSPACE} = \mathsf{NPSPACE}$.
Theorem (Savitch’s Theorem). For all space constructible functions $s(n) \ge \log n$, we have $\mathsf{NSPACE}(s(n)) \subseteq \mathsf{SPACE}(s(n)^2)$.
Proof. We will again take advantage of the configuration graph of a Turing machine that we have been using. Let $L \in \mathsf{NSPACE}(s(n))$ with corresponding NTM $N$ using at most $O(s(n))$ additional space on its worktapes. Let $G_{N,x}$ be its corresponding configuration graph with at most $2^{O(s(n))}$ nodes for any $x$ of length $n$. Recall also that $x \in L$ if and only if there is a path in $G_{N,x}$ from the start configuration to some accepting configuration.
Our goal will be to construct a deterministic Turing machine $M$ which decides $L$ using $O(s(n)^2)$ space. The machine will operate as follows.
$M(x)$:
- Simulate $N(x)$ by traversing the graph $G_{N,x}$.
- Traversal will utilize a recursive procedure $\mathsf{REACH}(u, v, i)$ which returns $1$ if and only if there is a path from node $u$ to node $v$ of length at most $2^i$.
- The recursion utilizes the same fact we had in the proof that $\mathsf{TQBF}$ is $\mathsf{PSPACE}$-complete. In particular, $\mathsf{REACH}(u, v, i) = 1$ if and only if there exists $w$ such that $\mathsf{REACH}(u, w, i-1) = 1$ and $\mathsf{REACH}(w, v, i-1) = 1$.
- Suppose $G_{N,x}$ has $2^m$ nodes for $m = O(s(n))$.
For all accepting configurations $c_{\text{acc}}$, run $\mathsf{REACH}(c_{\text{start}}, c_{\text{acc}}, m)$.
If this procedure outputs $1$, output $1$.
If none of the calls output $1$, output $0$.
- To run this procedure, each recursive call simply runs over all nodes $w$ (this requires $O(s(n))$ bits) and checks if $\mathsf{REACH}(u, w, i-1) = 1$ and $\mathsf{REACH}(w, v, i-1) = 1$.
- $\mathsf{REACH}(u, v, 0) = 1$ if and only if $u = v$ or the edge $(u, v)$ is in $G_{N,x}$.
Notice that the recursive procedure bottoms out after $m = O(s(n))$ levels. During each call, we store $O(s(n))$ bits for the current vertex being enumerated over. The machine is performing a depth-first search of the graph $G_{N,x}$. At the bottom of the recursion, $O(s(n))$ space is used. Therefore, $M$ uses at most $O(s(n)^2)$ space.
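A Python sketch of the recursion, assuming hypothetical helpers `all_nodes()` and `edge(u, v)` for enumerating configurations and testing the one-step relation.

```python
def reach(u, v, i):
    """True iff there is a path from u to v of length at most 2**i.
    Only the recursion stack is stored: depth i, one midpoint per level,
    which with i = O(s(n)) gives the O(s(n)^2) space bound."""
    if i == 0:
        return u == v or edge(u, v)   # length <= 1: equal or one edge apart
    # deterministically try every possible midpoint w
    return any(reach(u, w, i - 1) and reach(w, v, i - 1)
               for w in all_nodes())
```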
Alternate Proof of $\mathsf{PSPACE} = \mathsf{NPSPACE}$
We can use Savitch’s Theorem as an alternate proof of this result. Recall that $\mathsf{PSPACE} = \bigcup_{c \ge 1} \mathsf{SPACE}(n^c)$ and $\mathsf{NPSPACE} = \bigcup_{c \ge 1} \mathsf{NSPACE}(n^c)$. Notice that for any space constructible $s(n)$, we have $\mathsf{SPACE}(s(n)) \subseteq \mathsf{NSPACE}(s(n))$ since all deterministic computations are also non-deterministic. By Savitch’s Theorem, we have $\mathsf{NSPACE}(n^c) \subseteq \mathsf{SPACE}(n^{2c})$. Finally, all polynomial functions are space constructible. So this implies $\mathsf{SPACE}(n^c) \subseteq \mathsf{NSPACE}(n^c) \subseteq \mathsf{SPACE}(n^{2c})$ for all $c \ge 1$. This shows $\mathsf{PSPACE} = \mathsf{NPSPACE}$.
$\mathsf{NL}$-Completeness
Recall that $\mathsf{NL}$ is the set of all languages decidable on an NTM using at most $O(\log n)$ additional space for inputs of length $n$. We showed last lecture that $\mathsf{PATH} \in \mathsf{NL}$. Today, we’ll see that $\mathsf{PATH}$ is the essence of $\mathsf{NL}$. That is, $\mathsf{PATH}$ is $\mathsf{NL}$-complete. Note however that we do not know if $\mathsf{L} = \mathsf{NL}$, though we believe it to not be the case; otherwise every language in $\mathsf{NL}$ would be decidable in deterministic logspace.
Before we show that $\mathsf{PATH}$ is $\mathsf{NL}$-complete, we need a new notion of reducibility that is not polynomial-time. Let’s see why this is the case. Suppose $L, L' \in \mathsf{NL}$, where $L'$ is neither empty nor all of $\{0,1\}^*$. Then $L \le_p L'$. That is, any two such languages in $\mathsf{NL}$ are polynomial-time reducible to each other. Intuitively, this is because $\mathsf{NL} \subseteq \mathsf{P}$ and the reduction trivially has more power than problems in $\mathsf{L}$ or $\mathsf{NL}$, since it is limited to be polynomial-time only and is not restricted on space. So because we can decide $L$ using an NTM with at most $O(\log n)$ space, the reduction can simply compute the configuration graph of the NTM deciding $L$ on input $x$, decide if $x \in L$, then produce a trivial yes- or no-instance of $L'$, all in polynomial time.
Therefore, we need to restrict the power of the reductions for completeness in and . This leads us to logspace reductions.
Definition. Let $f \colon \{0,1\}^* \to \{0,1\}^*$. We say that $f$ is (implicitly) logspace computable if there exists a constant $c$ such that for all $x$, $|f(x)| \le |x|^c$, and the following languages are in $\mathsf{L}$: $\{\langle x, i \rangle : f(x)_i = 1\}$ and $\{\langle x, i \rangle : i \le |f(x)|\}$.
We can now define logspace reducible.
Definition. Let $L$ and $L'$ be any languages. We say that $L$ is logspace reducible to $L'$, denoted as $L \le_\ell L'$, if there exists a logspace computable function $f$ such that $x \in L$ if and only if $f(x) \in L'$.
This finally lets us define $\mathsf{L}$- and $\mathsf{NL}$-completeness.
Definition. We say that a language $L$ is $\mathsf{NL}$-complete (respectively, $\mathsf{L}$-complete) if
- $L \in \mathsf{NL}$ (resp., $L \in \mathsf{L}$); and
- for all $L' \in \mathsf{NL}$ (resp., $L' \in \mathsf{L}$), we have $L' \le_\ell L$.
As with polynomial-time reductions, we have a “transitive” property of logspace reductions.
Theorem.
- If $L \le_\ell L'$ and $L' \le_\ell L''$, then $L \le_\ell L''$.
- If $L \le_\ell L'$ and $L' \in \mathsf{L}$, then $L \in \mathsf{L}$.
Part (2) of the above theorem tells us that if an $\mathsf{NL}$-complete language is in $\mathsf{L}$, then $\mathsf{L} = \mathsf{NL}$.
$\mathsf{PATH}$ is $\mathsf{NL}$-complete
We can finally prove that $\mathsf{PATH}$ is the “essence” of $\mathsf{NL}$.
Theorem. $\mathsf{PATH}$ is $\mathsf{NL}$-complete.
Proof. We have already shown that $\mathsf{PATH} \in \mathsf{NL}$. We now have to show that it is $\mathsf{NL}$-hard. That is, show that $L \le_\ell \mathsf{PATH}$ for any $L \in \mathsf{NL}$. Our good friend the configuration graph will help us yet again.
Let $L \in \mathsf{NL}$ and let $N$ be the non-deterministic logspace decider for $L$. This means for any $x \in \{0,1\}^*$, we have $x \in L$ if and only if $N(x) = 1$, using at most $O(\log |x|)$ space.
Our logspace reduction will simply construct the configuration graph $G_{N,x}$. That is, on input $x$, the reduction will output the tuple $\langle G_{N,x}, c_{\text{start}}, c_{\text{acc}} \rangle$, where $c_{\text{acc}}$ is an accepting configuration (which we may take to be unique for any $x$, e.g., by having $N$ erase its work tape before accepting).
Recall by definition of our configuration graph, we know that $x \in L$ if and only if there is a path from $c_{\text{start}}$ to $c_{\text{acc}}$ in $G_{N,x}$. This is precisely a problem instance of $\mathsf{PATH}$. Now, we can represent $G_{N,x}$ as an adjacency matrix; there are only $2^{O(\log n)} = \mathrm{poly}(n)$ configurations. Entry $(c, c')$ is $1$ if and only if there is an edge from $c$ to $c'$ in $G_{N,x}$.
Now, we can check each entry in logspace. Given $\langle c, c' \rangle$, there exists a deterministic machine to check if $c'$ follows from $c$ according to $N$’s transition function. By our previous discussions on the configuration graph, these configurations need at most $O(\log n)$ space to represent. Thus, we can do this check in $O(\log n)$ space. Therefore, the mapping $x \mapsto \langle G_{N,x}, c_{\text{start}}, c_{\text{acc}} \rangle$ is implicitly logspace computable. So this is a valid logspace reduction.
This completes the proof, as we have encoded deciding $x \in L$ into the $\mathsf{PATH}$ problem on the instance $\langle G_{N,x}, c_{\text{start}}, c_{\text{acc}} \rangle$.
Lecture 11
In-class notes: CS 505 Spring 2025 Lecture 11
Certificate Definition of
Just like with , we can define the class using a deterministic Turing machine that takes a certificate as additional input to help verify whether or not a string is in a language . Recall that for both and , the space used by the input tape is not counted towards the overall space complexity, and the input tape is read-only. So it makes sense that a certificate definition of does not count the size of the certificate when restricting the space complexity.
However, we actually need a stronger property to define with respect to certificates. We need one additional special tape, which we call the certificate/witness tape, with the following properties:
- the space used on the certificate tape is not counted against the overall space usage (as the witness could be polynomial in size); and
- The tape is read-once.
Property (2) above turns out to be crucial, as if you allow the machine to read the certificate multiple times (i.e., it is a read-only tape like the input tape), then you actually end up back in the class .
Definition. We say a language is in if there exists a deterministic Turing machine with a special read-once certificate tape and a polynomial such that for all , if and only if there exists such that , where is the input and is on the input (i.e., read-only) tape and is the certificate/witness and is on the special certificate (i.e., read-once) tape, and uses additional space on its read/write work tapes.
Recall that we do not believe that $\mathsf{NP} = \mathsf{coNP}$. This is because under the certificate definition of $\mathsf{NP}$, there needs to exist at least one certificate such that the deterministic verifier outputs $1$ (i.e., $M(x, w) = 1$ for at least one witness $w$). In contrast, for $\mathsf{coNP}$, the machine has to output $1$ for all certificates of polynomial length (i.e., $M(x, w) = 1$ for all $w$). So, intuitively, $\mathsf{NP} = \mathsf{coNP}$ says that we can verify exponentially many certificates using a single certificate of polynomial length. Thus, we do not believe this to be the case.
Turning towards space, consider the class $\mathsf{coNL}$. Under the certificate definition, again a machine for a $\mathsf{coNL}$ language must output $1$ for all polynomial-sized witnesses $w$. So it was not believed that $\mathsf{NL} = \mathsf{coNL}$ holds.
However, this was shown to be true! In particular, it was shown that the $\mathsf{coNL}$-complete problem $\overline{\mathsf{PATH}}$ lies in $\mathsf{NL}$. Recall that $\mathsf{PATH}$ is $\mathsf{NL}$-complete, so $\overline{\mathsf{PATH}}$ is $\mathsf{coNL}$-complete. Under this definition, we have
Theorem. $\mathsf{NL} = \mathsf{coNL}$.
Proof. The key insight into the proof is that if we can enumerate all the vertices which are reachable from in the graph , and is not in this set, then we have shown there does not exist a path from to . We’ll build a certificate which verifies the sizes and the contents of the set of vertices which are reachable from . Our certificate will consist of many sub-certificates, which we will use all together to build the final certificate. The main challenges here are (1) the certificate must be read-once, and (2) we must verify the certificate using only logarithmic space.
Let denote the number of vertices in the graph . Moreover, we will uniquely label each vertex using the set . Note that , so our goal is still to construct a certificate of size while using space. First, let . We say that is reachable from in if there exists a path from to . Now, for every , we can define the set to be the set of vertices such that is reachable from in at most steps. Notice by definition we have .
Now, if we can generate a certificate for the set and show that , we are done. This is because the set is the set of all vertices in the graph that are reachable from in at most steps. Since the graph has vertices, if there is a path from to with more than steps, there is another path from to with at most steps.
Building Sub-certificates. We will build a number of sub-certificates that will help us build our final read-once certificate. First, for any and , we give a read-once certificate showing the statement “”. We’ll call this certificate a certificate.
certificate:
- The certificate will consist of vertices .
- Verification of the certificate proceeds as follows.
- Check if . This can be done in logarithmic space since this is just comparing bits.
- For all , check if . Note here that is part of the input and can be read multiple times. This again can be done in logarithmic space since we are just storing a counter for , and comparing two bit strings of length .
- For all , check if . Again, this is easily done in logarithmic space.
- Count up to and check that . This again only needs bits.
Note here that checks (2) and (3) are done at the same time since we are reading the certificate once. The certificate will be used in our final certificate. One main issue is that we need to be able to verify for any $i$ that the claimed set is actually the set of all vertices reachable from $s$ in at most $i$ steps in the graph. If we do not verify this, then it is trivial to come up with a certificate claiming a vertex is not reachable when it actually is.
We’ll now build on the ideas of the certificate . In particular, we’ll need two certificates that are more complicated.
- Certificate : this certificate will certify the statement given that . Here, given means that we have already verified this statement to be true.
- Certificate : this certificate will certify that given that . We’ll use the certificate to help us build these two certificates. Note that by definition, we know that , and it does not need to be included. This will serve as our base case for the final certificate, which will be an “inductive” certificate that continuously builds up to showing that and .
certificate: certifying that given .
- Certificate gives us a certificate for verifying for any .
- Let be this certificate.
- We construct a new certificate for the statement . We denote this certificate by . It is defined as Here, we have and for all .
- Given we know that , we can use the certificate to verify that as follows.
- Run the verification procedure on all to certify they are valid (i.e., the verification procedure for ).
- Check that for all .
- Check that for all .
- Check that has exactly certificates.
Again, all of these checks can be arranged so they are executed in a read-once manner, and they all require only bits of space to check.
Before building the certificate , we actually need another helper certificate, which we denote as . This certificate will certify given .
certificate: certifying that given .
- The certificate here will be identical to the certificate for , except for .
- Let such that for all . Also let denote the certificate for for all .
- Let . This is our certificate for .
- To verify , we perform nearly identical checks as with .
- Verify all are valid.
- Check for all .
- Check that and is not a neighbor of for all .
- Exactly certificates are given.
As always, we can rearrange the above checks to execute them in a read-once manner. Now we have the tools we need to construct a certificate for .
certificate: certifying that given .
- Certificate certifies that .
- Certificate certifies that given . We are also given this by our inductive assumption for this certificate.
- We use this assumption plus the above certificates to construct a certificate for .
- We let denote this certificate.
- It will be the concatenation of certificates, one for every vertex: Here,
- We certify as follows.
- Check that the number of certificates is and that these certificates are valid.
- Certify all certificates using the procedure of .
Final Certificate. All together, we have the tools for the final certificate to verify that there is no path from to in the graph . The final certificate will consist of certificates. Here, is a certificate verifying that given . Note that since we are given by definition, the certificate iteratively builds the sets then , up to . Once the final size of the set is certified, the final certificate is a certificate which certifies that given .
The following is a corollary of the proof.
Corollary. For all space constructible functions $s(n) \ge \log n$, we have $\mathsf{NSPACE}(s(n)) = \mathsf{coNSPACE}(s(n))$.
Introduction to the Polynomial Hierarchy
The motivation for studying the Polynomial Hierarchy is quite simple: sometimes, non-determinism is not strong enough to capture a decision problem itself.
First, recall the independent set problem. Consider the following natural modification of the independent set problem, which we call exact independent set.
Another way to rephrase the exact independent set problem is as follows. $\langle G, k \rangle$ is in the language if and only if there exists an independent set of size $k$ in $G$ such that all other independent sets $S'$ satisfy $|S'| \le k$.
If we stated this with a Turing machine, it would look like where intuitively is the certificate for the independent set of size and are all other independent sets in (which the machine checks that they have size at most ). Clearly, this is not captured by the certificate definition of , which is equivalent to the NTM definition. So here, non-determinism alone is not enough.
The Class
Before examining the full Polynomial Hierarchy, let’s capture the set of languages that follow the same structure as the exact independent set problem we outlined above. This is the class .
Definition. The class $\Sigma_2^p$ is the set of all languages $L$ such that there exists a deterministic Turing machine $M$ (the verifier) and a polynomial $p$ such that for all $x$, $x \in L$ if and only if $\exists u \in \{0,1\}^{p(|x|)} \, \forall v \in \{0,1\}^{p(|x|)}$ such that $M(x, u, v) = 1$.
It is helpful to see another example of a language in the class . This is what is known as , and is defined as follows. Here, a DNF formula is a “reverse” CNF: it is a big OR of many ANDs. Stating it again, if DNF of size at most such that , we have (they agree on all possible assignments).
Notice, in fact, that both and are contained in .
Lecture 12
In-class notes: CS 505 Spring 2025 Lecture 12
The Polynomial Hierarchy
Last time, we defined the class . Today, we’ll generalize this class and define the Polynomial Hierarchy.
Definition. For any , the class is the set of all languages such that there exists a deterministic Turing machine deciding and a polynomial such that for all Here, if is odd and if is even.
Given this generalization of , we can define the Polynomial Hierarchy, which we denote as .
Definition. .
Note that we can also define the co-classes of any . These classes are denoted by .
Definition. For any , . Equivalently, if there exists a deterministic Turing machine deciding and a polynomial such that for all Here, if is even and if is odd.
Actually, we also have that . This is due to the following lemma.
Lemma. For all , we have .
Proof. Let . By definition, there exists DTM and polynomial such that for all , we have if and only if where each . To see that , we can add a “dummy” quantifier in front of . We can define a machine which takes as input , ignores , and outputs . This is clearly in . Note the same strategy works for a language and lifting it to .
Properties of
It is widely believed that $\mathsf{P} \neq \mathsf{NP}$, $\mathsf{NP} \neq \mathsf{coNP}$, and $\Sigma_i^p \neq \Sigma_{i+1}^p$ for all $i$. This is because otherwise, the Polynomial Hierarchy collapses. That is, there exists $i$ such that $\Sigma_i^p = \Sigma_{i+1}^p$ or $\Sigma_i^p = \Pi_i^p$ (which implies that $\mathsf{PH} = \Sigma_i^p$). We say this is a collapse of $\mathsf{PH}$ because if this happens, the entire hierarchy equals $\Sigma_i^p$.
Theorem.
- If there exists such that , then .
- If , then .
Proof. Notice that (2) is actually just a corollary of (1) when . However, we’ll directly prove (2), as the ideas in this proof readily extend to any . We’ll do a proof by induction. Suppose that . Recall this also implies that . Our proof by induction will establish that for all .
For the base case, we have that since . For the inductive step, assume that . We show that .
First, we show the case for . Given , by definition we have a DTM which decides . That is, for polynomial and any , if and only if where all .
Define new language to be all pairs such that . Clearly, we have that by our inductive hypothesis. This implies there exists a DTM such that if and only if , where runs in polynomial time. Now, we can replace the decider for with as follows. Under this , we have that if and only if . Now this implies that . Therefore, .
For the other case, where , we simply do the above strategy for the complement language .
Complete Problems for
We can readily define complete problems for , , and , using the same definition of polynomial-time reducibility we have for . And, in fact, for every , and have complete problems.
However, we do not believe that has a complete problem/language . This is because if one were to exist, then again collapses to some level .
Theorem. If there exists that is -complete, then there exists such that .
Proof. Recall that . So for any , there exists such that or . Assume that is -complete. Then, for all , we have . Suppose that . This implies that if or for , we can in polynomial time decide using a decider for . Since , this implies that as well. The same holds if . Thus, collapses to level .
Recall that $\mathsf{TQBF}$ is the complete problem for $\mathsf{PSPACE}$. Recalling the definition of $\mathsf{TQBF}$, it is a more general quantified Boolean formula and captures computations in $\mathsf{PSPACE}$. So we have $\mathsf{PH} \subseteq \mathsf{PSPACE}$. However, as a corollary of the above theorem, unless $\mathsf{PH}$ collapses, $\mathsf{PH}$ is a strict subset of $\mathsf{PSPACE}$.
Corollary. Unless $\mathsf{PH}$ collapses, $\mathsf{PH} \subsetneq \mathsf{PSPACE}$.
This is simply because $\mathsf{TQBF}$ is a complete problem for $\mathsf{PSPACE}$, so if $\mathsf{PH} = \mathsf{PSPACE}$, then $\mathsf{PH}$ has the complete problem $\mathsf{TQBF}$, so it must collapse to some level $i$.
Complete Problems for $\Sigma_i^p$ and $\Pi_i^p$
In contrast to $\mathsf{PH}$, for any fixed $i$, the classes $\Sigma_i^p$ and $\Pi_i^p$ each have complete problems.
-
For $\Sigma_i^p$, we define the language $\Sigma_i\mathsf{SAT}$ as the set of true QBFs of the form $\exists u_1 \forall u_2 \cdots Q_i u_i \, \varphi(u_1, \ldots, u_i)$, where $Q_i = \exists$ if $i$ is odd and $Q_i = \forall$ if $i$ is even (each $u_j$ is a block of variables). Then, $\Sigma_i\mathsf{SAT}$ is $\Sigma_i^p$-complete.
-
For $\Pi_i^p$, we define the language $\Pi_i\mathsf{SAT}$ as the set of true QBFs of the form $\forall u_1 \exists u_2 \cdots Q_i u_i \, \varphi(u_1, \ldots, u_i)$, where $Q_i = \exists$ if $i$ is even and $Q_i = \forall$ if $i$ is odd. Then, $\Pi_i\mathsf{SAT}$ is $\Pi_i^p$-complete.
-
A different complete problem for $\Sigma_2^p$ is due to Umans in 1998. The problem is defined as follows. The input to the problem is a set of DNF formulas each with $n$ variables, and an integer $k$. This input is said to be in the language if and only if there exists a subset of at most $k$ of the DNFs whose disjunction is a tautology. Clearly, this language is in $\Sigma_2^p$. It turns out it is also $\Sigma_2^p$-hard.
Alternating Turing Machines: Generalized Non-Determinism
Let’s think back to the non-deterministic Turing machine definition of . We say that if there exists a NTM which decides . This was defined as follows: for all , if and only if , where we say outputs 1 if and only if there is at least one set of non-deterministic choices makes on input that causes to output . Viewing this as a computation tree, or from the point of view of our configuration graph , it says that there is at least one path from to any accepting configuration in . Moreover, all paths to halting configurations have length .
Now, thinking back to , it is reversed: an NTM decides if for any , if and only if in the graph , all halting configurations are accepting. If this happens, we say that ; otherwise, if there is at least one rejecting configuration, .
How can we model this for, say, ? That is, is there a way to describe a non-deterministic machine which decides ? Looking at the deterministic verifier definition, if there exists deterministic verifier such that for all ,
It is not immediately clear how we would define a non-deterministic Turing machine to decide . For languages in , it was enough that there existed at least one accepting configuration in , and for , we needed all halting configurations to be accepting in . Thinking to the verification definitions of , this makes sense. But how can we generalize it to , and more generally and ?
Alternating Turing Machines
The answer is alternating Turing machines. These machines, in some sense, generalize non-determinism as follows.
Definition. A non-deterministic Turing machine is said to be an alternating Turing machine (ATM) if it additionally labels non-halting states with or such that for any , if and only if the starting configuration is labeled after the following process.
- Let be the directed configuration graph for .
- Label all configurations with and with .
- For all non-halting nodes in the graph , corresponding to some non-halting configuration:
- If the state of is labeled , label with a if and only if at least one child of is labeled .
- If the state of is labeled , label with a if and only if all children of are labeled . We say that an ATM runs in time if all root-to-leaf paths in have length at most for any input . Finally, we say that an ATM is -alternating if on any root-to-leaf path in , it alternates between and state label quantifiers at most times.
Under the above definition, it is clear that is decidable in polynomial time on an ATM which is -alternating. That is, the ATM begins only with states, then at some point switches to states, until the computation halts. We can now define alternating time and space.
Definition. For function , we say that if there exists an ATM which decides in at most time. Similarly, for function , we say that if there exists an ATM which decides in at most space.
Lecture 13
In-class notes: CS 505 Spring 2025 Lecture 13
In this lecture, we wrap up our discussion of Alternating Turing machines and the Polynomial Hierarchy.
Alternating Time and Space
Given ATMs from last lecture, we can alternatively define , , and in terms of ATMs.
Definition. The class (resp., ) is the set of all languages that are decidable in time on an -alternating ATM with initial state labeled (resp., ).
It is not difficult to see how we can define and with respect to the above complexity classes.
Lemma. For all ,
Corollary. .
Unlimited Number of Alternations
The definitions above of and limit the number of alternations a given ATM can have (they are -alternating). However, we do not need to restrict the number of alternations. This was exactly the class defined in the last lecture: the set of all languages decidable in time by an ATM . In particular, there is no restriction on the number of alternations. With this, we can define alternating polynomial time.
Definition. $\mathsf{AP} = \bigcup_{c \ge 1} \mathsf{ATIME}(n^c)$.
As one might expect, an unlimited number of alternations yields something we believe to be more powerful than $\mathsf{P}$.
Theorem. $\mathsf{AP} = \mathsf{PSPACE}$.
Proof. First, we show that $\mathsf{PSPACE} \subseteq \mathsf{AP}$. We do this by showing $\mathsf{TQBF} \in \mathsf{AP}$. Recall that a QBF $\psi = Q_1 x_1 \cdots Q_n x_n \, \varphi$ is in $\mathsf{TQBF}$ if and only if it evaluates to true (i.e., it is a true quantified Boolean formula), where $Q_i \in \{\exists, \forall\}$ for all $i$. This is trivially solvable by an ATM which guesses all the variables, labels states according to the quantifiers $Q_i$, and checks if the result is true in polynomial time.
Now, we show $\mathsf{AP} \subseteq \mathsf{PSPACE}$. Let $L \in \mathsf{AP}$ with ATM $N$ deciding $L$ in polynomial time. Construct a deterministic machine $M$ as follows. On input $x$, $M$ performs a depth-first search of the configuration graph $G_{N,x}$ and attempts to compute the label of the starting node (which would be the output of the machine). The algorithm is recursive. Recalling the facts about the configuration graph, since $N$ runs in polynomial time, the number of bits needed to represent every configuration is polynomial in $|x|$. Moreover, doing a depth-first search only requires storing polynomially many configurations in the recursion stack, since $N$ runs in polynomial time. So $M$ uses at most polynomial space.
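A compact sketch of this recursive labeling; `children`, `is_halting`, `accepts`, and `kind` are hypothetical stand-ins for the configuration-graph machinery of the ATM.

```python
# Depth-first evaluation of the ATM's configuration graph: compute the label
# of a configuration from the labels of its children.
def label(c):
    if is_halting(c):
        return 1 if accepts(c) else 0   # halting configs labeled by accept/reject
    child_labels = (label(d) for d in children(c))
    if kind(c) == "exists":
        return int(any(child_labels))   # label 1 iff some child is labeled 1
    return int(all(child_labels))       # label 1 iff every child is labeled 1
```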
We can also define alternating polynomial space, where we consider space-bounded ATMs with an unlimited number of alternations (the class $\mathsf{APSPACE}$).
Definition. $\mathsf{APSPACE} = \bigcup_{c \ge 1} \mathsf{ASPACE}(n^c)$.
Theorem. $\mathsf{APSPACE} = \mathsf{EXP}$.
Finally, we can define alternating logspace as $\mathsf{AL} = \mathsf{ASPACE}(\log n)$.
Theorem. $\mathsf{AL} = \mathsf{P}$.
Time-Space Tradeoffs for SAT
What are alternations used for? Here, we’ll see how we can use ATMs to give time-space tradeoffs for .
Complexity theorists generally believe that any algorithm deciding/solving $\mathsf{SAT}$ must have the following properties.
- Solving $\mathsf{SAT}$ requires exponential time (or, at least, super-polynomial time).
- Solving $\mathsf{SAT}$ requires linear space.
In general, both of the above are conjectures: there may be an algorithm for solving $\mathsf{SAT}$ which only uses logarithmic space, or one which runs in polynomial time! However, we can rule out an algorithm that achieves both of these properties.
Theorem. Let $T, S \colon \mathbb{N} \to \mathbb{N}$ be functions. Define the class $\mathsf{TISP}(T(n), S(n))$ to be all languages decidable by a DTM using at most $O(T(n))$ time and $O(S(n))$ (additional) space. Then, $\mathsf{SAT} \notin \mathsf{TISP}(n^{1.1}, n^{0.1})$. More generally, for any $c, d$ such that $c(c + d) < 2$, we have $\mathsf{SAT} \notin \mathsf{TISP}(n^c, n^d)$.
Proof. To prove the theorem, we first need the following claim which relates to .
Claim 1. $\mathsf{TISP}(T(n), S(n)) \subseteq \Sigma_2\mathsf{TIME}\big(\sqrt{T(n) \cdot S(n)}\big)$.
The proof of this claim is similar to the proofs of Savitch’s Theorem and that $\mathsf{TQBF}$ is $\mathsf{PSPACE}$-complete. Let $L \in \mathsf{TISP}(T(n), S(n))$ with decider $M$ running in time $T(n)$ and space $S(n)$ for any input $x$ of length $n$. We construct an ATM $A$ for deciding $L$ as follows. $A$ will construct the configuration graph $G_{M,x}$. By the previous properties we have discussed for configuration graphs, we know that all configurations need at most $O(S(n))$ bits to describe. Moreover, $x \in L$ if and only if there exists a path in the graph from $c_{\text{start}}$ to some accepting configuration $c_{\text{acc}}$, and any such path has length at most $T(n)$.
Now, this path from to exists if and only if there exist configurations such that:
- If , then is accepting; and
- , follows from in at most valid executions of ’s transition function. Essentially, we have divided the path from to into chunks, each of size . Now, each configuration requires bits to describe, so the full description of the path requires bits. So the ATM guesses this path using the appropriate alternations. Intuitively, this is is a start state, is an accepting state, and follows from in at most steps. Clearly, this implies .
Now, we need a second claim to proceed with the proof.
Claim 2. Suppose that $\mathsf{NTIME}(n) \subseteq \mathsf{DTIME}(n^c)$. Then $\Sigma_2\mathsf{TIME}(T(n)) \subseteq \mathsf{NTIME}(T(n)^c)$.
To see this claim, let . Then there exists a DTM such that in time, where . Here, we are using the fact that .
By the assumption in the claim, we have . By a padding argument, this implies that . Thus, we can construct a DTM such that on input for and , runs in time and outputs if and only if such that . This implies that
Why have we bothered with Claims 1 and 2? Well, using these claims, we will show that . This establishes the result since . Suppose this is not the case; that is, . Then, Claims 1 and 2 give us the following inequalities. where follows by Claim 2 and the fact that . This violates the non-deterministic time hierarchy theorem, so we can conclude that .
The Polynomial Hierarchy via Oracle Machines
Our final discussion on the polynomial hierarchy will again show that we can give yet another equivalent definition, this time using oracle Turing machines/complexity classes. We begin with .
Theorem. For all $i \ge 2$, we have $\Sigma_i^p = \mathsf{NP}^{\Sigma_{i-1}\mathsf{SAT}}$, where $\Sigma_{i-1}\mathsf{SAT}$ is the $\Sigma_{i-1}^p$-complete problem from above.
Proof. We show the proof for $i = 2$; other cases are analogous. For $i = 2$, we show that $\Sigma_2^p = \mathsf{NP}^{\mathsf{SAT}}$. First, we show $\Sigma_2^p \subseteq \mathsf{NP}^{\mathsf{SAT}}$. Let $L \in \Sigma_2^p$. By definition, there exists a DTM $M$ such that for all $x$, we have $x \in L$ if and only if $\exists u \forall v \, M(x, u, v) = 1$, where $u, v \in \{0,1\}^{p(|x|)}$. Non-deterministically fix $u$ such that the inner statement holds. Now, the language of all pairs $\langle x, u \rangle$ satisfying $\forall v \, M(x, u, v) = 1$ is a $\mathsf{coNP}$ language. To check if $x \in L$, we can equivalently check this $\mathsf{coNP}$ statement. Since this is a $\mathsf{coNP}$ statement, we can check it in polynomial time by converting it to a formula and querying the $\mathsf{SAT}$ oracle to see if it has a satisfying assignment. Thus $L \in \mathsf{NP}^{\mathsf{SAT}}$ and $\Sigma_2^p \subseteq \mathsf{NP}^{\mathsf{SAT}}$.
Now, we show the other direction: $\mathsf{NP}^{\mathsf{SAT}} \subseteq \Sigma_2^p$. Let $L \in \mathsf{NP}^{\mathsf{SAT}}$ with corresponding NTM $N$. Here, we assume that the NTM outputs at most two choices per execution of its transition function. At first glance, it does not seem like we can capture the power of the oracle in $\Sigma_2^p$. After all, for any $x$, $N$ makes at most polynomially many queries, say $\varphi_1, \ldots, \varphi_q$ (note: each $\varphi_j$ is a formula), each with corresponding answers $a_1, \ldots, a_q$. Moreover, each query can arbitrarily depend on the previous queries and answers, along with any other non-deterministic decisions made by $N$!
Let denote the non-deterministic choices made by . Intuitively, in order to construct a DTM to decide in , we will guess the non-deterministic choices and query answers that will cause to accept. That is, we can see that if and only if choices and query answers such that
- If makes choices and receives oracle query answers ,
- Then
- reaches an accepting state and
- The following hold.
- If then there exists an assignment such that .
- If then for all assignments we have . Here, is the th query made by .
Thus, we construct a DTM such that if and only if simulates and
- using non-deterministic choices ;
- :
- If then ; and
- If then .
Clearly, decides and thus .
Lecture 14
In-class notes: CS 505 Spring 2025 Lecture 14
Randomized Computations
So far, we have only examined deterministic and non-deterministic Turing machines. Crucially, neither of these models utilizes randomness. This is obvious for deterministic Turing machines, but it is very important to understand that non-determinism is not randomness; it is an idealized computation model that is not realistic.
Today, we’ll consider probabilistic Turing machines. These Turing machines will be allowed to sample uniformly and independently random bits and use these bits to help make decisions. Here, uniformly random means that $\Pr[b = 0] = \Pr[b = 1] = 1/2$, and independent means that this probability does not depend on any previous bits sampled or decisions made by the algorithm.
Definition. A probabilistic Turing machine (PTM) is a Turing machine with two transition functions $\delta_0, \delta_1$. For all $x \in \{0,1\}^*$, a PTM $M$ on input $x$ does the following.
- For each step of the computation, sample $b \leftarrow \{0, 1\}$ uniformly at random.
- Execute $\delta_b$.
We let $M(x)$ denote the random variable corresponding to PTM $M$’s output on input $x$ given the random choices it makes during its execution. We say that $M$ runs in time $T(n)$ if for all $x$, $M(x)$ halts in $T(|x|)$ steps for any set of random choices.
Deciding Languages: PTMs vs NTMs
As stated above, PTMs and NTMs are not the same. Recall that for NP, we say that an NTM $N$ accepts a string $x$ if there exists an execution of $N(x)$ such that $N(x) = 1$. Similarly, for coNP, the NTM accepts string $x$ if all executions of $N(x)$ satisfy $N(x) = 1$. Above, these definitions are quantified with respect to any possible non-deterministic choices the NTM makes during its execution.
Now, for a PTM, say $M$, we can similarly define acceptance with respect to the number of accepting paths. That is, we can say that $M$ accepts a string $x$ if some fraction of all executions satisfy $M(x) = 1$. For example, we can specify that if $M(x) = 1$ for at least half of all computation paths (that is, looking at every possible sequence of random choices $M$ could make), then we say $M$ accepts $x$. This gives us a natural definition for deciding a language with a PTM.
Definition. Let be a language and be a function. We say that a PTM decides in time if halts in time for any and where the probability is taken over the random choices made by on input , and if and if .
In the above definition, decidability with respect to a PTM has two-sided error, meaning that $M$ outputs correctly with probability at least $2/3$ both when $x \in L$ and when $x \notin L$. This leads us to our first probabilistic complexity class: BPP.
Definition. For a function $T : \mathbb{N} \to \mathbb{N}$, we say that a language $L \in \mathsf{BPTIME}(T(n))$ if there exists a PTM which decides $L$ in time $O(T(n))$. The complexity class BPP is defined as $\mathsf{BPP} = \bigcup_{c \ge 1} \mathsf{BPTIME}(n^c)$.
Note that BPP is still a worst-case class since we require deciding languages on a PTM in strict polynomial time.
BPP vs. Other Classes
Intuitively, like how NP was the non-deterministic analogue of P, BPP is the randomized analogue of P. In fact, we know that $\mathsf{P} \subseteq \mathsf{BPP} \subseteq \mathsf{EXP}$. To see the first inclusion, notice that every deterministic algorithm is a randomized algorithm that uses no randomness (e.g., you just set $\delta_0 = \delta_1 = \delta$, where $\delta$ is the transition function of the DTM). For the second inclusion, suppose that $M$ is a PTM running in time $T(n) = \mathrm{poly}(n)$. Then $M$ has $2^{T(n)}$ possible computation paths (each step picks $0$/$1$ with equal probability). So we can construct a machine running in time $2^{O(T(n))}$ which enumerates all possible computation paths of $M$ and outputs $1$ if and only if at least $2/3$ of the paths are accepting.
Now, it is unknown whether $\mathsf{BPP} \subseteq \mathsf{NP}$ or if $\mathsf{P} = \mathsf{BPP}$. For the second statement, most complexity theorists actually believe that $\mathsf{P} = \mathsf{BPP}$, but this is a topic we will not cover in this course.
This is not the last time we will see how BPP relates to other classes; we will return to this topic in later lectures.
Alternate Definition of BPP
Taking inspiration from the above outline of $\mathsf{BPP} \subseteq \mathsf{EXP}$, we can give an alternative definition of BPP similar to the certificate definition of NP.
Definition. A language $L$ is in the class BPP if there exists a polynomial-time deterministic Turing machine $M$ and a polynomial $p$ such that for all $x \in \{0,1\}^*$, $\Pr_{r \leftarrow \{0,1\}^{p(|x|)}}[M(x, r) = L(x)] \ge 2/3$.
3 Examples of BPP Algorithms
We’ll now give 3 examples of randomized algorithms.
Finding the Median
Given a set of $n$ numbers $A = \{a_1, \dots, a_n\}$, the median of $A$ is the number $a$ such that $a_i \le a$ for at least $\lceil n/2 \rceil$ elements and $a \le a_i$ for at least $\lceil n/2 \rceil$ elements. Stated as a decision problem, you’d be given $(A, a)$ and would need to decide if $a$ is the median of $A$.
There is an easy deterministic algorithm that requires $O(n \log n)$ time to check.
- Sort $A$ and obtain $b_1, \dots, b_n$, where $b_i \le b_{i+1}$ for all $i$.
- If $n$ is even, set $a = b_{n/2}$ (more generally, any number in the range $[b_{n/2}, b_{n/2+1}]$). If $n$ is odd, set $a = b_{\lceil n/2 \rceil}$.
This takes $O(n \log n)$ time since we sort $A$.
Now, there is an $O(n)$ time deterministic algorithm for finding the median and, more generally, the $k$th smallest element. However, the algorithm is highly non-trivial.
To contrast, we will give a simple randomized algorithm running in expected $O(n)$ time to find the $k$th smallest element, where an element $a \in A$ is the $k$th smallest if $k - 1$ elements satisfy $a_i < a$ and $n - k$ elements satisfy $a_i > a$. In particular, setting $k = \lceil n/2 \rceil$ gives us the median problem. Let $A = \{a_1, \dots, a_n\}$ (assume the elements are distinct).
- Find$k$thElement$(A, k)$:
- Pick $i \leftarrow [n]$ uniformly at random. Set $p = a_i$.
- Scan $A$ and let $\ell$ denote the number of elements $a_j$ such that $a_j < p$.
- If $\ell = k - 1$, output $p$.
- If $\ell \ge k$:
- Create the set $A_{<} = \{a_j \in A : a_j < p\}$.
- Run Find$k$thElement$(A_{<}, k)$.
- If $\ell < k - 1$:
- Let $A_{>} = \{a_j \in A : a_j > p\}$.
- Run Find$k$thElement$(A_{>}, k - \ell - 1)$.
We now give the intuition for why this algorithm runs in expected $O(n)$ time for initial list size $n$ (see the Python sketch after this list).
- The deterministic parts of the algorithm run in linear time.
- The expected sizes of $A_{<}$ and $A_{>}$ can be shown to be at most $3n/4$. That is, with good/high probability over the random choice of $p$, $|A_{<}| \le 3n/4$ and $|A_{>}| \le 3n/4$ (where the “$n$” is updated every recursive call to mean the current list size).
- Together, this implies that if $T(n)$ is the expected runtime of the algorithm, then $T(n) \le T(3n/4) + O(n)$, which can be shown to imply that $T(n) = O(n)$.
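The following is a minimal Python sketch of the recursion just described, under the assumption (also made above) that the elements are distinct. The function name and list representation are illustrative, not from the lecture.

```python
import random

def find_kth(A, k):
    """Return the k-th smallest element of A (1-indexed), assuming the
    elements of A are distinct."""
    p = random.choice(A)                      # random pivot
    smaller = [a for a in A if a < p]         # the set A_<
    ell = len(smaller)                        # number of elements < p
    if ell == k - 1:
        return p
    if ell >= k:                              # answer lies in A_<
        return find_kth(smaller, k)
    larger = [a for a in A if a > p]          # the set A_>
    return find_kth(larger, k - ell - 1)      # discard p and all of A_<

# Example: the median (k = 51) of 101 distinct numbers.
A = random.sample(range(10**6), 101)
assert find_kth(A, 51) == sorted(A)[50]
```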
Polynomial Identity Testing
Polynomial identity testing is very common in many areas of theoretical computer science and cryptography. Consider an $n$-variate polynomial $p(x_1, \dots, x_n)$ with integer coefficients. We define the degree of $p$ to be $d = d_1 + \cdots + d_n$, where $d_i$ is the largest degree of $x_i$ in $p$. For example, if $p(x_1, x_2) = x_1^2 x_2 + x_2^3$, then $d = 2 + 3 = 5$.
The most natural identity to test is whether $p$ is the identically zero polynomial; that is, whether $p(x_1, \dots, x_n) = 0$ for all $x_1, \dots, x_n$. Note that this isn’t even checkable in polynomial time by expanding $p$: one can give a $p$ with a small description that, once expanded, contains $2^n$ terms! An example of one such polynomial is $p(x_1, \dots, x_n) = \prod_{i=1}^{n} (x_i + c_i)$ for some non-zero constants $c_i$.
However, there is a probabilistic algorithm which can efficiently check if $p$ is identically zero, assuming that evaluating $p$ at any single point is efficient (i.e., polynomial-time).
$\mathsf{ZeroTest}(p)$:
- Sample $x_1, \dots, x_n \leftarrow [10d]$ uniformly at random.
- Sample a uniformly random prime $k \in [2^{2m}]$, where $m$ is an upper bound on the number of bits of $|p(x_1, \dots, x_n)|$ (evaluating modulo $k$ keeps all intermediate numbers small).
- Check if $p(x_1, \dots, x_n) \equiv 0 \pmod{k}$. Output $1$ if this check passes, and $0$ otherwise.
Observations. Let $y = p(x_1, \dots, x_n)$.
- If $p$ is identically zero, then $y = 0$ and $y \equiv 0 \pmod{k}$ for any $k$, so the algorithm outputs $1$ with probability $1$.
- Suppose that $y \ne 0$. We want to analyze the probability that $y \equiv 0 \pmod{k}$. This is equal to the probability that $k$ divides $y$. Suppose that $y$ has prime factors $p_1, \dots, p_t$. We can upper bound the probability that the prime $k$ divides $y$ by the probability that $k$ is any one of these prime numbers. By the Prime Number Theorem, the number of primes in $[2^{2m}]$ is at least $\frac{2^{2m}}{4m}$. Now, one can show that $t \le m$ since $|y| \le 2^m$. This implies that the number of primes in $[2^{2m}]$ not equal to any $p_i$ is at least $\frac{2^{2m}}{4m} - m$. So, with probability at least $1 - \frac{4m^2}{2^{2m}}$, the randomly chosen $k$ will not be equal to any of $p_1, \dots, p_t$. This gives us $\Pr_k[y \equiv 0 \pmod{k}] \le \frac{4m^2}{2^{2m}}$.
- Now suppose that $y = 0$ but $p$ is not the identically zero polynomial. This means that $(x_1, \dots, x_n)$ is a root of the polynomial $p$. By the Schwartz–Zippel lemma, this implies that $\Pr_{x_1, \dots, x_n \leftarrow [10d]}[p(x_1, \dots, x_n) = 0] \le \frac{d}{10d} = \frac{1}{10}$.
All together, these observations imply that if $p$ is identically zero, $\mathsf{ZeroTest}$ outputs $1$ with probability $1$; otherwise, $\Pr[\mathsf{ZeroTest}(p) = 1] \le \frac{1}{10} + \frac{4m^2}{2^{2m}} < \frac{1}{3}$.
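As a small illustration, here is a hedged Python sketch of the Schwartz–Zippel portion of the test. To stay self-contained it works modulo one fixed large prime rather than sampling a random prime $k$ as the algorithm above does, and it treats the polynomial as a black-box evaluation function; all names are illustrative.

```python
import random

P = 2**61 - 1   # one fixed large prime, standing in for the random prime k

def is_identically_zero(poly, n, d, trials=20):
    """Randomized zero-test for a black-box n-variate polynomial of
    total degree <= d. One-sided error: if poly is identically zero we
    always return True; otherwise each trial wrongly passes with
    probability <= d/P by the Schwartz-Zippel lemma."""
    for _ in range(trials):
        point = [random.randrange(P) for _ in range(n)]
        if poly(point) % P != 0:
            return False        # witness that poly is not identically zero
    return True

# Example: (x0 + x1)^2 - x0^2 - 2*x0*x1 - x1^2 == 0 as a polynomial.
f = lambda x: (x[0] + x[1]) ** 2 - x[0] ** 2 - 2 * x[0] * x[1] - x[1] ** 2
print(is_identically_zero(f, n=2, d=2))   # True
```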
Verifying Matrix Multiplications
Let $p$ be a prime number and let $\mathbb{Z}_p$ be the set of integers modulo $p$. Fix three matrices $A, B, C \in \mathbb{Z}_p^{n \times n}$. We want to decide if $AB = C$.
The fastest known matrix multiplication algorithm runs in time roughly $O(n^{2.37})$ (and, I believe, is not possible to run on a real computer because the hidden constant is enormous). There is also the trivial algorithm which takes $O(n^3)$ time; here, we’re actually counting the number of operations, so this algorithm takes roughly $n^3$ multiplications and additions over $\mathbb{Z}_p$.
We can use randomness to get an $O(n^2)$-time algorithm to verify if $AB = C$. The algorithm operates as follows.
- Sample $x \leftarrow \mathbb{Z}_p^n$ uniformly at random.
- Create the vector $y = Bx$.
- Check if $Ay = Cx$. Output $1$ if and only if this check passes.
Notice that computing $y = Bx$ takes $O(n^2)$ time. Similarly, computing $Ay$ takes $O(n^2)$ time, and $Cx$ also takes $O(n^2)$ time, so the algorithm runs in $O(n^2)$ time.
Now, if $AB = C$, then $ABx = Cx$ for any choice of $x$, and thus the algorithm outputs $1$ with probability $1$ when $AB = C$. What if $AB \ne C$? This basically reduces to $n$ polynomial equality checks (or, equivalently, zero-checks). Let $D = AB - C$ and let $d_i$ be the $i$th row of $D$. Then we have $Dx = ABx - Cx$. Recall that $x = (x_1, \dots, x_n)$, so the $i$th entry of $Dx$ is the inner product $\langle d_i, x \rangle = d_{i,1} x_1 + \cdots + d_{i,n} x_n$, which looks exactly like evaluating a degree-$1$ polynomial at a random point $(x_1, \dots, x_n)$.
Now, suppose that in row $i$ we have $d_i \ne 0$. Then, our random check is actually testing the following: $\Pr_x[\langle d_i, x \rangle = 0] \le \frac{1}{p}$, since a non-zero degree-$1$ polynomial over $\mathbb{Z}_p$ vanishes at a random point with probability at most $1/p$. In particular, the above is true even if $AB$ and $C$ differ on a single entry! So this shows that $\Pr_x[ABx = Cx] \le \frac{1}{p}$ whenever $AB \ne C$.
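Here is a short Python sketch of this verification procedure (commonly known as Freivalds’ algorithm), using repeated independent trials to drive the one-sided error down; the use of numpy and the trial count are implementation choices, not part of the lecture.

```python
import numpy as np

def freivalds(A, B, C, p, trials=10):
    """Verify A @ B == C over Z_p in O(n^2) time per trial. Always
    returns True when AB = C; if AB != C, each trial misses the
    difference with probability at most 1/p."""
    n = A.shape[0]
    for _ in range(trials):
        x = np.random.randint(0, p, size=n)       # uniform x in Z_p^n
        y = B.dot(x) % p                          # y = Bx in O(n^2) time
        if not np.array_equal(A.dot(y) % p, C.dot(x) % p):
            return False                          # ABx != Cx certifies AB != C
    return True

p, n = 97, 50
A = np.random.randint(0, p, (n, n))
B = np.random.randint(0, p, (n, n))
C = A.dot(B) % p
print(freivalds(A, B, C, p))          # True
C[0, 0] = (C[0, 0] + 1) % p           # corrupt a single entry
print(freivalds(A, B, C, p))          # False with high probability
```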
One-Sided Error
Interestingly, the above algorithms all have stronger guarantees than BPP requires. Finding the median always outputs the correct answer, but runs in expected linear time. Both polynomial identity testing and verifying matrix multiplication will answer “YES” with probability $1$ when the polynomial given as input is identically zero and when $AB = C$ for the given matrices, but both of these algorithms will output “YES” with small probability (not equal to $0$) when the input does not have the correct form.
Building on the latter two algorithms, this lets us define two “stronger” classes than BPP which have one-sided error.
Definition. The class $\mathsf{RTIME}(T(n))$ is the set of all languages $L$ that are decidable by a PTM $M$ running in time $O(T(n))$ such that for all $x \in \{0,1\}^*$: if $x \in L$, then $\Pr[M(x) = 1] \ge 2/3$; and if $x \notin L$, then $\Pr[M(x) = 0] = 1$. We define the class RP as $\mathsf{RP} = \bigcup_{c \ge 1} \mathsf{RTIME}(n^c)$.
Naturally, we have the co-class of RP, called coRP, defined as $\mathsf{coRP} = \{L : \overline{L} \in \mathsf{RP}\}$. Equivalently, it is also a one-sided error class with the opposite guarantees as RP.
Definition. The class $\mathsf{coRTIME}(T(n))$ is the set of all languages $L$ that are decidable by a PTM $M$ running in time $O(T(n))$ such that for all $x \in \{0,1\}^*$: if $x \in L$, then $\Pr[M(x) = 1] = 1$; and if $x \notin L$, then $\Pr[M(x) = 0] \ge 2/3$. We define the class coRP as $\mathsf{coRP} = \bigcup_{c \ge 1} \mathsf{coRTIME}(n^c)$.
Lecture 15
In-class notes: CS 505 Spring 2025 Lecture 15
Zero-Sided Error
Last time, we discussed BPP, RP, and coRP, which are 3 (worst-case) probabilistic complexity classes.
- BPP is the set of all languages $L$ decidable in strict polynomial-time by a PTM $M$ such that $\Pr[M(x) = L(x)] \ge 2/3$. BPP is a two-sided error class.
- RP is the set of all languages $L$ decidable in (again) strict polynomial-time by a PTM $M$ such that $x \in L$ implies $\Pr[M(x) = 1] \ge 2/3$ and $x \notin L$ implies $\Pr[M(x) = 0] = 1$. RP is a one-sided error class, where it never has false positives (i.e., never outputs $1$ when the answer is $0$).
- coRP is the set of all languages $L$ decidable in (again) strict polynomial-time by a PTM $M$ such that $x \in L$ implies $\Pr[M(x) = 1] = 1$ and $x \notin L$ implies $\Pr[M(x) = 0] \ge 2/3$. coRP is a one-sided error class, where it never has false negatives (i.e., never outputs $0$ when the answer is $1$).
Now we turn to zero-sided error. Intuitively, zero-sided error means that a PTM always outputs correctly; that is, $\Pr[M(x) = L(x)] = 1$. You would be correct in thinking that if this happens in strict polynomial time, then this class would just be P. So, in order to not end up with the same class, the probabilistic class of languages decidable with zero-sided error is relaxed to PTMs that run in expected polynomial time. This is the class ZPP.
Definition. The class $\mathsf{ZTIME}(T(n))$ is the set of all languages $L$ decidable on a PTM $M$ running in expected time $O(T(n))$ such that $\Pr[M(x) = L(x)] = 1$ for all $x \in \{0,1\}^*$. In particular, if $T_x$ is the random variable for the runtime of $M$ on input $x$, then $L \in \mathsf{ZTIME}(T(n))$ if and only if $\mathbb{E}[T_x] = O(T(|x|))$ and $\Pr[M(x) = L(x)] = 1$ for all $x$. The class ZPP is the set of all languages decidable in expected polynomial-time with zero-sided error; i.e., $\mathsf{ZPP} = \bigcup_{c \ge 1} \mathsf{ZTIME}(n^c)$.
ZPP vs RP and coRP
Since we have deviated from strict polynomial time to expected polynomial time, one may wonder how ZPP relates to RP and coRP. The following theorem exactly captures this relationship.
Theorem. $\mathsf{ZPP} = \mathsf{RP} \cap \mathsf{coRP}$.
Proof. We show both directions. First, we show that $\mathsf{RP} \cap \mathsf{coRP} \subseteq \mathsf{ZPP}$. Let $L \in \mathsf{RP} \cap \mathsf{coRP}$. Let $M_1$ be the RP machine and $M_2$ be the coRP machine. In particular, the following hold: if $x \in L$, then $\Pr[M_1(x) = 1] \ge 2/3$ and $\Pr[M_2(x) = 1] = 1$; and if $x \notin L$, then $\Pr[M_1(x) = 0] = 1$ and $\Pr[M_2(x) = 0] \ge 2/3$.
We construct a new PTM $N$ which will decide $L$ with zero-sided error in expected polynomial time. $N$ on input $x$ does the following.
- $N(x)$:
- While true:
- Run $M_1(x)$ and $M_2(x)$ to completion.
- If they both output $1$ (accept), then output $1$ (accept).
- If they both output $0$ (reject), then output $0$ (reject).
First, we show that if $N$ halts then $N(x) = L(x)$, and that $N$ will (eventually) always halt. This can be seen as follows.
- For $x \in L$:
- $\Pr[M_1(x) = 0 \wedge M_2(x) = 0] = 0$ since $M_2$ never outputs $0$ in this case. In particular, we will never have $N(x) = 0$ in this case. $N$ will run until $M_1(x) = M_2(x) = 1$, which happens with probability at least $2/3$ in each iteration. In this case $N(x) = 1 = L(x)$. Since the probability is not zero, there is a series of random choices that $N$ can make that will make it output $1$, so $N$ will halt and output $1$.
- If $x \notin L$:
- $\Pr[M_1(x) = 1] = 0$. So we have the reverse of the above: $M_1(x) = M_2(x) = 1$ will never happen. So $N$ can only output $0$ in this case, and it will do so eventually since $\Pr[M_1(x) = M_2(x) = 0] \ge 2/3$ in each iteration.
Now, we argue that $N$ runs in expected polynomial time. Let $q$ be a polynomial such that $M_1$ and $M_2$ both run in time at most $q(n)$ for inputs of length $n$. We analyze the expected running time of $N$ on input $x$ with $|x| = n$. Let $T_x$ denote the random variable for the runtime of $N$ on input $x$. First, notice that every iteration of the loop runs in time at most $2q(n)$, the time to run both $M_1$ and $M_2$ to completion. So if we are in the $i$th iteration of the loop, at the end of the loop, $N$ will have run for at most $2i \cdot q(n)$ steps.¹
So to analyze the expected runtime of $N$, we have $\mathbb{E}[T_x] \le \sum_{i \ge 1} 2i \cdot q(n) \cdot \Pr[N \text{ halts in the } i\text{th iteration}]$. Now, we analyze $\Pr[N \text{ halts in the } i\text{th iteration}]$, starting with $i = 1$.
- $i = 1$. In this case, $N$ halts after one execution of $M_1$ and $M_2$. This means after one execution, they are in agreement. For both $x \in L$ and $x \notin L$, this happens with probability at least $2/3$. Without loss of generality, through the remainder of the proof we assume that the probability of agreement is exactly $2/3$.
- $i = 2$. In this case, the first execution of the loop resulted in $M_1(x) \ne M_2(x)$, and the second execution has $M_1(x) = M_2(x)$. Implicitly, we have assumed that $N$ uses fresh randomness in every subsequent execution of the loop, so each run of $M_1$ and $M_2$ is independent of previous runs. Now, the probability that $M_1(x) \ne M_2(x)$ in any one iteration is (at most) $1/3$. So in this case, the probability that $N$ halts after $2$ loops is equal to $\frac{1}{3} \cdot \frac{2}{3}$.
Extending the above analysis to any $i$ gives us $\Pr[N \text{ halts in the } i\text{th iteration}] = \left(\frac{1}{3}\right)^{i-1} \cdot \frac{2}{3}$. This then tells us
$$\mathbb{E}[T_x] \le \sum_{i \ge 1} 2i \, q(n) \left(\frac{1}{3}\right)^{i-1} \frac{2}{3} = O(q(n)),$$
where the last equality can be shown using infinite sum tricks. So, we have shown that $N$ runs in expected polynomial time. Thus, $L \in \mathsf{ZPP}$.
Now, for the other (easier) direction, we show that $\mathsf{ZPP} \subseteq \mathsf{RP} \cap \mathsf{coRP}$. For this, we will need a result known as Markov’s Inequality.
Markov’s Inequality states that if you have a non-negative random variable $X$, then for any $k > 0$, it holds that $\Pr[X \ge k \cdot \mathbb{E}[X]] \le \frac{1}{k}$. We’ll use this inequality to show $\mathsf{ZPP} \subseteq \mathsf{RP}$.
Let $M$ be the ZPP PTM that decides $L$. In particular, $M(x) = L(x)$ with probability $1$, and $M$ runs in expected time $q(|x|)$ for some polynomial $q$ and any $x$.
We construct a new PTM $N$ which does the following.
- $N(x)$:
- Compute $t = 3 \cdot q(|x|)$.
- Run $M(x)$ for at most $t$ steps.
- If $M(x)$ halts within $t$ steps, output whatever $M$ outputs.
- Otherwise, output $0$.
We show that $N$ is the RP machine deciding $L$. First, we show that if $x \notin L$, then $\Pr[N(x) = 0] = 1$. Notice that if $M(x)$ halts within $t$ steps, then $M(x) = 0$ by definition of ZPP. Then, if $M(x)$ does not halt within $t$ steps, the machine outputs $0$. So in either case $N(x) = 0$, and thus $\Pr[N(x) = 0] = 1$.
Now assume that $x \in L$. We need to show that $\Pr[N(x) = 1] \ge 2/3$. By definition of $N$, $N(x) = 1$ if and only if $M(x)$ halts within $t$ steps. In particular, we know that $M(x) = 1$ whenever it halts since $x \in L$, so we must show that $M(x)$ halts within $t$ steps with probability at least $2/3$.
Let $T_x$ denote the random variable for the runtime of $M(x)$. By definition, $\mathbb{E}[T_x] \le q(|x|)$. Applying Markov’s inequality with $k = 3$, we have $\Pr[T_x \ge 3 q(|x|)] \le \frac{1}{3}$.
This tells us $\Pr[M(x) \text{ halts within } t = 3q(|x|) \text{ steps}] \ge \frac{2}{3}$, showing that $L \in \mathsf{RP}$.
Now, to show that $L \in \mathsf{coRP}$, we construct a PTM $N'$ identical to the PTM $N$, except the machine outputs $1$ if $M(x)$ does not halt within $t$ steps. The analysis is identical to the above analysis. Therefore, $\mathsf{ZPP} \subseteq \mathsf{RP} \cap \mathsf{coRP}$, and thus $\mathsf{ZPP} = \mathsf{RP} \cap \mathsf{coRP}$.
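The two constructions in this proof are easy to phrase as code. Below is a hedged Python sketch of both directions; the machine interfaces (`M1`, `M2`, `M_timed`, `budget`) are hypothetical stand-ins for the PTMs in the proof.

```python
def zpp_from_rp_corp(M1, M2, x):
    """RP ∩ coRP -> ZPP: rerun the RP machine M1 and coRP machine M2
    with fresh randomness until they agree; by the one-sided guarantees
    the agreed answer is always correct, and each iteration agrees with
    probability >= 2/3, so the expected number of iterations is O(1).
    M1 and M2 are assumed to draw fresh internal randomness per call."""
    while True:
        a1, a2 = M1(x), M2(x)
        if a1 == a2:
            return a1

def rp_from_zpp(M_timed, q, x):
    """ZPP -> RP: run the zero-error machine for at most 3*q(|x|) steps
    and output 0 on timeout. By Markov's inequality the machine exceeds
    triple its expected runtime with probability at most 1/3.
    M_timed(x, budget) is assumed to return None on timeout."""
    result = M_timed(x, budget=3 * q(len(x)))
    return 0 if result is None else result
```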
-
Technically speaking, to run $M_1$ and $M_2$, $N$ needs additional time for universal simulation, but we can simply upper bound this by another polynomial and the analysis remains the same. ↩
Lecture 16
In-class notes: CS 505 Spring 2025 Lecture 16
Error Reduction for BPP
Recall that a language $L$ is in BPP if there exists a strict polynomial time probabilistic Turing machine $M$ such that for any $x \in \{0,1\}^*$, $\Pr[M(x) = L(x)] \ge 2/3$. Equivalently stated, for a deterministic Turing machine $M$ running in polynomial time, it holds that $\Pr_{r \leftarrow \{0,1\}^m}[M(x, r) = L(x)] \ge 2/3$ for all $x \in \{0,1\}^*$, where $m = \mathrm{poly}(|x|)$.
Here, the error probability $1/3$ is convenient, but arbitrary. We’ll see that the class BPP is equivalently defined for any error probability between $2^{-n^c}$ and $1/2 - n^{-c}$.
Definition. For any $\rho : \mathbb{N} \to [0, 1/2)$, the class $\mathsf{BPP}_\rho$ is the set of all languages $L$ such that there exists a probabilistic Turing machine $M$ running in strict polynomial time such that for all $x \in \{0,1\}^*$, $\Pr[M(x) = L(x)] \ge 1 - \rho(|x|)$.
Under the above definition, we can set $\rho(n) = 1/2 - n^{-c}$ for a constant $c > 0$, or even $\rho(n) = 2^{-n^c}$. We’ll show that the BPP class with reduced error is equivalent to the standard BPP definition.
Theorem. For all constants $c > 0$, it holds that $\mathsf{BPP}_{1/2 - n^{-c}} = \mathsf{BPP} = \mathsf{BPP}_{2^{-n^c}}$.
Proof. Clearly, $\mathsf{BPP}_{2^{-n^c}} \subseteq \mathsf{BPP} \subseteq \mathsf{BPP}_{1/2 - n^{-c}}$ since $2^{-n^c} \le 1/3 \le 1/2 - n^{-c}$ for large enough $n$. Next, we show the other direction, namely $\mathsf{BPP}_{1/2 - n^{-c}} \subseteq \mathsf{BPP}_{2^{-n^c}}$. We show this via the following claim.
Claim. Suppose $L$ is decidable by a PTM $M$ (in polynomial time) such that for all $x$, $\Pr[M(x) = L(x)] \ge \frac{1}{2} + n^{-c}$ for some constant $c > 0$. Then, for any constant $d > 0$, there exists a PTM $M'$ running in polynomial time such that for all $x$, $\Pr[M'(x) = L(x)] \ge 1 - 2^{-n^d}$.
We won’t show the full proof, but just the main ideas (see the sketch after the proof). The idea is that the machine $M'$ will simply run $M$ some polynomial number of times, then output the majority of the outputs. Let $t = O(n^{2c + d})$. Then, $M'$ will independently run $M(x)$ $t$ times. Let $b_1, \dots, b_t$ be the output bits of the $t$ independent runs of $M$. Then, $M'$ outputs $b$, where $b = 1$ if at least $t/2$ of the bits are $1$; otherwise, $b = 0$. It can be shown using the Chernoff bound that $\Pr[M'(x) = L(x)] \ge 1 - 2^{-n^d}$ under the parameters we’ve set. Notice also that $M'$ runs in polynomial time since it runs $M$ a polynomial number of times.
With the claim, for large enough $n$, the error probability $2^{-n^d}$ is at most $2^{-n^c}$ whenever $d \ge c$, which shows that $\mathsf{BPP}_{1/2 - n^{-c}} \subseteq \mathsf{BPP}_{2^{-n^c}}$.
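A minimal Python sketch of the majority-vote amplification in the claim, treating the PTM $M$ as an arbitrary 0/1-valued randomized function (an illustrative assumption):

```python
from collections import Counter

def amplified(M, x, t):
    """Run M on x independently t times and output the majority bit.
    If each run is correct with probability 1/2 + eps, a Chernoff
    (Hoeffding) bound shows the majority is wrong with probability at
    most exp(-2 * eps**2 * t), so a polynomial t suffices."""
    votes = Counter(M(x) for _ in range(t))
    return 1 if votes[1] > t // 2 else 0
```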
BPP vs. Other Classes
Now, we examine the relationship between BPP and other complexity classes.
BPP vs P
First, we naturally ask: what is the relationship between P and BPP? Clearly, $\mathsf{P} \subseteq \mathsf{BPP}$ since any deterministic Turing machine is also probabilistic (you can either set $\delta_0 = \delta_1$, or have $M$ ignore the random input $r$). Many complexity theorists actually believe that $\mathsf{P} = \mathsf{BPP}$, which concerns the rich fields of derandomization and hardness amplification. But we will not examine these fields in this course.
BPP vs PH
At first glance, it is not clear what the relationship between the polynomial hierarchy and BPP is. It turns out that BPP sits low in the polynomial hierarchy.
Theorem. $\mathsf{BPP} \subseteq \Sigma_2^p \cap \Pi_2^p$.
Proof. Note that since $\mathsf{BPP}$ is closed under complement, it suffices to show either $\mathsf{BPP} \subseteq \Sigma_2^p$ or $\mathsf{BPP} \subseteq \Pi_2^p$. We show that $\mathsf{BPP} \subseteq \Sigma_2^p$.
Let $L \in \mathsf{BPP}_{2^{-n}}$ (which is equivalent to BPP by the previous theorem and claim). Using the DTM definition of BPP, there exists a DTM $M$ running in polynomial time such that for all $x$ with $|x| = n$, it holds that $\Pr_{r \leftarrow \{0,1\}^m}[M(x, r) = L(x)] \ge 1 - 2^{-n}$, where $m = \mathrm{poly}(n)$.
Let $x \in \{0,1\}^n$. We define a set $S_x \subseteq \{0,1\}^m$ as the set of all good strings for $x$. That is, $S_x = \{r \in \{0,1\}^m : M(x, r) = 1\}$; i.e., the set of all $r$ such that $M(x, r) = 1$. Otherwise, if $M(x, r) = 0$, we say that $r$ is bad for $x$.
- Notice that if $x \in L$, it holds that $|S_x| \ge (1 - 2^{-n}) 2^m$. This is because $\Pr_r[M(x, r) = 1] \ge 1 - 2^{-n}$ when $x \in L$, and thus there must be at least $(1 - 2^{-n}) 2^m$ strings $r$ such that $M(x, r) = 1$.
- If $x \notin L$, it holds that $|S_x| \le 2^{-n} \cdot 2^m$. This is because if $x \notin L$, then $\Pr_r[M(x, r) = 1] \le 2^{-n}$, so there is at most a $2^{-n}$ fraction of strings $r$ such that $M(x, r) = 1$.
Now, the goal is to encode this size gap as a $\Sigma_2$ statement. We’ll need the following tool. For any $S \subseteq \{0,1\}^m$ and any vector $u \in \{0,1\}^m$, define $S \oplus u = \{r \oplus u : r \in S\}$. Now set $k = \lceil m/n \rceil + 1$.
Claim 1. If $x \notin L$, then for all $u_1, \dots, u_k \in \{0,1\}^m$, it holds that $\bigcup_{i=1}^{k} (S_x \oplus u_i) \ne \{0,1\}^m$.
Proof of Claim 1. Notice that for any $u_i$, we have $|S_x \oplus u_i| = |S_x| \le 2^{-n} 2^m$. Then by a simple union bound, we have $\left|\bigcup_{i=1}^{k} (S_x \oplus u_i)\right| \le k \cdot 2^{-n} \cdot 2^m < 2^m$ for large enough $n$.
Claim 2. If $x \in L$, then there exist $u_1, \dots, u_k \in \{0,1\}^m$ such that $\bigcup_{i=1}^{k} (S_x \oplus u_i) = \{0,1\}^m$.
Proof of Claim 2. We use the probabilistic method. If we can show that $\Pr_{u_1, \dots, u_k}\left[\bigcup_i (S_x \oplus u_i) = \{0,1\}^m\right] > 0$ for uniformly and independently sampled $u_1, \dots, u_k$, then there must exist vectors $u_1, \dots, u_k$ such that $\bigcup_i (S_x \oplus u_i) = \{0,1\}^m$. For $r \in \{0,1\}^m$, let $B_r$ denote the “bad event” that $r \notin \bigcup_i (S_x \oplus u_i)$. We show that $\Pr\left[\bigvee_r B_r\right] < 1$.
Consider $B_r$ for any fixed $r$. We show that $\Pr[B_r] \le 2^{-kn}$. Let $B_r^i$ denote the event that $r \notin S_x \oplus u_i$. Equivalently stated, $B_r = \bigwedge_{i=1}^{k} B_r^i$. Notice that $r \in S_x \oplus u_i$ if and only if $r \oplus u_i \in S_x$.
Now, since $u_i$ is uniformly sampled, we know that $r \oplus u_i$ is uniformly distributed in $\{0,1\}^m$. So we know that $\Pr[r \oplus u_i \in S_x] \ge 1 - 2^{-n}$, which implies that $\Pr[B_r^i] \le 2^{-n}$. Finally, all $B_r^i$ are independent, so we have $\Pr[B_r] = \prod_{i=1}^{k} \Pr[B_r^i] \le 2^{-kn} < 2^{-m}$, where the last inequality follows since $kn > m$. This implies, again by the union bound, that $\Pr\left[\bigvee_r B_r\right] \le 2^m \cdot 2^{-kn} < 1$, which implies that $\Pr\left[\bigcup_i (S_x \oplus u_i) = \{0,1\}^m\right] > 0$, so there exist vectors $u_1, \dots, u_k$ such that $\bigcup_i (S_x \oplus u_i) = \{0,1\}^m$.
Now, given the two claims above, we can decide $L$ with a $\Sigma_2$ statement as follows. For any $u_1, \dots, u_k$, define the machine $N$ which operates as follows: $N(x, (u_1, \dots, u_k), r)$ outputs $1$ if and only if $M(x, r \oplus u_i) = 1$ for some $i \in [k]$, where $m = \mathrm{poly}(n)$ and $k = \lceil m/n \rceil + 1$. Therefore, $x \in L$ if and only if $\exists (u_1, \dots, u_k) \, \forall r \; N(x, (u_1, \dots, u_k), r) = 1$. Thus, $L \in \Sigma_2^p$.
Randomized Reductions
We can define a slightly weaker notion of reduction than the polynomial time reductions we’ve seen before. We’ll see randomized reductions now.
Definition. For languages $A, B \subseteq \{0,1\}^*$, we say that $A$ is randomized polynomial-time reducible to $B$, denoted by $A \le_r B$, if there exists a polynomial time probabilistic Turing machine $M$ such that for all $x \in \{0,1\}^*$, we have $\Pr[B(M(x)) = A(x)] \ge 2/3$.
Note that randomized reductions are not transitive! That is, if $A \le_r B$ and $B \le_r C$, it is not necessarily the case that $A \le_r C$. However, randomized reductions are still useful. One can show that if $A \le_r B$ and $B \in \mathsf{BPP}$, then $A \in \mathsf{BPP}$.
NP under Randomized Reductions?
We can define an NP-like class for NP under randomized reductions. This is the class $\mathsf{BP} \cdot \mathsf{NP}$. Note that we can equivalently define it as $\mathsf{BP} \cdot \mathsf{NP} = \{L : L \le_r \mathsf{3SAT}\}$.
Generally speaking, complexity theorists believe that $\mathsf{NP} \not\subseteq \mathsf{BPP}$. They also do not believe that $\overline{\mathsf{3SAT}} \in \mathsf{BP} \cdot \mathsf{NP}$ because of the following lemma.
Lemma. If $\overline{\mathsf{3SAT}} \in \mathsf{BP} \cdot \mathsf{NP}$, then the polynomial hierarchy collapses (to the third level).
Randomized Space-bounded Computations
We can also examine space-bounded computations through the lens of probabilistic Turing machines. The most interesting space-bounded randomized computations are those which only use logarithmic space.
Definition. The class $\mathsf{BPL}$ is the set of all languages $L$ such that there exists a strict polynomial-time PTM $M$ using $O(\log n)$ additional space for inputs of length $n$ such that $\Pr[M(x) = L(x)] \ge 2/3$.
BPL is the log-space equivalent of BPP, and we can similarly define log-space equivalents of RP, coRP, and ZPP, denoted as RL, coRL, and ZPL.
Theorem.
- $\mathsf{RL} \subseteq \mathsf{NL} \subseteq \mathsf{P}$.
- $\mathsf{BPL} \subseteq \mathsf{P}$.
Boolean Circuits
We will now turn our attention to Boolean circuits, or just circuits. Circuits are inherently a non-uniform model of computation. That is, circuits have a fixed input length, rather than being able to operate over infinitely many input lengths. For example, Turing machines are a uniform computation model, where a single Turing machine takes infinitely many inputs $x \in \{0,1\}^*$.
Circuits, on the other hand, can only operate over a fixed input length. For example, a circuit $C$ computes some function over inputs $x \in \{0,1\}^n$ for a fixed $n$, and every input length $n$ has some (possibly) different circuit.
Definition. A Boolean circuit of size $s$ with $n$-bit inputs is a directed acyclic graph on $s$ vertices with the following syntax.
- The $n$ input vertices have in-degree $0$ and unlimited out-degree.
- The remaining non-input nodes, which we call gates, are all labeled AND, OR, or NOT (corresponding to the Boolean functions AND, OR, NOT) and operate as follows.
- AND and OR gates both have in-degree 2 (or fan-in 2) and out-degree 1 (fan-out 1).
- NOT gates have in-degree $1$ and out-degree 1.
- There is a single output gate with out-degree 0 (note this gate can be an input node/gate, or any internal node).
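To make the DAG definition concrete, here is a small Python evaluator for circuits given in topological order. The tuple-based gate encoding is an illustrative choice, not a standard format.

```python
def eval_circuit(gates, x):
    """Evaluate a Boolean circuit whose gates are listed in topological
    order. Gate formats: ('IN', i) reads input bit x[i]; ('NOT', g),
    ('AND', g, h), and ('OR', g, h) read the values of earlier gates by
    index. The last gate is the designated output gate."""
    val = []
    for g in gates:
        if g[0] == 'IN':
            val.append(x[g[1]])
        elif g[0] == 'NOT':
            val.append(1 - val[g[1]])
        elif g[0] == 'AND':
            val.append(val[g[1]] & val[g[2]])
        else:  # 'OR'
            val.append(val[g[1]] | val[g[2]])
    return val[-1]

# Example: XOR of two bits as (x0 OR x1) AND NOT (x0 AND x1).
xor = [('IN', 0), ('IN', 1), ('OR', 0, 1), ('AND', 0, 1),
       ('NOT', 3), ('AND', 2, 4)]
assert [eval_circuit(xor, [a, b]) for a, b in
        [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```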
Circuit Families
Since circuits are non-uniform, one circuit cannot decide an entire language $L$ (unless $L$ only contains strings of one fixed length). Thus, we need to define circuit families to handle variable-length inputs.
Definition. Let $T : \mathbb{N} \to \mathbb{N}$ be a function. A $T(n)$-sized circuit family is a sequence $\{C_n\}_{n \in \mathbb{N}}$ of circuits such that $|C_n| \le T(n)$ and $C_n$ has $n$ input gates for all $n$. We say that a language $L$ is in the class $\mathsf{SIZE}(T(n))$ if there exists an $O(T(n))$-sized circuit family which decides $L$; that is, for all $n \in \mathbb{N}$ and $x \in \{0,1\}^n$, $x \in L$ if and only if $C_n(x) = 1$.
Examples.
- The unary language $\{1^n : n \in \mathbb{N}\}$ lies within $\mathsf{SIZE}(n)$; that is, $\{1^n : n \in \mathbb{N}\} \in \mathsf{SIZE}(n)$. Moreover, any unary language is in $\mathsf{SIZE}(n)$.
- For any language $L \subseteq \{0,1\}^*$, we have $L \in \mathsf{SIZE}(O(n \cdot 2^n))$.
Lecture 17
In-class notes: CS 505 Spring 2025 Lecture 17
Continuing our discussion on circuits, recall that $\mathsf{SIZE}(T(n))$ is the set of all languages $L$ that are decidable by a $T(n)$-sized circuit family $\{C_n\}_{n \in \mathbb{N}}$, where for all $x \in \{0,1\}^n$, we have $x \in L$ if and only if $C_n(x) = 1$.
We highlight two facts related to circuits.
- Any (3)CNF Boolean formula is a special case of a circuit.
- Recall at the beginning of class this semester, we stated that every Boolean function $f : \{0,1\}^n \to \{0,1\}$ has an $O(n \cdot 2^n)$ sized CNF formula computing it.
This readily implies that any such function is computable by an $O(n \cdot 2^n)$-sized circuit.
- Shannon improved this bound to $O(2^n / n)$ for any such $f$.
- Others improved over Shannon, giving an upper bound of $\frac{2^n}{n}(1 + o(1))$.
Circuit Complexity: P/poly
We’ll now discuss what complexity theorists feel is one of the most important circuit complexity classes: $\mathsf{P}/\mathsf{poly}$. Intuitively, this is the circuit equivalent of $\mathsf{P}$.
Definition. The class $\mathsf{P}/\mathsf{poly}$ is the set of all languages decidable by polynomial-sized circuit families. That is, $\mathsf{P}/\mathsf{poly} = \bigcup_{c \ge 1} \mathsf{SIZE}(n^c)$.
Theorem. $\mathsf{P} \subseteq \mathsf{P}/\mathsf{poly}$.
Proof. First, recall that any time-$T(n)$ Turing machine $M$ has an equivalent oblivious Turing machine $\tilde{M}$ running in time $O(T(n) \log T(n))$ (in class, we saw a simpler proof with overhead $O(T(n)^2)$). The properties of this machine are as follows.
- For all $x \in \{0,1\}^*$, $\tilde{M}(x) = M(x)$.
- At any timestep $i$, the positions of the tape heads of $\tilde{M}$ are a function of $|x|$ and the current timestep $i$ (that is, they only depend on these two quantities).
Crucially, (2) above tells us that for any fixed input length $n$, the machine $\tilde{M}$ moves its heads the exact same way for every $x \in \{0,1\}^n$. Now, to show the theorem, we will argue that every oblivious TM running in time $T(n)$ has an $O(T(n))$-sized circuit family deciding it. The remainder of the proof will be a high-level overview.
Let $T = T(n)$ denote the runtime of $\tilde{M}$. For timestep $i \in [T]$, we define $z_i$ to be a snapshot of the execution of $\tilde{M}$ on input $x$. This snapshot contains
- The current state of $\tilde{M}$ at timestep $i$;
- All symbols under the tape heads of $\tilde{M}$.
Since for every fixed Turing machine, the number of states and tapes is constant, we have that $|z_i| = O(1)$ for every $i$. Now, we define a transcript of $\tilde{M}(x)$ as $(z_1, \dots, z_T)$, where $z_1$ is the snapshot of $\tilde{M}$ in its initial configuration, and snapshot $z_{i+1}$ follows from $z_i$ via the transition function. That is, we can write $z_{i+1} = \delta(z_i)$ (to abuse notation).
The key observation here is that since each $z_i$ is constant-size, and moving from $z_i$ to $z_{i+1}$ only depends on a constant number of bits of $x$ and of earlier snapshots (and, by obliviousness, which bits is known in advance), we can compute each $z_{i+1}$ using a constant-sized circuit.
This constant-sized circuit, which we denote by $\hat{C}_i$, takes as input $z_i$ and the relevant bits of $x$ and earlier snapshots (and assumes that $|x| = n$), and outputs $z_{i+1}$.
Then, to construct the final circuit $C_n$, we simply compose all of these sub-circuits together to compute $z_T$, and a final sub-circuit reads $z_T$ and outputs $1$ if and only if $z_T$ contains the accept state.
This circuit has size $O(T(n))$, and we can define such a circuit for every input length $n$.
Finally, the circuit has worst-case size $O(T(n) \log T(n))$ if we started with a non-oblivious Turing machine.
Note. In the above proof, the transformation from the non-oblivious Turing machine to a circuit can be performed in polynomial-time and logarithmic space. We will use this fact later.
P is a strict subset of P/poly
The non-uniform power of $\mathsf{P}/\mathsf{poly}$ can be showcased in a number of ways. Here, we’ll first show that the inclusion of the above theorem is strict. That is, $\mathsf{P} \subsetneq \mathsf{P}/\mathsf{poly}$. This follows from last lecture, when we stated that any unary language is in $\mathsf{SIZE}(n)$.
Lemma. For any unary language $L \subseteq \{1^n : n \in \mathbb{N}\}$, we have $L \in \mathsf{SIZE}(n)$.
Proof. For every $n$, we have two cases. First, if $1^n \notin L$, we define the circuit $C_n$ to be the constant-sized circuit encoding the Boolean formula $x_1 \wedge \neg x_1$, ignoring all other input bits.¹ This formula always outputs $0$, so it will always reject since $1^n \notin L$ (and no other strings of length $n$ are in $L$ either).
Now, if $1^n \in L$, we define $C_n$ to be the $O(n)$-size circuit encoding $x_1 \wedge x_2 \wedge \cdots \wedge x_n$. This formula outputs $1$ if and only if $x = 1^n$.
Why does this show $\mathsf{P} \ne \mathsf{P}/\mathsf{poly}$? It is because we can encode Turing-undecidable problems into unary languages. Consider the following unary halting-problem language: $\mathsf{UHALT} = \{1^n : n\text{'s binary expansion encodes a pair } \langle M, x \rangle \text{ such that TM } M \text{ halts on input } x\}$.
Clearly, $\mathsf{UHALT} \in \mathsf{SIZE}(n)$, so $\mathsf{UHALT} \in \mathsf{P}/\mathsf{poly}$, but $\mathsf{UHALT}$ is not Turing-decidable!
Alternate Proof of Cook-Levin Theorem
We can use circuits to obtain an alternate proof of the Cook–Levin theorem. To do this, we’ll need the language of circuit satisfiability. We let $\mathsf{CKT\text{-}SAT}$ denote the set of all strings $\langle C \rangle$ such that $\langle C \rangle$ encodes a circuit $C$ with a $1$-bit output and there exists $x$ such that $C(x) = 1$.
Theorem. $\mathsf{CKT\text{-}SAT}$ is NP-complete.
Proof. It is clear that $\mathsf{CKT\text{-}SAT} \in \mathsf{NP}$ since, given the encoding $\langle C \rangle$ and a string $x$, one can check in $\mathrm{poly}(|\langle C \rangle|)$ time if $C(x) = 1$. Now, we show that $\mathsf{CKT\text{-}SAT}$ is NP-hard. To see this, we must show that $L \le_p \mathsf{CKT\text{-}SAT}$ for all $L \in \mathsf{NP}$. Fortunately, in our proof that $\mathsf{P} \subseteq \mathsf{P}/\mathsf{poly}$, the transformation from a Turing machine running in time $T(n)$ to an $O(T(n) \log T(n))$-sized circuit family is (1) a polynomial-time transformation; and (2) works for non-deterministic Turing machines as well. Or, if we strictly use deterministic Turing machines, this proof shows how to transform any polynomial-time verifier $M$ (i.e., $x \in L$ iff there exists $w$ such that $M(x, w) = 1$) into a polynomial-sized circuit $C_x$ (computable from $x$ in polynomial time) such that $x \in L$ if and only if there exists $w$ such that $C_x(w) = 1$.
With the above theorem, to show an alternate proof of the Cook–Levin theorem, we must now show that $\mathsf{CKT\text{-}SAT} \le_p \mathsf{3SAT}$. This is done by constructing a 3CNF formula $\varphi$ as follows, with one variable $x_v$ per node $v$ of the circuit. Consider any circuit $C$ and any node $v$ in $C$ with parent nodes $u$ and $w$.
- If $v$ is the AND of $u$ and $w$, we encode the Boolean formula “$x_v \leftrightarrow (x_u \wedge x_w)$” in the formula $\varphi$, using the fact that the Boolean operator “$P \leftrightarrow Q$” can be written as $(P \to Q) \wedge (Q \to P)$. To turn this into a 3CNF, we can encode “$x_v \leftrightarrow (x_u \wedge x_w)$” as 4 clauses in 3CNF form. The final expression will be $(\neg x_v \vee x_u \vee x_w) \wedge (\neg x_v \vee x_u \vee \neg x_w) \wedge (\neg x_v \vee \neg x_u \vee x_w) \wedge (x_v \vee \neg x_u \vee \neg x_w)$.
- If $v$ is the OR of $u$ and $w$, we again do the same thing as above. This will result in 3 clauses in 3CNF form; namely, $(x_v \vee \neg x_u) \wedge (x_v \vee \neg x_w) \wedge (\neg x_v \vee x_u \vee x_w)$, where we can turn the two 2CNF clauses into 3CNF by simply repeating a variable already in the clause.
- If $v$ is the NOT of $u$, then we simply encode $x_v \leftrightarrow \neg x_u$ into a 3CNF, resulting in the 2 clauses $(x_v \vee x_u) \wedge (\neg x_v \vee \neg x_u)$ (again padded to 3CNF form).
Crucially, for every triple of nodes $(v, u, w)$ (or pair $(v, u)$ in the case of NOT), there is a constant-sized 3CNF formula on 3 (or 2) variables encoding it. This means that $|\varphi| = O(|C|)$, so the transformation is polynomial-time. Finally, clearly we have that $C$ is satisfiable if and only if $\varphi$ is satisfiable. Thus, we have given an alternate proof of the Cook–Levin Theorem.
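A hedged Python sketch of the per-gate clause translation used in this reduction (often called the Tseitin encoding). The clause counts follow one standard expansion and match the per-gate constants above; literals are signed integers, DIMACS-style, which is my own representation choice.

```python
def gate_clauses(v, u, w, gate):
    """3CNF clauses (tuples of signed variable indices; negative means
    negated) enforcing x_v <-> gate(x_u, x_w). Two-literal clauses are
    padded by repeating a literal so every clause has exactly 3."""
    if gate == 'AND':   # x_v <-> (x_u AND x_w): 4 truth-table clauses
        return [(-v, u, w), (-v, u, -w), (-v, -u, w), (v, -u, -w)]
    if gate == 'OR':    # x_v <-> (x_u OR x_w): 3 clauses, two padded
        return [(v, -u, -u), (v, -w, -w), (-v, u, w)]
    return [(v, u, u), (-v, -u, -u)]    # NOT: x_v <-> NOT x_u, 2 clauses
```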
Corollary. If $L \in \mathsf{P}$, then $L$ has a polynomial-time constructible circuit family.²
Uniformly Generated Circuits
So far we have seen that circuits are quite powerful and can even decide Turing-undecidable languages. This is because of the non-uniform computation model of circuits. In particular, to show that $L \in \mathsf{P}/\mathsf{poly}$, it is enough that there exists a circuit family which decides $L$. This does not ever consider whether this family is constructible at all! So this begs the natural question: what if we restrict our attention to circuit families which are efficiently constructible?
Definition. A circuit family $\{C_n\}_{n \in \mathbb{N}}$ is said to be P-uniform if there exists a polynomial-time Turing machine $M$ such that for all $n \in \mathbb{N}$, $M(1^n) = \langle C_n \rangle$. That is, $M$ outputs the description of the circuit $C_n$ in time $O(n^c)$ for some constant $c$.
Unfortunately, restricting our attention to such circuits only gives us P.
Theorem. A language $L$ is decidable by a P-uniform circuit family if and only if $L \in \mathsf{P}$.
Proof. Suppose that $L$ is decidable by a P-uniform circuit family $\{C_n\}$. This means that $x \in L$ if and only if $C_{|x|}(x) = 1$, where $M(1^{|x|}) = \langle C_{|x|} \rangle$ in time $\mathrm{poly}(|x|)$. Define a new Turing machine $N$ that on input $x$ simply runs $M(1^{|x|})$ to obtain $\langle C_{|x|} \rangle$, then evaluates $C_{|x|}(x)$ and outputs whatever the circuit outputs. Clearly, this machine runs in polynomial time since $|\langle C_{|x|} \rangle| = \mathrm{poly}(|x|)$ and $M$ runs in polynomial time.
The other direction follows directly from the above corollary: $L \in \mathsf{P}$ implies that $L$ has a polynomial-time constructible circuit family. In particular, for any $L \in \mathsf{P}$ with Turing machine $M$, we can construct a new DTM which on input $1^n$ outputs the circuit for the oblivious Turing machine which decides $L$ on inputs of length $n$.
Logspace Uniform Circuits
We can further restrict P-uniform circuits by requiring that they be implicitly logspace computable; that is, computable using $O(\log n)$ space on all tapes other than the output tape. More formally, we require that $M(1^n) = \langle C_n \rangle$ in space $O(\log n)$ (i.e., we can compute the $i$-th bit of the representation of $C_n$ in logarithmic space).
Theorem. $L$ is decidable by a logspace-uniform circuit family if and only if $L \in \mathsf{P}$.
This again follows from our proof that $\mathsf{P} \subseteq \mathsf{P}/\mathsf{poly}$, since the transformation in that proof is implicitly logspace computable, and the fact that logspace-uniform circuit families are already P-uniform.
P/poly vs Other Classes
Circuits are, in principle, one way to overcome the barriers of diagonalization and relativization, which do not allow us to separate P and NP. We’ll now examine how $\mathsf{P}/\mathsf{poly}$ relates to other complexity classes. For now, we’ll only state the results and prove (some of) them in the next lecture.
P/poly vs. NP
Theorem. If $\mathsf{NP} \subseteq \mathsf{P}/\mathsf{poly}$, then $\mathsf{PH} = \Sigma_2^p$.
P/poly vs. EXP
Theorem. If $\mathsf{EXP} \subseteq \mathsf{P}/\mathsf{poly}$, then $\mathsf{EXP} = \Sigma_2^p$.
P/poly vs. BPP
Theorem. $\mathsf{BPP} \subseteq \mathsf{P}/\mathsf{poly}$.
-
Technically speaking, we should be defining the circuit as $C_n(x_1, \dots, x_n) = x_1 \wedge \neg x_1$ since $C_n$ takes $n$ input bits. Clearly, this circuit has size $O(1)$. ↩
-
More precisely, $\mathsf{DTIME}(T(n))$ has circuit families of size $O(T(n) \log T(n))$ constructible in polynomial time. ↩
Lecture 18
In-class notes: CS 505 Spring 2025 Lecture 18
Proofs from Last Lecture
Today, we’ll begin by proving two theorems from last lecture.
NP vs. P/poly
Theorem. If $\mathsf{NP} \subseteq \mathsf{P}/\mathsf{poly}$, then $\mathsf{PH} = \Sigma_2^p$.
Proof. We show that if $\mathsf{NP} \subseteq \mathsf{P}/\mathsf{poly}$, then $\Pi_2^p \subseteq \Sigma_2^p$, which collapses the hierarchy to $\Sigma_2^p$. It suffices to show that the $\Pi_2^p$-complete problem $\Pi_2\mathsf{SAT}$ is in the class $\Sigma_2^p$. Recall the definition of $\Pi_2\mathsf{SAT}$: a formula $\varphi$ is in $\Pi_2\mathsf{SAT}$ if and only if $\forall u \, \exists v \; \varphi(u, v) = 1$. For convenience, we will assume Boolean formulas have $2n$ variables, split evenly between $u$ and $v$.
Under the assumption that $\mathsf{NP} \subseteq \mathsf{P}/\mathsf{poly}$, we know that $\mathsf{SAT} \in \mathsf{P}/\mathsf{poly}$. Now, this tells us that there exists a circuit family $\{C_m\}$ such that for any Boolean formula $\phi$ whose encoding has length $m$, $C_m(\langle \phi \rangle) = 1$ if and only if $\phi \in \mathsf{SAT}$; moreover, $|C_m| = \mathrm{poly}(m)$.
Let $\varphi(u, v)$ be a formula on $2n$ variables. Then, for any fixed $u \in \{0,1\}^n$, we have that $\varphi(u, \cdot)$ is a Boolean formula on $n$ variables. Here, we can see that $\varphi \in \Pi_2\mathsf{SAT}$ if and only if for every $u$ there exists $v$ such that $\varphi(u, v) = 1$, if and only if $\varphi(u, \cdot) \in \mathsf{SAT}$ for every $u$.
By our assumption, $\varphi(u, \cdot) \in \mathsf{SAT}$ if and only if $C(\langle \varphi(u, \cdot) \rangle) = 1$ for the appropriate circuit $C$ in the family, for any $u$. We will use this fact to construct a new circuit family $\{C'\}$ that takes as input $\langle \varphi \rangle$ and $u$ and outputs $v$ such that $\varphi(u, v) = 1$ whenever one exists. The circuit $C'$ will use multiple copies of the circuit $C$ as a sub-circuit. Intuitively, $C'$ will construct the formula $\varphi(u, \cdot)$. Let $\varphi(u, \cdot)$ have variables $v_1, \dots, v_n$. Now, $C'$ will iteratively set $v_1 = 0$ and $v_1 = 1$, use $C$ to test whether $\varphi(u, 0, v_2, \dots, v_n) \in \mathsf{SAT}$ or $\varphi(u, 1, v_2, \dots, v_n) \in \mathsf{SAT}$, and iteratively build a satisfying assignment for $\varphi(u, \cdot)$ one variable at a time (this is the self-reducibility of SAT). Clearly, since $|C|$ is polynomial, $|C'|$ is also polynomial.
Thus, for any $\varphi$, we have that $\varphi \in \Pi_2\mathsf{SAT}$ if and only if for all $u$ there exists $v$ such that $\varphi(u, v) = 1$, which we have shown happens if and only if for all $u$, $\varphi(u, C'(\langle \varphi \rangle, u)) = 1$. Now, the definition of $\mathsf{P}/\mathsf{poly}$ only tells us that the family $\{C'\}$ exists, but not how to construct it! Here, we’ll rely on the fact that since $|C'| \le p(n)$ for some polynomial $p$, we only need $\mathrm{poly}(n)$ bits to write down the description of $C'$.¹
Therefore, we can now flip the quantifiers in the above formula as follows. We have that $\varphi \in \Pi_2\mathsf{SAT}$ if and only if $\exists w \in \{0,1\}^{\mathrm{poly}(n)} \, \forall u \in \{0,1\}^n \; \varphi(u, C'_w(\langle \varphi \rangle, u)) = 1$. Here, $w$ is a bit-string which encodes a circuit, and $C'_w$ is the circuit encoded by $w$. Clearly, the above formula is a $\Sigma_2$ statement. Therefore, we have shown that $\Pi_2\mathsf{SAT} \in \Sigma_2^p$, so $\Pi_2^p \subseteq \Sigma_2^p$, and thus the polynomial hierarchy collapses.
BPP vs. P/poly
Theorem. $\mathsf{BPP} \subseteq \mathsf{P}/\mathsf{poly}$.
Proof. Let $L \in \mathsf{BPP}$. Then, by error reduction, there exists a polynomial-time deterministic Turing machine $M$ such that for all $n$ and $x \in \{0,1\}^n$, we have $\Pr_{r \leftarrow \{0,1\}^m}[M(x, r) = L(x)] \ge 1 - 2^{-2n}$. Here, $m = \mathrm{poly}(n)$. Equivalently, $\Pr_r[M(x, r) \ne L(x)] \le 2^{-2n}$.
For any $x \in \{0,1\}^n$, we say that the string $r \in \{0,1\}^m$ is bad for $x$ if $M(x, r) \ne L(x)$. Otherwise, we say that $r$ is good for $x$. For any $x$, define $B_x = \{r \in \{0,1\}^m : r \text{ is bad for } x\}$.
By definition, for any $x$, we have $|B_x| \le 2^{-2n} \cdot 2^m$ since $\Pr_r[M(x, r) \ne L(x)] \le 2^{-2n}$. Now, if we take the union of all $B_x$, we can upper bound its size via the union bound: $\left|\bigcup_{x \in \{0,1\}^n} B_x\right| \le 2^n \cdot 2^{-2n} \cdot 2^m = 2^{m-n} < 2^m$. Note that the complement $\{0,1\}^m \setminus \bigcup_x B_x$ is the set of all $r$ such that $r$ is good for every $x$. So we have shown that this set is non-empty. In particular, this implies that there must exist at least one string $r_n^*$ that is good for every $x \in \{0,1\}^n$. That is, $M(x, r_n^*) = L(x)$ for all $x \in \{0,1\}^n$.
Now, since $M$ runs in polynomial time, we can implement the computation $M(\cdot, r_n^*)$ via a circuit of polynomial size for each input length $n$. Let $C_n$ be this circuit. In particular, $C_n$ will have the string $r_n^*$ hard-coded, and will compute $C_n(x) = M(x, r_n^*)$. We complete this process for every $n$, hard-coding each appropriate string $r_n^*$. This implies that $L \in \mathsf{P}/\mathsf{poly}$.
Circuit Lower Bounds
In principle, circuit lower bounds allow us to circumvent the diagonalization barriers to showing P vs NP. In particular, if $\mathsf{NP} \not\subseteq \mathsf{P}/\mathsf{poly}$, then $\mathsf{P} \ne \mathsf{NP}$ since $\mathsf{P} \subseteq \mathsf{P}/\mathsf{poly}$. Thus, if we could show that a single language $L \in \mathsf{NP}$ requires a super-polynomial sized circuit family to decide it, we would have shown the result. However, the best known lower bound for a language in $\mathsf{NP}$ is only linear in $n$. This is quite far from the unconditional lower bound due to Shannon.
Theorem. For all sufficiently large $n$, there exists a function $f : \{0,1\}^n \to \{0,1\}$ such that $f$ is not computable by any circuit of size $\frac{2^n}{10n}$.
Proof. First, the number of functions $f : \{0,1\}^n \to \{0,1\}$ is $2^{2^n}$. Next, for any circuit $C$ of size $S$, we only need at most $9 S \log S$ bits to represent $C$ as a binary string. This implies the number of circuits of size $S$ is at most $2^{9 S \log S}$. Setting $S = \frac{2^n}{10n}$, this implies that the number of circuits of size $S$ is at most $2^{9 S \log S} \le 2^{\frac{9}{10} 2^n} < 2^{2^n}$, since $\log S \le n$. Hence some function on $n$ bits is not computed by any circuit of size $S$.
Note that the above bound on the circuit size was improved in at least two subsequent works, giving lower bounds of the form:
- $\frac{2^n}{n}(1 - o(1))$ for all sufficiently large $n$;
- $\frac{2^n}{n}\left(1 + \frac{\log n}{n} - O\left(\frac{1}{n}\right)\right)$.
Non-Uniform Time Hierarchy
Just like with Turing machines, circuits have their own “time hierarchy” theorem.
Theorem. Let $T, T' : \mathbb{N} \to \mathbb{N}$ be two functions such that $n \le T(n)$ and $T(n) \cdot n^2 = o(T'(n))$ with $T'(n) \le \frac{2^n}{10n}$. Then, $\mathsf{SIZE}(T(n)) \subsetneq \mathsf{SIZE}(T'(n))$.
Proof. Let $\ell$ be the largest integer such that functions on $\ell$ input bits are computable by circuits of size $T'(n)$; recall that any function on $\ell$ bits can be implemented by a circuit of size $O(\ell \cdot 2^\ell)$. Let $f : \{0,1\}^\ell \to \{0,1\}$ be a function that is not computable by any circuit of size at most $\frac{2^\ell}{10\ell}$, which exists by the previous theorem.
Define the function $g : \{0,1\}^n \to \{0,1\}$ as follows: $g(x_1, \dots, x_n) = f(x_1, \dots, x_\ell)$. That is, $g$ simply throws away the last $n - \ell$ bits of its input and just computes $f$.
We know that $g \in \mathsf{SIZE}(O(\ell \cdot 2^\ell)) \subseteq \mathsf{SIZE}(T'(n))$, where the last subset inclusion is due to the choice of $\ell$. However, we know by definition that $g \notin \mathsf{SIZE}\left(\frac{2^\ell}{10\ell}\right) \supseteq \mathsf{SIZE}(T(n))$, where the last containment is again due to the choice of $\ell$ and the gap between $T$ and $T'$.
Parallel Complexity
Complexity theory also attempts to define what it means for a computation to have an efficient parallel implementation. This is done via circuits and the class $\mathsf{NC}$.
Definition. For all $d \ge 0$, the class $\mathsf{NC}^d$ is the set of all languages $L$ decidable by a circuit family $\{C_n\}$ such that
- $|C_n| = \mathrm{poly}(n)$, and
- $\mathsf{depth}(C_n) = O(\log^d n)$, where $\mathsf{depth}(C_n)$ is the length of the longest path from an input node to (any) output node. Moreover, Nick’s Class $\mathsf{NC}$ is defined as $\mathsf{NC} = \bigcup_{d \ge 0} \mathsf{NC}^d$.
Note that the class $\mathsf{NC}^0$ is a special class since it contains only constant-depth circuits. By our definition of circuits, we only have circuits with bounded fan-in $2$. Therefore, all circuits in $\mathsf{NC}^0$ have outputs which can only depend on a constant number of the input bits. In particular, this implies that $\mathsf{NC}^0$ is a (logspace) uniform circuit class! Otherwise, we can define uniform NC as NC restricted to logspace-uniform circuit families.
We can also remove the restriction of bounded fan-in and obtain the class AC.
Definition. For all $d \ge 0$, the class $\mathsf{AC}^d$ is identical to the class $\mathsf{NC}^d$, except nodes in the circuits may have unbounded (polynomial) fan-in. Then, $\mathsf{AC} = \bigcup_{d \ge 0} \mathsf{AC}^d$.
Theorem. For all $d \ge 0$,
- $\mathsf{NC}^d \subseteq \mathsf{AC}^d$.
- $\mathsf{AC}^d \subseteq \mathsf{NC}^{d+1}$.
Note that it is unknown if the first statement in the above theorem is strict.
Theorem. A language $L$ has an efficient parallel algorithm² if and only if $L \in \mathsf{NC}$.
Parallel Complexity Major Open Questions
The major question in parallel complexity is whether $\mathsf{NC} = \mathsf{P}$. Complexity theorists generally believe that $\mathsf{NC} \ne \mathsf{P}$, but are currently unable to even separate $\mathsf{NC}^1$ from $\mathsf{P}$. The study of this question motivates P-completeness. A language $L$ is P-complete if $L \in \mathsf{P}$ and $A \le_{\log} L$ for all $A \in \mathsf{P}$ (i.e., hardness under logspace reductions).
Theorem. If $L$ is a P-complete language, then
- $L \in \mathsf{NC}$ if and only if $\mathsf{NC} = \mathsf{P}$.
- $L \in \mathsf{L}$ if and only if $\mathsf{L} = \mathsf{P}$ (recall that $\mathsf{L} = \mathsf{SPACE}(\log n)$).
The following is a natural P-complete language which could possibly resolve these open questions: the circuit evaluation problem $\mathsf{CKT\text{-}EVAL} = \{(\langle C \rangle, x) : C(x) = 1\}$.
Polynomial Hierarchy via Exponential-size Circuits
We complete our study of Boolean circuits by giving yet another characterization of the polynomial hierarchy.
Definition. A circuit family $\{C_n\}$ is DC-uniform if there exists a polynomial-time Turing machine $M$ such that $M(1^n, i)$ outputs the $i$-th bit of the binary description of $C_n$.
Note that a DC-uniform circuit family can have exponential-size circuits, i.e., $|C_n| = 2^{\mathrm{poly}(n)}$.
Theorem. A language $L \in \mathsf{PH}$ if and only if $L$ is decidable by a DC-uniform circuit family $\{C_n\}$ with the following properties for all $n$.
- $C_n$ only has AND, OR, and NOT gates.
- $|C_n| = 2^{\mathrm{poly}(n)}$ and $\mathsf{depth}(C_n) = O(1)$.
- $C_n$ has unbounded (exponential) fan-in.
- Every NOT gate in $C_n$ appears only at the input level.
Note that if we allow the circuits in the above theorem to have larger than constant depth, then we have instead characterized $\mathsf{PSPACE}$.
-
To see this, consider the adjacency-matrix view of the circuit $C'$. ↩
-
Arora and Barak do not define what this term means. ↩
Lecture 19
In-class notes: CS 505 Spring 2025 Lecture 19
Interactive Proofs
Recall the definition of NP with respect to deterministic Turing machines (i.e., verifiers): $x \in L$ if and only if there exists $w$ such that $V(x, w) = 1$. In some sense, the string $w$ is a “proof” that $x \in L$.
Interactive Proofs (IPs) were an attempt to re-define NP languages using interaction. Every IP is described by two interactive algorithms/Turing machines $(P, V)$: the prover and the verifier. For a given language $L$ (or other classes, as we will see), an IP seeks to answer/certify whether $x \in L$.
[Figure: a $k$-round interactive protocol in which the prover $P$ and the verifier $V$ alternate messages on common input $x$.]
In an IP, the prover algorithm $P$ is assumed to be computationally unbounded; that is, it can perform any computation, even decide undecidable problems. However, we restrict the verifier $V$ to be strictly polynomial-time in the length of the input $x$ to the protocol. We restrict ourselves to the protocol format described in the above picture, where the protocol has $k$ rounds and $2k$ messages, and where $P$ always sends the first and last message. Note that this model can capture when the verifier sends the first and last messages (the prover’s first and last messages are simply empty).
Every message of the prover is a function of the input and all prior messages sent in the protocol; i.e., $a_i = P(x, a_1, \dots, a_{i-1})$ when it is $P$’s turn. Similarly, the verifier messages are a function of the input and all prior messages received in the protocol; i.e., $a_i = V(x, a_1, \dots, a_{i-1})$ when it is $V$’s turn. The output of the protocol is denoted by $\mathsf{out}\langle P(y), V(z) \rangle(x)$, where $y$ is a private input that $P$ may receive, $z$ is a private input that $V$ may receive, and $x$ is the common/public input that both parties receive. Note that in the case that $P$ and $V$ receive a private input, then the messages each party sends are also a function of these private inputs. The output is defined as $\mathsf{out}\langle P(y), V(z) \rangle(x) = V(x, z, a_1, \dots, a_{2k}) \in \{0,1\}$, and is a bit indicating whether the verifier accepts the interaction (i.e., if $V$ should believe that $x \in L$ is a true statement).
Note that because we are restricting the verifier to be strictly polynomial time, this naturally restricts the size of all messages $a_i$ to be at most $\mathrm{poly}(|x|)$ bits.
Some natural questions come to mind with the above model.
- Should and be allowed to use randomness? We will address this question soon.
- What is an accepting or rejecting proof? We will also address this question soon.
- Who sends the first/last message? We discussed this briefly above; it doesn’t matter too much.
- What is the size of the proof? This is simply the total number of bits exchanged between both parties; by the above discussion, all proofs are $\mathrm{poly}(|x|)$ bits.
Deterministic IPs
In this model, we restrict both the prover and the verifier to be deterministic.
Definition. We say that a language $L$ is in the class $\mathsf{dIP}[k]$ if there exists a $k$-round ($2k$-message) IP $(P, V)$ such that for any $x \in \{0,1\}^*$, it has the following properties:
- (Completeness) If $x \in L$, then $\mathsf{out}\langle P, V \rangle(x) = 1$.
- (Soundness) If $x \notin L$, then for any algorithm $P^*$, $\mathsf{out}\langle P^*, V \rangle(x) = 0$.
- (Rounds) The IP has $k = \mathrm{poly}(|x|)$ rounds.
The class $\mathsf{dIP}$ is defined as $\mathsf{dIP} = \bigcup_{c \ge 1} \mathsf{dIP}[n^c]$.
Unfortunately, deterministic IPs are no more powerful than NP.
Theorem. $\mathsf{dIP} = \mathsf{NP}$.
Proof.
-
$\mathsf{NP} \subseteq \mathsf{dIP}$. If $L \in \mathsf{NP}$, then there exists a DTM $V'$ running in strict polynomial time such that $x \in L$ if and only if there exists $w$ such that $V'(x, w) = 1$, which implies that $x \notin L$ if and only if $V'(x, w) = 0$ for all $w$. In both cases, $|w| = \mathrm{poly}(|x|)$. Then, a simple dIP for $L$ is the following: (a) the prover $P$, on input $x$, simply computes a witness $w$ such that $V'(x, w) = 1$ (if one exists) and sends $w$ to the verifier $V$; (b) $V$ outputs $V'(x, w)$. It is not hard to see that this dIP satisfies the above definition.
-
$\mathsf{dIP} \subseteq \mathsf{NP}$. Let $L \in \mathsf{dIP}$. Then, there exists a $k$-round dIP $(P, V)$ such that
- if $x \in L$, then $\mathsf{out}\langle P, V \rangle(x) = 1$;
- if $x \notin L$, then $\mathsf{out}\langle P^*, V \rangle(x) = 0$ for every $P^*$; and
- $k = \mathrm{poly}(|x|)$.
We construct an NP verifier $V'$ to decide $L$. $V'$ will simply perform the final computation performed by the verifier $V$ (and check that the verifier messages in the transcript are consistent with $V$). Recall that the output of the protocol is defined as $V(x, a_1, \dots, a_{2k})$. We set the witness $w = (a_1, \dots, a_{2k})$. Since each $|a_i|$ is polynomial in $|x|$, we have that $|w| = \mathrm{poly}(|x|)$. If $x \in L$, then there is a valid transcript $w$ such that $V$ accepts, so $V'$ will accept. If $x \notin L$, then there is no prover strategy that causes the verifier to accept; in other words, all possible transcripts are rejected by the verifier, so $V'$ always rejects.
The Class IP: Random Verifiers
The above discussion tells us that dIP is no more powerful than NP. It turns out the reason is a lack of randomness in the protocol. Whereas researchers do not believe that BPP is more powerful than P, the class IP, which we define next, will be much more powerful than NP.1
Definition. The class $\mathsf{IP}[k]$ is the set of all languages that have a $k$-round IP $(P, V)$ such that: $P$ is deterministic and computationally unbounded; $V$ is a probabilistic polynomial-time (PPT) algorithm/machine (that is, $V$ runs in strict polynomial time and is allowed to sample random coins); and the following hold:
- (Completeness) If $x \in L$, then $\Pr[\mathsf{out}\langle P, V \rangle(x) = 1] \ge 2/3$; and
- (Soundness) If $x \notin L$, then for all $P^*$, $\Pr[\mathsf{out}\langle P^*, V \rangle(x) = 1] \le 1/3$.
The class $\mathsf{IP}$ is defined as $\mathsf{IP} = \bigcup_{c \ge 1} \mathsf{IP}[n^c]$.
Some notes about the above definition.
- The verifier in the above definition is not required to reveal its random coins, and can sample coins as a function of its input and the messages it has seen so far. An IP where the verifier does not reveal its randomness to the prover is called a private-coin protocol. If all the verifier messages are uniformly random strings, then it is a public-coin protocol. We will see later that private coins are not necessary!
- Does $P$ being probabilistic change the class IP? No! Since $P$ is computationally unbounded, for any probabilistic prover strategy it can deterministically compute the messages of this strategy that maximize the success probability of the verifier. So the two notions are equivalent. Moreover, this optimal prover can be computed using only $\mathrm{poly}(|x|)$ space. Intuitively, this implies that $\mathsf{IP} \subseteq \mathsf{PSPACE}$.
- As we will see in the below lemma, we can amplify both the completeness and soundness probabilities to be arbitrarily close to $1$ and $0$, respectively, which does not change the class $\mathsf{IP}[k]$. Moreover, the amplification below can be performed in parallel, so the resulting amplified protocol still has $k$ rounds.
- In fact, setting the completeness probability equal to $1$ (so-called perfect completeness) does not change the class $\mathsf{IP}$. This is a non-trivial fact!
- On the contrary, setting the soundness error equal to $0$ collapses $\mathsf{IP}$ to $\mathsf{NP}$.
Lemma. The class $\mathsf{IP}[k]$ remains the same if we require completeness to hold with probability at least $1 - 2^{-n^s}$ and soundness error to hold with probability at most $2^{-n^s}$ for any fixed constant $s > 0$.
Proof Sketch. Given an IP $(P, V)$ with constant completeness/soundness error, we can construct a new IP $(P', V')$ which runs $(P, V)$ sequentially some number $t$ of times. The output of $(P', V')$ is the majority of the answers obtained from the runs of $(P, V)$. Setting $t = O(n^s)$ gives us the desired result via a Chernoff bound.
IP for Graph Non-Isomorphism
To demonstrate the power of the class IP, we will give an IP for the graph non-isomorphism problem, which is in coNP. Note that we do not know if languages in coNP have short certificates, and yet we can give an IP with a polynomial sized proof and polynomial-time verification.
First, let $G_0, G_1$ both be undirected simple graphs on $n$ vertices. We say that $G_0$ is isomorphic to $G_1$, which is denoted by $G_0 \cong G_1$, if you can relabel the vertices of $G_0$ and obtain the graph $G_1$. In other words, there exists a permutation $\pi$ on the set $[n]$ such that $\pi(G_0) = G_1$ (i.e., you relabel the vertices of $G_0$ according to the permutation $\pi$ and obtain the graph $G_1$).
[Figure: an example of two isomorphic graphs, where relabeling the vertices of one graph according to a permutation $\pi$ yields the other.]
The language of graph isomorphism, denoted as $\mathsf{GI} = \{(G_0, G_1) : G_0 \cong G_1\}$, is known to be in NP; it is unknown if $\mathsf{GI}$ is NP-complete (and we have evidence that it is not; we’ll see this in a later lecture). Then, $\mathsf{GNI} = \overline{\mathsf{GI}}$ is the graph non-isomorphism problem, and is in coNP.
GNI Interactive Proof
Below, we sketch the interactive proof for $\mathsf{GNI}$. Let $G_0, G_1$ be two $n$-vertex graphs. Our proof system will operate as follows on input $(G_0, G_1)$.
- The verifier $V$ samples $b \leftarrow \{0,1\}$ uniformly at random and samples a uniformly random permutation $\pi$ on $[n]$. $V$ defines $H = \pi(G_b)$ and sends $H$ to $P$.
- The prover $P$ receives $H$ and computes a bit $b'$ such that $G_{b'} \cong H$. $P$ sends $b'$ to $V$.
- $V$ outputs $1$ if and only if $b = b'$.
Analysis
- If $G_0 \not\cong G_1$, then $H$ is isomorphic to exactly one of the two input graphs. This means that $P$, being computationally unbounded, can trivially compute $b' = b$ from $H$; i.e., identify which graph $H$ is isomorphic to, so $V$ accepts with probability $1$.
- If $G_0 \cong G_1$, then $H$ is isomorphic to both graphs and its distribution is independent of $b$, so $P$ can only guess the correct bit and succeed with probability at most $1/2$ (see the Python sketch below).
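The following Python sketch simulates one round of this private-coin protocol on tiny graphs, with the unbounded prover replaced by brute force over all permutations; the graph representation and function names are illustrative, not from the lecture.

```python
import random
from itertools import permutations

def apply_perm(edges, pi):
    """Relabel the vertices of a graph (a frozenset of 2-element
    frozenset edges) according to the permutation pi."""
    return frozenset(frozenset({pi[u], pi[v]}) for u, v in map(tuple, edges))

def prover_guess(G0, H, n):
    """Unbounded prover: output 0 if H is isomorphic to G0, else 1,
    by brute force over all n! relabelings (fine only for tiny n)."""
    return 0 if any(apply_perm(G0, pi) == H
                    for pi in permutations(range(n))) else 1

def gni_round(G0, G1, n):
    """One round: V secretly picks b and a random permutation, sends
    H = pi(G_b); V accepts iff the prover's guess equals b."""
    b = random.randint(0, 1)
    pi = list(range(n))
    random.shuffle(pi)
    H = apply_perm(G1 if b else G0, pi)
    return prover_guess(G0, H, n) == b

E = lambda *ps: frozenset(frozenset(p) for p in ps)
triangle, path = E((0, 1), (1, 2), (0, 2)), E((0, 1), (1, 2))
print(all(gni_round(triangle, path, 3) for _ in range(20)))  # True
```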
GNI with Public Coins?
The above IP for $\mathsf{GNI}$ crucially relies on the bit $b$ being hidden. It is not hard to see why: if $P^*$ knows $b$, $P^*$ can always respond correctly; the same thing happens if $P^*$ knows $\pi$. Can we give a $\mathsf{GNI}$ IP that uses public coins? To do so, we’ll need to take a more quantitative (but equivalent) approach to $\mathsf{GNI}$.
Let $G_0, G_1$ be two $n$-vertex graphs. Define a new set $S$ as $S = \{H : H \cong G_0 \text{ or } H \cong G_1\}$.
Notice that it is easy to certify that $H \in S$ for any $H$: simply provide a permutation $\pi$ such that $\pi(G_0) = H$ or $\pi(G_1) = H$. Now, any graph $G$ on $n$ vertices can have at most $n!$ equivalent graphs, where a graph $H$ is equivalent to $G$ if $H \cong G$. If we pretend that both $G_0$ and $G_1$ have exactly $n!$ equivalent graphs, we know that $S$ is the set of all graphs which are equivalent to either $G_0$ or $G_1$. This leads us to the following observations.
- If $G_0 \not\cong G_1$, then $|S| = 2 \cdot n!$.
- If $G_0 \cong G_1$, then $|S| = n!$.
These observations will be a stepping stone to obtaining a public-coin protocol for $\mathsf{GNI}$.
-
Under standard and widely believed complexity-theoretic conjectures. ↩
Lecture 20
In-class notes: CS 505 Spring 2025 Lecture 20
Set Lower Bound IP
Building off of our discussion from last lecture, what we want is a set lower bound protocol. Let $S$ be some set that is known to both the prover $P$ and the verifier $V$, in the following sense.
- $P$ knows $S$ explicitly.
- $V$ can certify membership in $S$ given a certificate (e.g., like an NP language).
The set lower bound protocol, due to Goldwasser and Sipser, will be a public-coin protocol that proves $|S| \ge K$ for some bound $K$. The protocol will have the following guarantees:
- If $|S| \ge K$, then $V$ accepts with probability at least $2/3$;
- If $|S| \le K/2$, then $V$ rejects with probability at least $2/3$ for any prover strategy $P^*$.
Tool: Pairwise-Independent Hash Functions
Before we can describe the protocol, we need a technical tool. The protocol will require something called a pairwise-independent hash function family (also known as $2$-wise independent or $2$-universal).
Definition. Let $\mathcal{H}_{m,k}$ be a family of functions $h : \{0,1\}^m \to \{0,1\}^k$. We say that $\mathcal{H}_{m,k}$ is pairwise-independent if for all $x \ne x' \in \{0,1\}^m$ and for all $y, y' \in \{0,1\}^k$, it holds that $\Pr_{h \leftarrow \mathcal{H}_{m,k}}[h(x) = y \wedge h(x') = y'] = 2^{-2k}$.
Example. The following hash function family is pairwise independent. Let $\mathbb{F} = \mathrm{GF}(2^m)$ be a finite field of size $2^m$.¹ Define a hash function family $\mathcal{H}_{m,m} = \{h_{a,b} : a, b \in \mathbb{F}\}$, where $h_{a,b}(x) = ax + b$. In fact, we can define $\mathcal{H}_{m,k}$ for any $k \le m$, where it is identical to $\mathcal{H}_{m,m}$ except each function truncates its output to $k$ bits.
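As an illustration, here is a hedged Python sketch of such a family. To avoid implementing $\mathrm{GF}(2^m)$ arithmetic, it works over a prime field $\mathbb{Z}_P$ instead (so inputs and outputs live in $\mathbb{Z}_P$ rather than bit strings), and it omits the truncation step, which over $\mathbb{Z}_P$ would only be approximately uniform.

```python
import random

P = 2**61 - 1   # prime field size, playing the role of GF(2^m)

def sample_hash():
    """Sample h_{a,b}(x) = a*x + b over Z_P. For any x != x' and any
    targets (y, y'), the equations h(x) = y and h(x') = y' form a 2x2
    linear system with a unique solution (a, b) over the field, so
    Pr[h(x) = y and h(x') = y'] = 1/P**2: pairwise independence."""
    a, b = random.randrange(P), random.randrange(P)
    return lambda x: (a * x + b) % P

h = sample_hash()
print(h(42), h(43))   # a pairwise-uniform pair of outputs
```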
The Set LB Protocol
We now describe the protocol.
Setup.
- Let $S \subseteq \{0,1\}^m$ for some $m$ be a set with efficient membership certification (i.e., an NP-style relation).
- Let $K$ be a parameter and let $k$ be an integer such that $2^{k-2} < K \le 2^{k-1}$.
- Let $\mathcal{H}_{m,k}$ be a pairwise-independent hash function family.
Goal. If $|S| \ge K$, then $V$ accepts with probability at least $2/3$, and if $|S| \le K/2$, then $V$ rejects with probability at least $2/3$.
Protocol.
- $V$ samples $h \leftarrow \mathcal{H}_{m,k}$ and $y \leftarrow \{0,1\}^k$. $V$ sends $(h, y)$ to $P$.
- $P$ computes a certificate $\pi$ for the statement “$x \in S$”, after finding $x \in S$ such that $h(x) = y$. $P$ sends $(x, \pi)$ to $V$.
- $V$ outputs $1$ if and only if $h(x) = y$ and $\pi$ is a valid certificate for “$x \in S$”.
Completeness and Soundness. Showing completeness and soundness of this protocol will rely on the following claim.
Claim. If $|S| \le 2^{k-1}$, then for $h \leftarrow \mathcal{H}_{m,k}$ and $y \leftarrow \{0,1\}^k$, we have $\frac{3}{4} \cdot \frac{|S|}{2^k} \le \Pr_{h,y}[\exists x \in S : h(x) = y] \le \frac{|S|}{2^k}$, where $h$ and $y$ are sampled uniformly.
Proof. First, we show the upper bound. Notice that for any function $h$, we have $|h(S)| \le |S|$; that is, the image of $S$ under $h$ covers at most $|S|$ points of $\{0,1\}^k$. In other words, it doesn’t matter which $h$ we sample: $\Pr_y[y \in h(S)] \le \frac{|S|}{2^k}$.
Now, we show the lower bound. For $x \in S$, let $E_x$ be the event “$h(x) = y$.” Then, we can rewrite the probability as $\Pr[\bigvee_{x \in S} E_x]$. By the inclusion-exclusion principle, we can lower bound this as $\Pr\left[\bigvee_{x \in S} E_x\right] \ge \sum_{x \in S} \Pr[E_x] - \frac{1}{2} \sum_{x \ne x' \in S} \Pr[E_x \wedge E_{x'}]$. Recall that $E_x$ is the event “$h(x) = y$.” Therefore, $E_x \wedge E_{x'}$ is the event “$h(x) = y \wedge h(x') = y$” for $x \ne x'$. By definition, $h$ is drawn from a family of pairwise-independent hash functions, so we have $\Pr_{h,y}[E_x \wedge E_{x'}] = 2^{-2k}$.
Therefore, we have $\Pr\left[\bigvee_{x \in S} E_x\right] \ge \frac{|S|}{2^k} - \frac{|S|^2}{2 \cdot 2^{2k}} = \frac{|S|}{2^k}\left(1 - \frac{|S|}{2^{k+1}}\right) \ge \frac{3}{4} \cdot \frac{|S|}{2^k}$, using $|S| \le 2^{k-1}$ in the last step. This establishes the lower bound.
So, what does this tell us? Well, as in the protocol, let $k$ be an integer such that $2^{k-2} < K \le 2^{k-1}$, and set $p^* = \frac{K}{2^k} \in (1/4, 1/2]$.
-
If $|S| \ge K$, then we know that $\Pr_{h,y}[\exists x \in S : h(x) = y] \ge \frac{3}{4} p^*$ (applying the claim to a subset of $S$ of size exactly $K$ if necessary). Note that this case corresponds to the honest prover case, and we can boost the acceptance probability above $2/3$ using a constant number of parallel repetitions.
-
If $|S| \le K/2$, then $\Pr_{h,y}[\exists x \in S : h(x) = y] \le \frac{|S|}{2^k} \le \frac{p^*}{2}$. Note that this case corresponds to any dishonest prover, since no prover can answer when $y$ has no preimage in $S$.
Final Public-coin GNI Protocol
With the set lower bound protocol, we can now give a public-coin protocol for GNI. To do so, we first modify the definition of the set $S$ from before to the following set: $S = \{(H, \sigma) : (H \cong G_0 \text{ or } H \cong G_1) \text{ and } \sigma(H) = H\}$. Here, $\sigma$ is an automorphism of $H$; that is, $\sigma$ is a permutation such that $\sigma(H) = H$ ($\sigma$ need not be the identity permutation). We need this set of pairs explicitly to handle the case when $G_0$ or $G_1$ has fewer than $n!$ equivalent graphs. Notice also that membership in $S$ is again easily (i.e., polynomial-time) verifiable given some certificate (i.e., $\sigma$ and some other permutation $\pi$ giving the isomorphism between $H$ and one of the graphs).
Under our new definition of $S$, we have that $|S| = 2 \cdot n!$ if $G_0 \not\cong G_1$ and $|S| = n!$ if $G_0 \cong G_1$. Notice there is a factor-of-two difference between these two cases, so we will be able to use the set lower bound protocol to prove that $|S| \ge K = 2 \cdot n!$.
Setup.
- Set $K = 2 \cdot n!$ and choose $k$ such that $2^{k-2} < K \le 2^{k-1}$.
- Choose some pairwise-independent hash function family $\mathcal{H}_{m,k}$, where $m$ is the maximum number of bits needed to encode any element $(H, \sigma) \in S$.
The Protocol.
- The verifier $V$ samples $h \leftarrow \mathcal{H}_{m,k}$ and $y \leftarrow \{0,1\}^k$ uniformly at random. $V$ sends $(h, y)$ to the prover $P$.
- $P$ finds a pair $(H, \sigma) \in S$ such that $h(H, \sigma) = y$, and computes $\pi$ such that $\pi(G_0) = H$ or $\pi(G_1) = H$. $P$ sends $((H, \sigma), \pi)$ to $V$.
- $V$ checks that (a) $h(H, \sigma) = y$; (b) $H$ is a graph on $n$ vertices; (c) $\sigma(H) = H$; and (d) $\pi(G_0) = H$ or $\pi(G_1) = H$.
Note this is just the set lower bound protocol! In particular, we have shown:
Theorem. $\mathsf{GNI} \in \mathsf{AM}$.
We’ll now define what the class $\mathsf{AM}$ is.
Arthur-Merlin Protocols
Arthur-Merlin protocols (named after the fabled King of England and his court Wizard) are simply public-coin interactive proofs, like we saw above.
Definition. For any $k$, the class $\mathsf{AM}[k]$ consists of all languages with a $k$-message public-coin IP $(P, V)$ with the following properties:
- $V$ sends the first message, and $P$ and $V$ exchange exactly $k$ messages;
- All of $V$’s messages are uniformly and independently random bits; and
- $V$’s output only depends on these random coins, $P$’s messages, and the common public input (i.e., $V$ has no hidden state to make its decision).
The commonly used notation for these protocols alternates the parties’ names: $\mathsf{AM} = \mathsf{AM}[2]$, $\mathsf{AMA} = \mathsf{AM}[3]$, and so on, where $A$ (Arthur) denotes a verifier message and $M$ (Merlin) a prover message.
Note that there is also the class $\mathsf{MA}$, which is identical to $\mathsf{AM}[2]$ except the prover speaks first.
Properties of AM Protocols
- $P$ does not see all the randomness sampled by $V$ all at once; it receives it in a round-by-round fashion.
- You are asked to show on your homework that $\mathsf{MA} \subseteq \mathsf{AM}$. Notably, it also holds that $\mathsf{AM} \subseteq \Pi_2^p$.
- For all constants $k \ge 2$, it holds that $\mathsf{AM}[k] = \mathsf{AM}[2]$. This is surprising since $\mathsf{AM}[k]$ has a $\mathsf{PH}$-like structure.
- For any slowly growing function $k(n) = \omega(1)$ (e.g., $k(n) = \log \log n$), it is unknown if $\mathsf{AM}[k(n)]$ has any “nice” characterization.
Equivalence of Public- and Private-coin Protocols
By definition, $\mathsf{AM}[k] \subseteq \mathsf{IP}[k]$. And on an intuitive level, it feels that private-coin protocols should have more power than public-coin ones. However, we have already seen an example where they are equivalent: the public-coin protocol for GNI. We’ll see in our next lecture that public- and private-coin protocols are essentially equivalent. Namely, we’ll show the following.
Theorem. For all $k : \mathbb{N} \to \mathbb{N}$ computable in $\mathrm{poly}(n)$ time, it holds that $\mathsf{IP}[k] \subseteq \mathsf{AM}[k + 2]$.
-
For example, $\mathrm{GF}(2^m)$ can be realized as $\{0,1\}^m$ with bitwise XOR as addition and multiplication given by polynomial multiplication over $\mathrm{GF}(2)$ modulo a fixed irreducible polynomial of degree $m$. ↩
Lecture 21
In-class notes: CS 505 Spring 2025 Lecture 21
Lecture 22
In-class notes: CS 505 Spring 2025 Lecture 22
Lecture 23
In-class notes: CS 505 Spring 2025 Lecture 23
Lecture 24
In-class notes: CS 505 Spring 2025 Lecture 24