Lecture 5
In-class notes: CS 505 Spring 2025 Lecture 5
Cook-Levin Theorem Wrap-Up
Recall from last time, we are trying to prove that $\mathsf{SAT}$ is $\mathsf{NP}$-complete. To do so, we considered the single-tape non-deterministic Turing machine (NTM) definition of $\mathsf{NP}$. Our goal is to show that for any language $L \in \mathsf{NP}$, we have $L \le_p \mathsf{SAT}$. That is, $L$ is poly-time reducible to $\mathsf{SAT}$.
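From last time, we were able to construct an $n^k \times n^k$ table which encoded the execution of an NTM deciding $L$ on input $w$, where $n = |w|$ and $n^k$ bounds the running time of the NTM. From this table, we constructed the Boolean formula
$$\phi = \phi_{\text{start}} \land \phi_{\text{accept}} \land \phi_{\text{cell}} \land \phi_{\text{move}}.$$
The last thing to show is that our definition of $\phi_{\text{move}}$ correctly captures the correctness of the NTM deciding $L$. Recall that $\phi_{\text{move}}$ was defined with respect to $2 \times 3$ windows in the table, and it tried to capture the notion of a legal window. That is, writing $W_{i,j}$ for the window whose top-left cell is $(i,j)$,
$$\phi_{\text{move}} = \bigwedge_{i,j} \big(\text{window } W_{i,j} \text{ is legal}\big).$$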
Claim. If the table has a correct starting configuration, and all windows are legal, then row $c_{i+1}$ is a correct transition from row $c_i$ for all $i$.
Proof. To prove the claim, first consider any such $i$. Let $c_i$ and $c_{i+1}$ be the $i$-th and $(i+1)$-st rows of the table. Call $c_i$ the upper configuration and $c_{i+1}$ the lower configuration.
Consider all windows $W_{i,j}$ for $1 \le j \le n^k - 2$, where $W_{i,j}$ covers columns $j$, $j+1$, $j+2$ of rows $i$ and $i+1$. That is, we look at all windows in the upper and lower configuration. We now define when window $W_{i,j}$ is legal. Legal windows fall into two categories: windows which contain a state and those which do not.
- No state in the window. Suppose window $W_{i,j}$ contains no state. Then we say that $W_{i,j}$ is legal if and only if the two elements in the center column are equal. For example, the window with upper row $(a, c, d)$ and lower row $(b, c, d)$ is legal. Note that even though the first column has $a$ then $b$ in this example, this is a legal window because it is possible that the state is just to the left of $a$ in the upper configuration (so the tape head is reading $a$), writes $b$ over $a$, then moves left.
- State in the window. Suppose that window $W_{i,j}$ contains a state. Then window $W_{i,j}$ is legal if and only if the upper and lower configurations in this window are consistent with the transition function $\delta$ of the Turing machine. In particular, by our construction of the table and since the NTM is a single-tape NTM, a state in the window represents the current position of the tape head. First, we know that when transitioning from the upper configuration to the lower configuration, the state can move at most one position (left, right, or stay). This is easy to check for. Then we know that in the table, the tape head only touches the cell immediately to the right of the state. That is, if the state is in column $j$, then the tape head is reading from/writing to column $j+1$. In a nutshell, the computation of a Turing machine is highly local: it can't jump large distances in a single time-step. For example, if $\delta(q, a)$ contains $(q', b, R)$, then the window with upper row $(q, a, c)$ and lower row $(b, q', c)$ is legal.
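- Special windows. There are two special windows in any pair of upper and lower configurations: $W_{i,1}$ and $W_{i,n^k-2}$. These represent the edges of the table. These windows are legal if and only if: (1) they satisfy whichever of the above constraints applies; and (2) they have the fixed symbol $\#$ on the edges. For example, if $\delta(q, a)$ contains $(q', b, R)$, then the leftmost window with upper row $(\#, q, a)$ and lower row $(\#, b, q')$ is legal.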
By the above notion of legal windows, if all windows in the upper and lower configuration are legal, then together they represent a correct transition to $c_{i+1}$ from $c_i$. Inductively, this means that if we start with a correct starting configuration, and every window in the table is legal, then each pair of upper and lower configurations represents a valid transition from $c_i$ to $c_{i+1}$, and hence the table correctly captures the computation of the decider for language $L$.
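We conclude by giving the Boolean formula for $\phi_{\text{move}}$. To do so, we simply need to give a Boolean formula for the statement "window $W_{i,j}$ is legal." Define the set $\mathcal{W}$ as follows. Here, recall that $C$ is the cell alphabet of our table, and a tuple lists the upper row of a window $(a_1, a_2, a_3)$ followed by its lower row $(a_4, a_5, a_6)$.
$$\mathcal{W} = \{(a_1, \dots, a_6) \in C^6 : (a_1, \dots, a_6) \text{ is a legal window}\}.$$
Given this set $\mathcal{W}$, and writing $x_{i,j,s}$ for the variable that is true if and only if cell $(i,j)$ contains symbol $s$, the Boolean formula for the statement "window $W_{i,j}$ is legal" is expressed as
$$\bigvee_{(a_1, \dots, a_6) \in \mathcal{W}} \big(x_{i,j,a_1} \land x_{i,j+1,a_2} \land x_{i,j+2,a_3} \land x_{i+1,j,a_4} \land x_{i+1,j+1,a_5} \land x_{i+1,j+2,a_6}\big).$$
What is this formula saying? It says that given a tuple $(a_1, \dots, a_6)$, which I know by the definition of $\mathcal{W}$ represents some legal window, is the current window this legal window? That is, it is asking if the inner AND of the six variables is true. We take a big OR over all legal windows to make sure that window $W_{i,j}$ is some legal window.
If this big OR is true, then we know $W_{i,j}$ is some legal window. This gives us the final expression for $\phi_{\text{move}}$ as
$$\phi_{\text{move}} = \bigwedge_{i,j} \; \bigvee_{(a_1, \dots, a_6) \in \mathcal{W}} \big(x_{i,j,a_1} \land x_{i,j+1,a_2} \land x_{i,j+2,a_3} \land x_{i+1,j,a_4} \land x_{i+1,j+1,a_5} \land x_{i+1,j+2,a_6}\big).$$
All together, we have that $\phi$ is satisfiable if and only if the NTM we are encoding in the table has an accepting computation on $w$.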
The final piece of the puzzle is arguing that we can construct $\phi$ in polynomial time. Note that the cell alphabet $C$ is of constant size with respect to the input length $n$ by definition of Turing machines.
- For $\phi_{\text{start}}$, given an input $w$ to the NTM for deciding the language $L$, the starting configuration of the machine is fixed. Thus, the starting row of the table is fixed as well. For $|w| = n$, the starting row of the table contains $n^k$ cells, which corresponds to $n^k$ literals in $\phi_{\text{start}}$. This can clearly be constructed in $O(n^k)$ time.
- For $\phi_{\text{accept}}$, recall that we are simply scanning the entire table for an accepting state. The table has $n^k \times n^k$ cells, for a total size of $n^{2k}$, so this formula clearly has size $O(n^{2k})$ and can be constructed in $O(n^{2k})$ time.
- For $\phi_{\text{cell}}$, it is a big AND over pairs $(i,j)$. Within this big AND, we have two constant-sized subformulas. First, the formula checking that cell $(i,j)$ contains a valid symbol $s \in C$. Since $|C|$ is constant, the size of this formula is constant. Then this subformula is AND'd with a big AND of ORs which checks that cell $(i,j)$ doesn't contain both symbol $s$ and symbol $t$ for any $s \ne t$. Again, since $|C|$ is constant, this subformula is constant. So the total size of $\phi_{\text{cell}}$ is $O(n^{2k})$ and it can be constructed in this much time as well.
- Similarly, for $\phi_{\text{move}}$, the size of the set $\mathcal{W}$ of legal windows is at most $|C|^6$, which is constant size (something like $(|Q| + |\Gamma| + 1)^6$) since $|Q|$ and $|\Gamma|$ are constants. So the inner formula is a constant size, whereas the whole formula is a big AND over pairs $(i,j)$ from $(1,1)$ to at most $(n^k, n^k)$. So $\phi_{\text{move}}$ has size $O(n^{2k})$ and can be constructed in this much time.
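As a quick sanity check, the four bounds above can be added up. This is only a sketch under the assumption that the table is $n^k \times n^k$ and that size is measured in number of literals:

$$|\phi| = |\phi_{\text{start}}| + |\phi_{\text{accept}}| + |\phi_{\text{cell}}| + |\phi_{\text{move}}| = O(n^k) + O(n^{2k}) + O(n^{2k}) + O(n^{2k}) = O(n^{2k}).$$

Writing down each variable index takes $O(\log n)$ bits, so the encoded length is $O(n^{2k} \log n)$, which is still polynomial in $n$.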
This completes the proof of the Cook-Levin theorem.
Other NP-Complete Problems
$\mathsf{SAT}$ is a step-up from the (useless) $\mathsf{NP}$-complete problem $\mathsf{TMSAT}$. However, a general Boolean formula (like those given in $\mathsf{SAT}$) may be difficult to handle when trying to understand specific problems. Thus, we turn our attention to the wide variety of other $\mathsf{NP}$-complete problems.
First, we show that given any $\mathsf{NP}$-complete problem/language, if we want to show some other language in $\mathsf{NP}$ is $\mathsf{NP}$-complete, we only need to reduce our known $\mathsf{NP}$-complete language to our new language.
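Theorem 5.1. If $A$ is an $\mathsf{NP}$-complete language, and $B \in \mathsf{NP}$ such that $A \le_p B$, then $B$ is $\mathsf{NP}$-complete.
Proof. Recall the transitive property of polynomial-time reducible languages. Let $A, B, C$ be languages such that $A \le_p B$ and $B \le_p C$. Then we know that $A \le_p C$.
By our assumption, $A$ is $\mathsf{NP}$-complete. This means that $A \in \mathsf{NP}$ and $L \le_p A$ for all $L \in \mathsf{NP}$. By our other assumption, we know that $A \le_p B$. By the transitive property above, we now know that $L \le_p B$ for any $L \in \mathsf{NP}$. Since $B \in \mathsf{NP}$ as well, $B$ is $\mathsf{NP}$-complete.
Now, rather than having to do a complete Cook-Levin Theorem style proof for new languages we want to show are $\mathsf{NP}$-complete, it suffices to just reduce from a language we know is $\mathsf{NP}$-complete!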
3SAT
We turn to our next (and possibly favorite) $\mathsf{NP}$-complete language: $\mathsf{3SAT}$. First, we need to set up some terminology.
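Let $\phi$ be a Boolean formula. We say that $\phi$ is in conjunctive normal form (or is a CNF formula) if $\phi = C_1 \land C_2 \land \cdots \land C_m$ such that each $C_i$ only contains ORs of literals (variables and their negations). We call each $C_i$ a clause of $\phi$. One example of a CNF formula with $3$ clauses is
$$(x_1 \lor \neg x_2) \land (x_2 \lor x_3 \lor \neg x_4) \land (\neg x_1).$$
We say that $\phi$ is a $k$-CNF formula if each clause contains exactly $k$ literals. An example of a $3$-CNF formula with $2$ clauses is
$$(x_1 \lor \neg x_2 \lor x_3) \land (\neg x_1 \lor x_2 \lor x_4).$$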
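Definition (3SAT). The language $\mathsf{3SAT}$ is the set of all satisfiable $3$-CNF formulas. That is,
$$\mathsf{3SAT} = \{\phi : \phi \text{ is a satisfiable } 3\text{-CNF formula}\}.$$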
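To make the definition concrete, here is a minimal brute-force membership check. It is a sketch, not an efficient algorithm: it tries all $2^n$ assignments. The encoding is an assumption made for illustration (it is not from the lecture): a literal is a nonzero integer, where `k` stands for $x_k$ and `-k` for $\neg x_k$, and a formula is a list of 3-tuples.

```python
from itertools import product

def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true by `assignment`.

    `assignment` maps a variable index to a bool; literal l is true when
    assignment[abs(l)] equals (l > 0).
    """
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

def brute_force_3sat(clauses):
    """Return a satisfying assignment of a 3-CNF formula, or None if none exists.

    Exponential time: this only illustrates the definition of 3SAT.
    """
    if any(len(clause) != 3 for clause in clauses):
        raise ValueError("every clause must contain exactly 3 literals")
    variables = sorted({abs(l) for clause in clauses for l in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfies(clauses, assignment):
            return assignment
    return None
```

For instance, `brute_force_3sat([(1, 1, 1), (-1, -1, -1)])` returns `None`, since that formula encodes $x_1 \land \neg x_1$.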
Complexity theorists prefer $\mathsf{3SAT}$ over other $\mathsf{NP}$-complete languages since it is simple, has very little combinatorial structure, and occurs in many different contexts such as constraint satisfaction problems.
The other part of the Cook-Levin Theorem (that I hid from you earlier) is that $\mathsf{3SAT}$ is $\mathsf{NP}$-complete.
Theorem (Cook-Levin, Part 2). $\mathsf{3SAT}$ is $\mathsf{NP}$-complete.
Proof. $\mathsf{3SAT} \in \mathsf{NP}$ is immediate. What remains to be shown is that $\mathsf{3SAT}$ is $\mathsf{NP}$-hard. We could show that $\mathsf{SAT} \le_p \mathsf{3SAT}$ using our above theorem, but it is actually simpler just to modify the proof of the Cook-Levin theorem directly to give us a $3$-CNF formula.
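Recall $\phi = \phi_{\text{start}} \land \phi_{\text{accept}} \land \phi_{\text{cell}} \land \phi_{\text{move}}$ from the proof of the Cook-Levin theorem, where the variable $x_{i,j,s}$ is true if and only if cell $(i,j)$ of the table contains symbol $s$. First, we will change $\phi$ slightly so that it is a CNF formula (we are almost there already). Once we have put $\phi$ in CNF form, we will then transform it into a $3$-CNF formula.
Since $\phi$ is the AND of these four subformulas, we just need to make sure that each of them is an AND of clauses, where each clause is an OR of 1 or more literals. First consider $\phi_{\text{start}}$. Recall that it was simply ANDing literals together:
$$\phi_{\text{start}} = x_{1,1,\#} \land x_{1,2,q_0} \land x_{1,3,w_1} \land \cdots \land x_{1,n+2,w_n} \land x_{1,n+3,\sqcup} \land \cdots \land x_{1,n^k-1,\sqcup} \land x_{1,n^k,\#}.$$
So $\phi_{\text{start}}$ is already a CNF formula with clauses of single literals (there are no ORs).
Now consider $\phi_{\text{accept}}$. Remember that this is simply a big OR over the entire table, checking if there is at least one accepting state:
$$\phi_{\text{accept}} = \bigvee_{i,j} x_{i,j,q_{\text{accept}}}.$$
So $\phi_{\text{accept}}$ will be a single clause of our CNF formula.
Now consider $\phi_{\text{cell}}$. From before, this is given by
$$\phi_{\text{cell}} = \bigwedge_{i,j} \Big[ \Big( \bigvee_{s \in C} x_{i,j,s} \Big) \land \Big( \bigwedge_{s, t \in C,\ s \ne t} (\neg x_{i,j,s} \lor \neg x_{i,j,t}) \Big) \Big].$$
Notice that $\phi_{\text{cell}}$ is already in CNF form. The big OR over $s \in C$ is a single clause, which gets AND'd with the formula $\bigwedge_{s \ne t} (\neg x_{i,j,s} \lor \neg x_{i,j,t})$, which is itself a CNF formula. Then all of these formulas are AND'd together, meaning the final formula is in CNF form.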
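The clause structure of the cell formula can be made explicit with a short sketch. The representation below is an assumption chosen for illustration: a variable is a triple `(i, j, s)` meaning "cell $(i,j)$ holds symbol $s$", a literal is a pair `(variable, polarity)`, and a clause is a list of literals. The function emits one "at least one symbol" clause and one "not both" clause per unordered pair of distinct symbols, for every cell.

```python
from itertools import combinations

def phi_cell_clauses(rows, cols, alphabet):
    """Return the CNF clauses stating that each cell holds exactly one symbol."""
    clauses = []
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Cell (i, j) holds at least one symbol: a single OR clause.
            clauses.append([((i, j, s), True) for s in alphabet])
            # Cell (i, j) does not hold two different symbols at once.
            for s, t in combinations(alphabet, 2):
                clauses.append([((i, j, s), False), ((i, j, t), False)])
    return clauses

def evaluate(clauses, true_vars):
    """Evaluate the CNF when exactly the variables in `true_vars` are true."""
    return all(any((var in true_vars) == polarity for var, polarity in clause)
               for clause in clauses)
```

Each cell contributes $1 + \binom{|C|}{2}$ clauses, a constant, so the number of clauses grows linearly with the number of cells in the table.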
Finally, for $\phi_{\text{move}}$, it is a big AND of a big OR of a constant number of ANDs (each an AND of 6 variables). Using Boolean equivalences (the distributive law), we can convert the inner formula (see footnote 1) into a new formula which is a big AND of some (again constant) number of ORs. Since the inner formula has constant size, independent of $n$, this conversion increases the size of $\phi_{\text{move}}$ by at most a constant factor, so $\phi_{\text{move}}$ is still of polynomial size. All together, this transforms $\phi$ into CNF form in polynomial time.
This together establishes that we can convert our original formula $\phi$ into an equivalent CNF formula, say $\phi = C_1 \land C_2 \land \cdots \land C_m$ for some $m$ such that each $C_i$ is the OR of one or more literals. We now convert $\phi$ into a $3$-CNF formula. This can be done as follows. For any $i$, consider the clause $C_i$.
- If $C_i$ has 3 literals, we are done and can move on to the next clause.
- If $C_i$ has fewer than 3 literals, we transform it into an equivalent clause $C_i'$ with exactly 3 literals. For example, if $C_i$ has one literal, say $\ell_1$, we simply write $C_i' = (\ell_1 \lor \ell_1 \lor \ell_1)$. If $C_i$ has two literals, say $\ell_1$ and $\ell_2$, we pick one of the literals arbitrarily (e.g., always pick the first one) and repeat it, giving $C_i' = (\ell_1 \lor \ell_1 \lor \ell_2)$. Clearly $C_i$ and $C_i'$ are equivalent.
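- If $C_i$ has more than 3 literals, we will split $C_i$ into a $3$-CNF formula $C_i'$ using extra variables.
For example, if $C_i = (x_1 \lor \neg x_2 \lor \neg x_3 \lor x_4)$, we introduce the variable $z$ and convert $C_i$ to the formula
$$C_i' = (x_1 \lor \neg x_2 \lor z) \land (\neg z \lor \neg x_3 \lor x_4).$$
This conversion has the property that if $C_i$ has a satisfying assignment $a$, then there exists an assignment to the variable $z$ such that $C_i'$ is also satisfied by $a$ plus the assignment for $z$. Conversely, any satisfying assignment for $C_i'$, restricted to the original variables, satisfies $C_i$: whichever value $z$ takes, one of the two clauses of $C_i'$ must be satisfied by an original literal.
In our above example, the vector $(x_1, x_2, x_3, x_4) = (0, 1, 0, 0)$ is a satisfying assignment for $C_i$, and so a satisfying assignment for $C_i'$ would be $(x_1, x_2, x_3, x_4, z) = (0, 1, 0, 0, 1)$ (set $z = 1$).
In general, if $C_i$ has $k > 3$ literals, we introduce $k - 3$ new variables $z_1, \dots, z_{k-3}$ and transform $C_i$ into a 3CNF formula $C_i'$ with $k - 2$ clauses. If $C_i$ has literals $\ell_1, \dots, \ell_k$, then we construct $C_i'$ as
$$C_i' = (\ell_1 \lor \ell_2 \lor z_1) \land (\neg z_1 \lor \ell_3 \lor z_2) \land \cdots \land (\neg z_{k-4} \lor \ell_{k-2} \lor z_{k-3}) \land (\neg z_{k-3} \lor \ell_{k-1} \lor \ell_k).$$
Clearly, $C_i'$ can be constructed in polynomial time from $C_i$.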
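All together, this gives us our new 3CNF formula, which is satisfiable if and only if the formula $\phi$ we constructed in the Cook-Levin Theorem is satisfiable. (Because of the extra variables, the two formulas are equisatisfiable rather than logically equivalent, which is all the reduction needs.)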
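The clause-by-clause conversion described above can be sketched in a few lines. The encoding is an assumption for illustration, not the lecture's notation: literals are nonzero integers (`k` for $x_k$, `-k` for $\neg x_k$), and fresh variables are drawn from indices above the largest variable already in use. The helper `satisfiable` is a brute-force check included only so the conversion can be tested on tiny formulas.

```python
from itertools import count, product

def clause_to_3cnf(clause, fresh):
    """Convert one clause (list of int literals) into clauses of exactly 3 literals.

    `fresh` is an iterator yielding unused variable indices.
    """
    k = len(clause)
    if k == 0:
        raise ValueError("empty clause")
    if k == 1:
        return [(clause[0], clause[0], clause[0])]
    if k == 2:
        return [(clause[0], clause[0], clause[1])]
    if k == 3:
        return [tuple(clause)]
    # k > 3: chain the literals together with k - 3 fresh variables.
    zs = [next(fresh) for _ in range(k - 3)]
    out = [(clause[0], clause[1], zs[0])]
    for t in range(1, k - 3):
        out.append((-zs[t - 1], clause[t + 1], zs[t]))
    out.append((-zs[-1], clause[-2], clause[-1]))
    return out

def cnf_to_3cnf(clauses):
    """Convert a CNF formula into an equisatisfiable 3-CNF formula."""
    max_var = max((abs(l) for c in clauses for l in c), default=0)
    fresh = count(max_var + 1)
    out = []
    for c in clauses:
        out.extend(clause_to_3cnf(list(c), fresh))
    return out

def satisfiable(clauses):
    """Brute-force satisfiability check (exponential; for small tests only)."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

A clause with $k > 3$ literals produces $k - 2$ output clauses, so the output is at most linearly larger than the input.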
Independent Set
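The independent set problem on an undirected graph $G = (V, E)$ asks if there exists a set $S \subseteq V$ of nodes/vertices of size at least $k$ such that they are pairwise disconnected. That is, for every $u, v \in S$, $(u, v) \notin E$. As a set, this is written as
$$\mathsf{INDSET} = \{\langle G, k \rangle : G \text{ has an independent set of size at least } k\}.$$
Theorem. $\mathsf{INDSET}$ is $\mathsf{NP}$-complete.
Proof. Clearly $\mathsf{INDSET} \in \mathsf{NP}$. To see this, one can simply specify a set $S$ of size at least $k$ as the certificate. Then, verification that $S$ is an independent set takes at most $O(|E|)$ time per pair $u, v \in S$ since in the worst case you must scan the entire edge set per check. So the total time is at most $O(n^2 \cdot |E|)$ in the worst case, where $n = |V|$.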
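The verification step can be sketched as follows. The input format is an assumption for illustration (a collection of vertices, a list of edge pairs, the threshold `k`, and the claimed set). Unlike the scan-the-edge-list bound above, this sketch stores the edges in a hash set so each pair is checked in expected constant time; either way the verifier runs in polynomial time.

```python
from itertools import combinations

def verify_independent_set(vertices, edges, k, candidate):
    """Check that `candidate` is an independent set of size at least k.

    `edges` is an iterable of 2-element pairs of an undirected graph.
    """
    s = set(candidate)
    if len(s) < k or not s <= set(vertices):
        return False
    # Undirected edges: store each as a frozenset so (u, v) and (v, u) coincide.
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) not in edge_set
               for u, v in combinations(s, 2))
```

On the path graph with edges `(1, 2)` and `(2, 3)`, the set `{1, 3}` passes with `k = 2`, while `{1, 2}` does not.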
Now we show that $\mathsf{INDSET}$ is $\mathsf{NP}$-hard. We do this by giving a reduction from $\mathsf{3SAT}$. That is, we show $\mathsf{3SAT} \le_p \mathsf{INDSET}$.
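Suppose that $\phi$ is a 3CNF formula with $m$ clauses: $\phi = C_1 \land C_2 \land \cdots \land C_m$. Assume that $\phi$ has $n$ variables $x_1, \dots, x_n$ (note that each variable and its negation are both literals). For each clause $C_i$, write $C_i = (\ell_{i,1} \lor \ell_{i,2} \lor \ell_{i,3})$, where $\ell_{i,1}, \ell_{i,2}, \ell_{i,3}$ are the literals of clause $C_i$. (For example, if $C_i = (x_1 \lor \neg x_2 \lor x_3)$, then $\ell_{i,1} = x_1$, $\ell_{i,2} = \neg x_2$, and $\ell_{i,3} = x_3$.)
For each clause $C_i$, we create a cluster of $3$ nodes/vertices in a graph $G$. Label each vertex in this cluster with $\ell_{i,1}$, $\ell_{i,2}$, $\ell_{i,3}$. This gives us $m$ clusters of $3$ nodes each, where each cluster of nodes is associated with $C_i$ and labeled by its literals.
Now we connect nodes with edges in this graph with $3m$ vertices. First, create a triangle in each cluster. That is, for each $i$, connect $(\ell_{i,1}, \ell_{i,2})$, $(\ell_{i,2}, \ell_{i,3})$, and $(\ell_{i,1}, \ell_{i,3})$ (note the graph is undirected so $(\ell_{i,2}, \ell_{i,1})$, $(\ell_{i,3}, \ell_{i,2})$, and $(\ell_{i,3}, \ell_{i,1})$ are also edges). Next, we connect each node with its negation. For example, if $\ell_{i,1} = x_1$ and $\ell_{j,2} = \neg x_1$, then we add the edge $(\ell_{i,1}, \ell_{j,2})$ to the graph. The graph has $3m$ vertices and $O(m^2)$ edges, so it can be constructed in polynomial time. We claim that the given 3CNF $\phi$ is satisfiable if and only if our constructed graph $G$ has an independent set of size $m$.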
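The graph construction just described can be sketched directly. The encoding is an assumption for illustration: literals are nonzero integers (`k` for $x_k$, `-k` for $\neg x_k$), the vertex for the $j$-th literal of clause $i$ is the pair `(i, j)` with 0-based indices, and undirected edges are stored as frozensets.

```python
from itertools import combinations

def build_indset_instance(clauses):
    """Map a 3-CNF formula to (vertices, edges, label, k) as in the reduction.

    The formula is satisfiable iff the graph has an independent set of size
    k = number of clauses.
    """
    m = len(clauses)
    vertices = [(i, j) for i in range(m) for j in range(3)]
    label = {(i, j): clauses[i][j] for (i, j) in vertices}
    edges = set()
    # Triangle inside each cluster.
    for i in range(m):
        for a, b in combinations(range(3), 2):
            edges.add(frozenset({(i, a), (i, b)}))
    # Conflict edges between a literal and its negation.
    for u, v in combinations(vertices, 2):
        if label[u] == -label[v]:
            edges.add(frozenset({u, v}))
    return vertices, edges, label, m
```

For the formula `[(1, -2, 3), (-1, 2, 3)]` this yields 6 vertices, 6 triangle edges, and 2 conflict edges ($x_1$ with $\neg x_1$, and $\neg x_2$ with $x_2$).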
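First, suppose that $\phi$ has a satisfying assignment. Let $a \in \{0,1\}^n$ be the satisfying assignment. That is, $\phi(a) = 1$. From $a$, we build an independent set $S$ of size $m$ in the graph $G$. Now since $\phi$ is satisfied, the assignment $a$ satisfies every clause $C_i$ of $\phi$. Since every clause $C_i$ is satisfied, at least one of its literals $\ell_{i,1}$, $\ell_{i,2}$, or $\ell_{i,3}$ is equal to $1$. For each clause $C_i$, choose only one of its satisfied literals and add it to the set $S$. For example, if $\ell_{i,1} = 1$, $\ell_{i,2} = 0$, and $\ell_{i,3} = 1$ under assignment $a$, then add $\ell_{i,1}$ or $\ell_{i,3}$ to $S$ (but not both; simply choose one of them).
We claim the set $S$ constructed in this manner is an independent set of size $m$. It has size $m$ because we add exactly one node per clause. First, suppose that a literal $\ell_{i,j} \in S$ for some $i, j$ (e.g., $\ell_{i,1}$). By construction of the graph $G$, we know that $\ell_{i,1}$ is connected to both $\ell_{i,2}$ and $\ell_{i,3}$. But by our selection of the set $S$, we only choose a single node from each cluster to add to the set. So $\ell_{i,2}, \ell_{i,3} \notin S$. Now what if literals $\ell_{i,j}, \ell_{i',j'} \in S$ for $i \ne i'$? By construction of $G$, we know that $\ell_{i,j}$ and $\ell_{i',j'}$ are connected if and only if $\ell_{i,j}$ is the negation of the literal represented by $\ell_{i',j'}$. For example, $x_1$ and $\neg x_1$ would be connected in the graph. However, by assumption, $a$ is a satisfying assignment and we only added literals satisfied by $a$. This means that if $\ell_{i,j}$ were satisfied (i.e., $\ell_{i,j} = 1$), then its negation would not be satisfied (i.e., $\neg \ell_{i,j} = 0$), so the negation is not in $S$. So every pair $u, v \in S$ satisfies the property that $(u, v) \notin E$. Thus, $S$ is an independent set of size $m$.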
Now suppose that our constructed graph $G$ has an independent set of size $m$. We reconstruct a satisfying assignment for the formula $\phi$. Let $S$ be the independent set of size $m$. Then for every $u, v \in S$, we know that $(u, v) \notin E$. We construct our satisfying assignment directly from the set $S$. Suppose $\ell_{i,j} \in S$. Then set literal $\ell_{i,j}$ in $\phi$ to be $1$. For example, if $\ell_{i,j} = \neg x_2$, then we set $x_2 = 0$ in the satisfying assignment (that is, $\neg x_2 = 1$). Any variable not determined in this way may be set arbitrarily. We claim this is a satisfying assignment. This is by construction of the graph $G$.
- Suppose $u \in S$ and $u$ is in cluster $i$. Then we know $u$ is connected to every other node in its cluster ($\ell_{i,1}, \ell_{i,2}, \ell_{i,3}$ are all connected in cluster $i$). So we know that the other nodes in this cluster are not in $S$ (otherwise it would not be an independent set).
- Let $u, v \in S$ such that $u$ is in cluster $i$ and $v$ is in cluster $j$ for $i \ne j$. We know that $u, v$ are not connected. This implies that $u$ and $v$ are not a literal and its negation (e.g., $u = x_1$ and $v = \neg x_1$ is not possible). This means that we don't obtain an assignment which sets both a literal and its negation to $1$.
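By the first point, $S$ contains at most one node from each cluster; since $|S| = m$ and there are $m$ clusters, $S$ contains exactly one node from every cluster, so every clause has a literal set to $1$. By the second point, the assignment is well defined. Thus, our constructed assignment is satisfying. This completes the proof.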
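The backward direction of the proof can be sketched as code. The encoding is again an assumption for illustration: literals are nonzero integers (`k` for $x_k$, `-k` for $\neg x_k$), and `chosen_literals` is the list of labels of the nodes in the independent set $S$. Variables not mentioned by $S$ are set to `False`, which is one arbitrary choice among many.

```python
def assignment_from_independent_set(chosen_literals, variables):
    """Build a truth assignment that makes every chosen literal true.

    Raises ValueError if the labels contain a literal and its negation,
    which cannot happen for an independent set of the reduction graph.
    """
    assignment = {}
    for lit in chosen_literals:
        var, value = abs(lit), lit > 0
        if assignment.get(var, value) != value:
            raise ValueError("set contains a literal and its negation")
        assignment[var] = value
    # Variables untouched by the set may take any value; False is used here.
    for var in variables:
        assignment.setdefault(var, False)
    return assignment

def satisfies(clauses, assignment):
    """True if every clause contains a literal made true by `assignment`."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)
```

For the formula `[(1, -2, 3), (-1, 2, 3)]`, picking the node labeled $\neg x_2$ from the first cluster and the node labeled $\neg x_1$ from the second gives an independent set, and the recovered assignment sets all three variables to `False`, which satisfies both clauses.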
1. Please see the actual definition of $\phi_{\text{move}}$ for the correct formula here. I am just using a shorthand for demonstration.