
Lecture 2

In-class notes: CS 505 Spring 2025 Lecture 2

Measuring Runtime of Turing Machines

With the definition of Turing machines established, we can turn towards quantifying the run-time of Turing machines. Informally, the run-time of a Turing machine $M$ computing some function $f$ is the maximum amount of time needed to compute $f$ on all inputs (of a fixed length), where our measure of time corresponds to how many executions of the transition function $\delta$ the machine $M$ needs to utilize to compute $f$. First, we need to actually define what it means for a Turing machine $M$ to compute a function $f$.

Definition (Turing Machine Computation). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a function and let $M$ be a Turing machine. We say that $M$ computes $f$ if for all $x \in \{0,1\}^*$, when $M$ is initialized with input $x$ it halts with $f(x)$ on the output tape. We denote this as $M(x) = f(x)$.

Now we can define the run-time of a Turing machine $M$ computing a function $f$.

Definition (Turing Machine Run-time). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a function and let $M$ be a Turing machine which computes $f$. Furthermore, let $T : \mathbb{N} \to \mathbb{N}$ be a function. We say that $M$ computes $f$ in time $T(n)$ if $M$ computes $f(x)$ in at most $T(|x|)$ steps for all $x \in \{0,1\}^*$ (i.e., $M(x)$ halts within $T(|x|)$ steps). Here, a step of the Turing machine is a single execution of its transition function $\delta$.

In essence, executing one step of the transition function of a Turing machine is the atomic Turing machine operation.
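To make step counting concrete, here is a minimal Python sketch of a single-tape simulator that counts executions of the transition function. The machine shown (a unary incrementer) and all names are illustrative choices, not part of the formal model.

```python
# Minimal single-tape Turing machine simulator that counts steps, where one
# "step" is one application of the transition function delta.
def run(delta, tape, state="start", blank="_", max_steps=10_000):
    tape = dict(enumerate(tape))  # sparse tape: position -> symbol
    head, steps = 0, 0
    while state != "halt":
        symbol = tape.get(head, blank)
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += {"L": -1, "R": 1, "S": 0}[move]
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step budget exceeded")
    out = "".join(tape[i] for i in sorted(tape)).strip(blank)
    return out, steps

# delta: (state, symbol) -> (new state, symbol to write, head move).
# This hypothetical machine appends one 1 to a unary input string.
delta = {
    ("start", "1"): ("start", "1", "R"),   # scan right over the input
    ("start", "_"): ("halt",  "1", "S"),   # append one more 1, then halt
}

out, steps = run(delta, "111")
# out == "1111"; steps == 4 (three moves right plus the final write)
```

On input $1^n$ this machine takes $n + 1$ steps, so it runs in time $T(n) = n + 1$ under the definition above.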

Time Constructible Functions

An important concept is the idea of time constructible functions, which we will use to quantify and show equivalences among different Turing machine models. It will also be used in later topics (e.g., time hierarchy theorems).

Definition (Time Constructible Function). Let $T : \mathbb{N} \to \mathbb{N}$ be a function. Then we say that $T$ is time constructible if and only if (1) $T(n) \ge n$; and (2) there exists a Turing machine $M$ such that for all $x \in \{0,1\}^*$, we have $M(x) = \mathrm{bin}(T(|x|))$ in time $T(|x|)$. Here, $\mathrm{bin}(m)$ denotes the binary representation of $m$.

Note

This is slightly different than how I presented it in class.

Examples of time constructible functions include $n$, $n \log n$, $n^2$, and $2^n$. Notably, constant functions $T(n) = c$, or any $T(n) = o(n)$ (e.g., $T(n) = \sqrt{n}$ or $T(n) = \log n$), are not time constructible.

The above examples of non-time constructible functions highlight a key idea behind time constructibility: a Turing machine (usually) needs to read its entire input in order to compute a function. The stipulation $T(n) \ge n$ allows for a Turing machine to at least read the entire input before computing $T(n)$. If this restriction is removed, then $T(n) = c$ is still time constructible (you simply ignore all inputs and write the constant $c$ to the output tape), but functions like $T(n) = \sqrt{n}$ or $T(n) = \log n$ remain non-time constructible since the Turing machine is expected to compute $\mathrm{bin}(T(|x|))$ in less time than it takes to read $x$!

Turing Machine Equivalences

A function being time constructible turns out to be a key factor in how we define equivalences among Turing machines (and other models as well).1 Informally, we say that a computational model $\mathcal{A}$ is equivalent to a computational model $\mathcal{B}$ if any computation capable of being performed in $\mathcal{A}$ can be performed in $\mathcal{B}$ (with at most polynomial time overhead), and vice versa. In the context of Turing machines, we say that a computational model $\mathcal{A}$ is equivalent to the Turing machine model if any problem solvable in time $T(n)$ in model $\mathcal{A}$ can be solved by a Turing machine running in time $O(T(n)^c)$ for some constant $c \ge 1$. Intuitively, if $P$ is a program in computational model $\mathcal{A}$ running in time $T(n)$, then a Turing machine will simulate model $\mathcal{A}$ in order to run program $P$ (e.g., similar to modern interpreted programming languages like Python). If $P$ runs in time $T(n)$, and $T$ is not time constructible, then this simulation will not meet our requirements; i.e., it will not be an efficient simulation.

As mentioned in Lecture 1, the $k$-tape Turing machine model we have been working with is equivalent to many other Turing machine (and non-Turing machine) models. We state these relations formally below.

First, recall that it is sufficient to consider a Turing machine which only uses a binary alphabet.

Lemma 2.1. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a Turing machine $M$ with tape alphabet $\Gamma$, then it is computable in time $O(\log|\Gamma| \cdot T(n))$ by a Turing machine $\tilde{M}$ with tape alphabet $\{\triangleright, \square, 0, 1\}$.

Proof Sketch. The main idea is to encode the (non-start and non-blank) symbols of $\Gamma$ using bits. This requires roughly $\log|\Gamma|$ bits to uniquely encode $\Gamma$ in binary. Then the new Turing machine $\tilde{M}$ simply encodes each symbol from $\Gamma$ on its tapes in binary. To simulate a single step of $M$, the machine $\tilde{M}$ must read $\lceil \log|\Gamma| \rceil$ bits from each tape, translate the symbols read into its current state, then execute $M$'s transition function.
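The encoding step can be sketched in Python; the alphabet and symbol names below are illustrative, and the point is just that each symbol gets a fixed-width codeword of $\lceil \log_2 |\Gamma| \rceil$ bits.

```python
from math import ceil, log2

# Encode each symbol of a larger alphabet Gamma as a fixed-width binary
# codeword of ceil(log2 |Gamma|) bits (sketch of Lemma 2.1's encoding).
def make_codes(gamma):
    width = max(1, ceil(log2(len(gamma))))
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(gamma)}, width

gamma = ["a", "b", "c", "d", "e"]      # |Gamma| = 5 -> 3 bits per symbol
codes, width = make_codes(gamma)

def encode_tape(tape):
    return "".join(codes[s] for s in tape)

def decode_tape(bits):
    inv = {v: k for k, v in codes.items()}
    return [inv[bits[i:i + width]] for i in range(0, len(bits), width)]

bits = encode_tape(["b", "e", "a"])
assert decode_tape(bits) == ["b", "e", "a"]
assert len(bits) == 3 * width          # 3 symbols, width bits each
```

Since decoding is unambiguous at fixed width, the binary machine can recover each original symbol by reading exactly `width` consecutive bits, which is where the $O(\log|\Gamma|)$ per-step overhead comes from.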

Next, it turns out that any $k$-tape Turing machine can be readily simulated by a single-tape Turing machine (which many of you may have seen before).

Lemma 2.2. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a $k$-tape Turing machine, then it is computable in time $O(k \cdot T(n)^2)$ by a single-tape Turing machine.

Proof Idea. The proof idea is to stagger the tapes onto the single-tape machine. Notably, since each of the $k$ tapes is infinite, if you try to write them side-by-side on a single-tape machine, you would inevitably run into a situation where you reach the end of an allocation for a work tape, so you'd have to shift the entire contents of the remaining tapes right one space. This would blow up the time to simulate. So instead, you stagger the tapes. Consider tape $i \in \{0, \dots, k-1\}$. Then positions $0, 1, 2, \dots$ of tape $i$ would be written to positions $i, i + k, i + 2k, \dots$ on the single-tape machine.
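The staggering is just an arithmetic index map, sketched below (0-based indexing is an assumption about the convention): cell $j$ of tape $i$ always lands at the same single-tape position, so nothing ever has to shift.

```python
# Interleave k one-directional tapes onto a single tape (Lemma 2.2 sketch):
# cell j of tape i (both 0-indexed) lives at single-tape position k*j + i,
# so every tape gets infinitely many disjoint cells.
def interleave(i, j, k):
    return k * j + i

def deinterleave(pos, k):
    return pos % k, pos // k    # (tape index, cell index)

k = 3
# The first cells of each tape tile the single tape with no collisions:
positions = [interleave(i, j, k) for j in range(3) for i in range(k)]
assert positions == [0, 1, 2, 3, 4, 5, 6, 7, 8]
assert deinterleave(7, k) == (1, 2)   # position 7 holds cell 2 of tape 1
```

The quadratic overhead in Lemma 2.2 comes not from this map but from the single head having to sweep across the used portion of the tape to simulate each step.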

It also turns out that having tapes which are infinite in both directions does not buy you much in terms of computational efficiency.

Lemma 2.3. For every $f : \{0,1\}^* \to \{0,1\}^*$ and time constructible $T : \mathbb{N} \to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a $k$-bidirectional-tape Turing machine (i.e., every tape is infinite in both directions), then $f$ is computable in time $O(T(n))$ by a standard $k$-tape Turing machine (i.e., tapes that are infinite in one direction).

Proof Idea. You can approach this two different ways.

  1. Cut each bidirectional tape in half, then stagger the two halves onto a single one-directional tape (similar to Lemma 2.2 above).

  2. If the bidirectional Turing machine has tape alphabet $\Gamma$, let the standard Turing machine have tape alphabet $\Gamma \times \Gamma$. Then you can encode the bidirectional tape onto the single tape by "folding" it: position $i$ of the standard tape holds the pair of symbols at positions $+i$ and $-i$ of the bidirectional tape.
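The second idea (the "fold") can be sketched as an index map, assuming cell 0 stays at position 0 and the pair at position $i$ holds cells $+i$ and $-i$:

```python
# Fold a bidirectional tape onto a one-directional tape of symbol pairs
# (Lemma 2.3, idea 2): position i >= 1 of the folded tape stores the pair
# (cell +i, cell -i); position 0 stores cell 0 in its first slot.
def fold(pos):
    """Map a bidirectional cell index to (folded position, pair slot)."""
    if pos >= 0:
        return pos, 0        # non-negative cells sit in the first slot
    return -pos, 1           # negative cells sit in the second slot

assert fold(0) == (0, 0)
assert fold(5) == (5, 0)
assert fold(-5) == (5, 1)
# Distinct bidirectional cells never collide on the folded tape:
cells = [fold(p) for p in range(-4, 5)]
assert len(set(cells)) == len(cells)
```

Because crossing the origin just means switching which slot of the pair the simulated head reads, each bidirectional step costs only a constant number of standard steps, giving the $O(T(n))$ bound.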

Universal Turing Machines

We’ve discussed how our $k$-tape Turing machine is equivalent to many other Turing machine models. Next, we will see that we can simulate any Turing machine (in any equivalent model). Much like how the modern computer can run any computation you give it, we will see there is a universal Turing machine which can simulate any Turing machine you give it as input.

Turing Machines are (Binary) Strings

We’ve focused our attention on Turing machines which compute some function $f$, and we haven’t given much thought to how we write down the machine $M$ itself. It turns out that we can conveniently describe Turing machines simply as binary strings. We’ll let $\langle M \rangle$ denote the binary string which represents the Turing machine $M$. Note: there are an infinite number of strings which represent a single Turing machine $M$.

For any $\alpha \in \{0,1\}^*$, we will let $M_\alpha$ denote the Turing machine specified by the string $\alpha$. In this light, notice that

  • We’ve always talked about Turing machines computing some function $f : \{0,1\}^* \to \{0,1\}^*$;
  • Turing machines themselves are described by strings in $\{0,1\}^*$; and
  • Turing machines can also be inputs to these functions!

So there must be a Turing machine which can take Turing machines as input and compute the function that this Turing machine would have computed! This is the universal Turing machine.

Theorem 2.4 (Hennie & Stearns, 1966). There exists a Turing machine $\mathcal{U}$ such that for all $x, \alpha \in \{0,1\}^*$, $\mathcal{U}(x, \alpha) = M_\alpha(x)$. That is, $\mathcal{U}$ computes the output of $M_\alpha$ when run with input $x$. Moreover, if $M_\alpha$ halts within $T(|x|)$ steps on any input $x$ for time constructible $T$, then $\mathcal{U}(x, \alpha)$ halts within $O(T(|x|) \log T(|x|))$ steps, where the hidden constant only depends on $M_\alpha$’s alphabet size, number of states, and number of tapes.

The proof of the above theorem can be found below (see Proof of Theorem 2.4). Here, we’ll give the proof of the above with the $O(T(n) \log T(n))$ bound replaced with $O(T(n)^2)$.

Proof with time bound $O(T(n)^2)$. Suppose that $\mathcal{U}$ is given the input pair $(x, \alpha)$. Without loss of generality, we can assume that the Turing machine $M_\alpha$ has tape alphabet $\{\triangleright, \square, 0, 1\}$ and has a single work tape (i.e., it is a $3$-tape Turing machine). If not, then $\mathcal{U}$ can transform $\alpha$ into an equivalent Turing machine, denoted as $M_{\tilde\alpha}$, with these properties by Lemmas 2.1 and 2.2.2 In this case, if $M_\alpha$ runs in time $T(n)$, then the resulting equivalent Turing machine runs in time $\tilde{T}(n) = O(T(n)^2)$ (ignoring the $\log|\Gamma|$ factors since $|\Gamma|$ is fixed).

The universal machine $\mathcal{U}$ will be a $5$-tape Turing machine; i.e., one input tape, one output tape, and 3 work tapes. $\mathcal{U}$ has alphabet $\{\triangleright, \square, 0, 1\}$. Now $\mathcal{U}$ will simulate $M_{\tilde\alpha}$ as follows.

  • $\mathcal{U}$ uses its input, output, and first work tape to identically copy the operations $M_{\tilde\alpha}$ performs on these tapes (recall $M_{\tilde\alpha}$ has $3$ tapes).
  • $\mathcal{U}$ encodes the state space $Q$ of $M_{\tilde\alpha}$ on its second work tape.
  • $\mathcal{U}$ encodes the transition function $\delta$ of $M_{\tilde\alpha}$ on its third work tape. The transition function is simply encoded as a table of key-value pairs.

In order to simulate a single step of $M_{\tilde\alpha}$’s computation, the machine $\mathcal{U}$ does the following.

  1. Read the current symbols under the input tape, output tape, and first work tape heads. This identically matches what $M_{\tilde\alpha}$ does and takes constant time.
  2. Read the current state of $M_{\tilde\alpha}$ from the second work tape. Since the tape alphabet is binary, the states of $M_{\tilde\alpha}$ take $O(\log|Q|)$ bits to encode, so reading the current state takes $O(\log|Q|)$ time steps (i.e., move to the end of the current state, go back to the start of the work tape).
  3. Let $q$ be the current state, and let $\sigma_1, \sigma_2, \sigma_3$ be the symbols read from the input, output, and first work tapes, respectively. Scan the third work tape for the key $(q, \sigma_1, \sigma_2, \sigma_3)$.
  4. Once this key is found, read the value from the corresponding table entry. The value is exactly $(q', \sigma_2', \sigma_3', z_1, z_2, z_3)$, where $z_i \in \{L, S, R\}$ for $i \in \{1, 2, 3\}$.
  5. Execute the transition function of $M_{\tilde\alpha}$.
    1. Write $\sigma_2'$ to the output head and $\sigma_3'$ to the head of work tape 1. This takes constant time.
    2. Write the new state $q'$ to the second work tape and reset the tape head after. This takes $O(\log|Q|)$ time.
    3. Move tape head $i$ in direction $z_i$ for $i \in \{1, 2, 3\}$. This takes constant time.
    4. Move the head of the third work tape back to the start.

Now, the time complexities of (3) and (5.4) above are the same. In particular, in the worst case, $\mathcal{U}$ must scan to the end of the table representing $\delta$ to find the correct key. There are $|Q| \cdot 4^3$ keys in this table, and each key has an entry in the table. Since each symbol of $\{\triangleright, \square, 0, 1\}$ takes two bits to encode, and because we can encode each head direction $z_i$ with only two more bits, we can conclude that each table entry (i.e., each key-value pair) has length $O(\log|Q|)$. This means to write down a single entry, we need $O(\log|Q|)$ bits. Moreover, there are a total of $|Q| \cdot 4^3$ entries in the table, so the total time of executing (3) or (5.4) is at most $O(|Q| \log|Q|)$ time (absorbing the constant $4^3$).

Since $M_{\tilde\alpha}$ is fixed, to simulate a single step of $M_{\tilde\alpha}$ on $\mathcal{U}$ requires $O(|Q| \log|Q|)$ time, which is a constant with respect to the input length. So if $M_{\tilde\alpha}$ runs in time $\tilde{T}(n)$, then $\mathcal{U}$ runs in time $O(\tilde{T}(n))$. Now $\tilde{T}(n) = O(T(n)^2)$ by the transformation we performed on $M_\alpha$ to obtain $M_{\tilde\alpha}$. Thus, $\mathcal{U}$ simulates $M_\alpha$ in at most $O(T(n)^2)$ time.
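The table lookup in steps (3)-(4) behaves like the following Python sketch, where the table is scanned linearly from the start on every simulated step; the entries shown are illustrative, not a full machine.

```python
# Linear scan of an encoded transition table, as in steps (3)-(4): the
# table is a flat list of (key, value) pairs, keyed by the current state
# and the three symbols under the heads. Entries here are illustrative.
table = [
    (("q0", "1", "_", "_"), ("q0", "_", "1", "R", "S", "R")),
    (("q0", "_", "_", "_"), ("halt", "_", "_", "S", "S", "S")),
]

def lookup(state, symbols):
    """Scan the table for key (q, s1, s2, s3); worst case O(#entries)."""
    key = (state, *symbols)
    for k, v in table:        # worst case: scan the entire table
        if k == key:
            return v
    raise KeyError(key)

assert lookup("q0", ("1", "_", "_"))[0] == "q0"
assert lookup("q0", ("_", "_", "_"))[0] == "halt"
```

The scan cost depends only on the (fixed) table size, mirroring why simulating a single step costs $\mathcal{U}$ only a constant with respect to the input length.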

Turing Machines and Languages

We’ve spent most of our time discussing Turing machines and how they compute functions. We’ll now shift to mostly talking about Turing machines in the context of deciding languages.

Recall that a language $L$ is simply a subset of $\{0,1\}^*$. Notably, we can define a function $f_L : \{0,1\}^* \to \{0,1\}$ as $f_L(x) = 1$ if and only if $x \in L$; this immediately implies that $f_L(x) = 0$ if and only if $x \notin L$. So there is a natural correspondence between computing functions and deciding set membership in a language $L$.
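The correspondence can be sketched directly: any membership test yields an indicator function and vice versa. The language of binary palindromes below is just an illustrative choice.

```python
# A language L (here: binary palindromes, an illustrative choice) and its
# indicator function f_L, with f_L(x) = 1 iff x is in L.
def in_L(x: str) -> bool:
    return x == x[::-1]          # membership test for L

def f_L(x: str) -> int:
    return 1 if in_L(x) else 0   # the corresponding function f_L

assert f_L("0110") == 1   # "0110" reads the same reversed, so it is in L
assert f_L("01") == 0     # "01" reversed is "10", so it is not in L
```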

Key to our later dealings with complexity classes will be the idea of Turing decidability. We’ll build up to this idea by first introducing Turing recognizability.

Definition (Turing Recognizable Language). A language $L \subseteq \{0,1\}^*$ is said to be Turing recognizable if there exists a Turing machine $M$ such that for all $x \in L$, $M(x) = 1$. In particular, $M$ always halts and outputs $1$ if $x \in L$.

Recognizability only requires that the Turing machine halt on any valid member of the language. If, however, one hands this Turing machine some $x \notin L$, its behavior is undefined and not guaranteed! We’d like to strengthen this to make sure our Turing machine always halts, whether or not its input is in the language. This gives us decidability.

Definition (Turing Decidable Language). A language $L \subseteq \{0,1\}^*$ is said to be Turing decidable if there exists a Turing machine $M$ such that the following hold for any $x \in \{0,1\}^*$:

  • $M(x) = 1$ if and only if $x \in L$; and
  • $M(x) = 0$ if and only if $x \notin L$.

Notice that the above definition immediately means that $M$ halts on all possible inputs. This is because, equivalently stated, if $x \notin L$ then $x \in \bar{L}$, where $\bar{L}$ is the complement of $L$, which is defined as $\bar{L} = \{0,1\}^* \setminus L$ (i.e., everything in $\{0,1\}^*$ but not in $L$).

An equivalent definition of decidability states that both the language and its complement are recognizable.

Lemma 2.5. A language $L$ is Turing decidable if and only if both $L$ and $\bar{L}$ are Turing recognizable.

Undecidability

Unfortunately, there are many (interesting) languages that are undecidable; that is, there does not exist any Turing machine which decides the language. We’ll begin by showing the existence of at least one undecidable language.

Theorem 2.6. There exists a language that is not Turing decidable (i.e., it is undecidable).

Proof. First define the language $L_D = \{\langle M \rangle : M(\langle M \rangle) = 1\}$; i.e., $L_D$ is the set of all strings $\langle M \rangle$ such that the Turing machine $M$, when given as input its own description $\langle M \rangle$, halts and outputs $1$. Now define the complement language $\bar{L}_D = \{0,1\}^* \setminus L_D$. We claim that $\bar{L}_D$ is undecidable.

We show this via a proof by contradiction. So towards contradiction, assume that $\bar{L}_D$ is decidable. Then there exists a Turing machine $N$ which decides this language. This implies that for any $x$, $N(x) = 1$ if and only if $x \in \bar{L}_D$ and $N(x) = 0$ if and only if $x \notin \bar{L}_D$.

Consider $N(\langle N \rangle)$. We have that

$$N(\langle N \rangle) = 1 \iff \langle N \rangle \in \bar{L}_D \iff \langle N \rangle \notin L_D \iff N(\langle N \rangle) \neq 1.$$

Thus, we have a contradiction as $N(\langle N \rangle) = 1$ if and only if $N(\langle N \rangle) \neq 1$. This implies that $\bar{L}_D$ is undecidable.

Notably, the above proof technique is known as diagonalization. We’ll use it later when we discuss time hierarchy theorems.
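The diagonal idea can be sketched over a finite "table" of machines (the machines below are illustrative Python functions, not Turing machines): build a function that flips machine $i$'s answer on input $i$, so it cannot equal any row of the table.

```python
# Diagonalization sketch over a finite table of "machines": construct a
# function that differs from machine i on input i, so it cannot equal any
# machine in the table. The machines here are illustrative stand-ins.
machines = [
    lambda x: 0,          # machine 0: always outputs 0
    lambda x: 1,          # machine 1: always outputs 1
    lambda x: x % 2,      # machine 2: parity of the input
]

def diagonal(x):
    # Flip machine x's answer on input x (only defined for listed x).
    return 1 - machines[x](x)

# diagonal disagrees with every machine on the diagonal:
assert all(diagonal(i) != machines[i](i) for i in range(len(machines)))
```

In the proof above, the "table" is the (countable) list of all Turing machines, and the assumed decider $N$ plays the role of a row that must both agree and disagree with the diagonal, which is impossible.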

Question

Is $\bar{L}_D$ Turing recognizable?

Different from Class

I incorrectly stated in class that it was recognizable. However, $\bar{L}_D$ is, in fact, not recognizable. This is because $L_D$ from the above proof is recognizable. By Lemma 2.5, if $\bar{L}_D$ were Turing recognizable, then $\bar{L}_D$ would be decidable, but clearly it is not!

One may argue that the language $\bar{L}_D$ is not a very interesting language, and may not come up in the real world. However, we’ll take one step up and consider a more interesting language that would be great for us if it were decidable! Unfortunately, it is not decidable.

The Halting Problem

The Halting problem asks the following simple question: given a Turing machine $M$ and an input $x$, does $M$ halt on input $x$? More formally, it is specified by the following language:

$$\mathsf{HALT} = \{(\langle M \rangle, x) : M \text{ halts on input } x\}.$$

Theorem 2.7. $\mathsf{HALT}$ is undecidable.

We’ll give the proof of this theorem in Lecture 3.

Proof of Theorem 2.4

This proof is taken directly from Arora & Barak’s book with the following notes:

  • Theorem 1.9 in the proof corresponds to Theorem 2.3 in these lecture notes;
  • Claim 1.6 in the proof corresponds to Lemma 2.2 in these lecture notes; and
  • Claim 1.5 in the proof corresponds to Lemma 2.1 in these lecture notes.

The proof can be found in the following pdf: Proof of Theorem 2.4


  1. “Key” here meaning it makes proofs much simpler.

  2. Lemma 2.2 tells us that a $k$-tape Turing machine can be simulated by a one-tape Turing machine with quadratic overhead. The same proof can be applied to reduce $k$ tapes to $3$ tapes, with a single input, output, and work tape (i.e., transform the $k - 2$ work tapes into a single work tape, keep the input/output tapes the same).