Lecture 14
In-class notes: CS 505 Spring 2025 Lecture 14
Randomized Computations
So far, we have only examined deterministic and non-deterministic Turing machines. Crucially, neither of these models utilizes randomness. This is obvious for deterministic Turing machines, but it is very important to understand that non-determinism is not randomness; it is an idealized computation model that is not realistic.
Today, we’ll consider probabilistic Turing machines. These Turing machines will be allowed to sample uniformly and independently random bits and use these bits to help make decisions. Here, uniformly random means that each sampled bit $b$ satisfies $\Pr[b = 0] = \Pr[b = 1] = 1/2$, and independent means that this probability does not depend on any previous bits sampled or decisions made by the algorithm.
Definition. A probabilistic Turing machine (PTM) is a Turing machine $M$ with two transition functions $\delta_0, \delta_1$. For all $x \in \{0,1\}^*$, a PTM $M$ on input $x$ does the following.
- For each step of the computation, $M$ samples a uniformly random bit $b \in \{0,1\}$.
- Execute one step according to $\delta_b$.
We let $M(x)$ denote the random variable corresponding to PTM $M$'s output on input $x$ given the random choices it makes during its execution. We say that $M$ runs in time $T(n)$ if for all $x \in \{0,1\}^*$, $M$ halts on $x$ within $T(|x|)$ steps for any set of random choices.
Deciding Languages: PTMs vs NTMs
As stated above, PTMs and NTMs are not the same. Recall that for NP, we say that an NTM $N$ accepts a string $x$ if there exists an execution of $N(x)$ such that $N(x) = 1$. Similarly, for coNP, the NTM $N$ accepts string $x$ if all executions of $N(x)$ satisfy $N(x) = 1$. These definitions are quantified over the possible non-deterministic choices the NTM makes during its execution.
Now, for a PTM, say $M$, we can similarly define acceptance with respect to the number of accepting paths. That is, we can say that $M$ accepts a string $x$ if some fraction of all executions satisfy $M(x) = 1$. For example, we can specify that if $M(x) = 1$ for at least half of all computation paths (that is, looking at every possible set of random choices $M$ could make), then we say $M$ accepts $x$. This gives us a natural definition for deciding a language with a PTM.
Definition. Let $L \subseteq \{0,1\}^*$ be a language and $T : \mathbb{N} \to \mathbb{N}$ be a function. We say that a PTM $M$ decides $L$ in time $T(n)$ if $M(x)$ halts in time $T(|x|)$ for any $x \in \{0,1\}^*$ and $$\Pr[M(x) = L(x)] \geq \frac{2}{3},$$ where the probability is taken over the random choices made by $M$ on input $x$, and $L(x) = 1$ if $x \in L$ and $L(x) = 0$ if $x \notin L$.
In the above definition, decidability with respect to a PTM has two-sided error, meaning that $M$ outputs correctly with probability at least $2/3$ both when $x \in L$ and when $x \notin L$. This leads us to our first probabilistic complexity class: BPP.
Definition. For a function $T : \mathbb{N} \to \mathbb{N}$, we say that a language $L \in \mathsf{BPTIME}(T(n))$ if there exists a PTM which decides $L$ in time $O(T(n))$. The complexity class BPP is defined as $$\mathsf{BPP} = \bigcup_{c \geq 1} \mathsf{BPTIME}(n^c).$$
Note that BPP is still a worst-case class since we require deciding languages on a PTM in strict polynomial time.
BPP vs. Other Classes
Intuitively, just as NP is the non-deterministic analogue of P, BPP is the randomized analogue of P. In fact, we know that $\mathsf{P} \subseteq \mathsf{BPP} \subseteq \mathsf{EXP}$. To see the first inclusion, notice that every deterministic algorithm is a randomized algorithm that uses no randomness (e.g., you just set $\delta_0 = \delta_1 = \delta$, where $\delta$ is the transition function of the DTM). For the second inclusion, suppose that $M$ is a PTM deciding $L$ in time $T(n) = \mathrm{poly}(n)$. Then $M$ has $2^{T(n)}$ possible computation paths (each step picks 0 or 1 with equal probability). So we can construct a deterministic machine running in time $2^{O(T(n))}$ which enumerates all possible computation paths of $M$ and outputs $1$ if and only if at least $2/3$ of the paths are accepting.
Now, it is unknown whether $\mathsf{BPP} \subseteq \mathsf{NP}$ or whether $\mathsf{P} = \mathsf{BPP}$. For the second statement, most complexity theorists actually believe that $\mathsf{P} = \mathsf{BPP}$, but this is a topic we will not cover in this course.
This is not the last time we will see how BPP relates to other classes; we will return to this topic in later lectures.
Alternate Definition of BPP
Taking inspiration from the above argument that $\mathsf{BPP} \subseteq \mathsf{EXP}$, we can give an alternative definition of BPP similar to the certificate definition of NP.
Definition. A language $L$ is in the class BPP if there exists a polynomial-time deterministic Turing machine $M$ and a polynomial $p$ such that for all $x \in \{0,1\}^*$, $$\Pr_{r \leftarrow \{0,1\}^{p(|x|)}}[\, M(x, r) = L(x) \,] \geq \frac{2}{3}.$$
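To make the $(M, r)$ view concrete, here is a minimal Python sketch. The verifier `toy_M` and the names `decides_by_majority` and `p_len` are purely illustrative (not from the lecture); enumerating every random string $r$ is exactly the brute-force derandomization used above to argue $\mathsf{BPP} \subseteq \mathsf{EXP}$.

```python
import itertools

def decides_by_majority(M, x, p_len):
    """Return True iff M(x, r) = 1 for at least a 2/3 fraction of r in {0,1}^p_len."""
    accepting = sum(M(x, "".join(bits)) for bits in itertools.product("01", repeat=p_len))
    return 3 * accepting >= 2 * (2 ** p_len)

# Toy verifier: ignores x and accepts iff the first or second random bit is 0,
# i.e., it accepts with probability 3/4 >= 2/3 over the random string r.
toy_M = lambda x, r: 1 if (r[0] == "0" or r[1] == "0") else 0
print(decides_by_majority(toy_M, "some input", p_len=4))  # True
```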
3 Examples of BPP Algorithms
We’ll now give 3 examples of randomized algorithms.
Finding the Median
Given a set of $n$ numbers $S = \{a_1, \ldots, a_n\}$, the median of $S$ is a number $a$ such that $a_i \leq a$ for at least $\lceil n/2 \rceil$ of the elements $a_i \in S$ and $a_i \geq a$ for at least $\lceil n/2 \rceil$ of the elements $a_i \in S$. Stated as a decision problem, you'd be given $(S, a)$ and would need to decide if $a$ is the median of $S$.
There is an easy deterministic algorithm that requires $O(n \log n)$ time to check.
- Sort $S$ and obtain the sorted order $b_1, b_2, \ldots, b_n$, where $b_i \leq b_{i+1}$ for all $i$.
- If $n$ is even, set $a = b_{n/2}$ (more generally, any number in the range $[b_{n/2}, b_{n/2+1}]$ works). If $n$ is odd, set $a = b_{(n+1)/2}$.
This takes $O(n \log n)$ time since we sort $S$.
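As a minimal sketch, this baseline is just sorting and indexing; the helper names below are illustrative and not from the lecture.

```python
def kth_smallest_by_sorting(S, k):
    """Return the k-th smallest element of S (1-indexed) by sorting: O(n log n)."""
    return sorted(S)[k - 1]

def median_by_sorting(S):
    # k = ceil(n/2), matching b_{(n+1)/2} for odd n and b_{n/2} for even n.
    return kth_smallest_by_sorting(S, (len(S) + 1) // 2)

print(median_by_sorting([7, 1, 5, 3, 9]))  # 5
```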
Now, there is an $O(n)$-time deterministic algorithm for finding the median and, more generally, the $k$-th smallest element. However, the algorithm is highly non-trivial.
To contrast, we will give a simple randomized algorithm, running in expected $O(n)$ time, that finds the $k$-th smallest element, where an element $x \in S$ is the $k$-th smallest if $k - 1$ elements $a_i \in S$ satisfy $a_i < x$ and $n - k$ elements satisfy $a_i > x$ (for simplicity, assume the elements of $S$ are distinct). In particular, setting $k = \lceil n/2 \rceil$ gives us the median problem. Below, write $S = \{a_1, \ldots, a_n\}$.
- Find$k$thElement$(S, k)$:
  - Pick a uniformly random index $i \leftarrow \{1, \ldots, |S|\}$. Set $x = a_i$.
  - Scan $S$ and let $m$ denote the number of elements $a_j \in S$ such that $a_j < x$.
  - If $m = k - 1$, output $x$.
  - If $m \geq k$:
    - Create the set $S_{<} = \{a_j \in S : a_j < x\}$.
    - Run Find$k$thElement$(S_{<}, k)$.
  - If $m < k - 1$:
    - Let $S_{>} = \{a_j \in S : a_j > x\}$.
    - Run Find$k$thElement$(S_{>}, k - m - 1)$.
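Here is a minimal Python sketch of Find$k$thElement (this is the classic quickselect idea); it assumes the elements of $S$ are distinct, and the variable names mirror the description above.

```python
import random

def find_kth_element(S, k):
    """Return the k-th smallest element of S (1-indexed); expected O(|S|) time."""
    x = random.choice(S)                 # uniformly random pivot
    smaller = [a for a in S if a < x]    # the set S_<
    larger = [a for a in S if a > x]     # the set S_>
    m = len(smaller)                     # number of elements less than x
    if m == k - 1:
        return x
    if m >= k:
        return find_kth_element(smaller, k)        # answer lies among the smaller elements
    return find_kth_element(larger, k - m - 1)     # discard x and the m smaller elements

S = [13, 2, 8, 21, 5, 34, 1]
print(find_kth_element(S, (len(S) + 1) // 2))  # median of S: prints 8
```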
We now give the intuition on why this algorithm runs in expected $O(n)$ time for an initial list of size $n$.
- The deterministic parts of the algorithm run in linear time.
- The expected sizes of $S_{<}$ and $S_{>}$ can be shown to be at most $3|S|/4$. That is, with good (constant) probability over the random choice of $x$, $|S_{<}| \leq 3|S|/4$ and $|S_{>}| \leq 3|S|/4$ (where the "$|S|$" is updated at every recursive call to mean the size of the current input set).
- Together, this implies that if $T(n)$ is the expected runtime of the algorithm on inputs of size $n$, then $T(n) \leq T(3n/4) + O(n)$, which can be shown to imply that $T(n) = O(n)$.
Polynomial Identity Testing
Polynomial identity testing is very common in many areas of theoretical computer science and cryptography. Consider an $n$-variate polynomial $p(x_1, \ldots, x_n)$ with integer coefficients. We define the degree of $p$ to be $d = d_1 + \cdots + d_n$, where $d_i$ is the largest degree of $x_i$ appearing in $p$. For example, if $p(x_1, x_2) = x_1^2 x_2 + x_1 x_2^3$, then $d = 2 + 3 = 5$.
The most natural identity to test is whether $p$ is the identically zero polynomial, that is, whether $p(x_1, \ldots, x_n) = 0$ for all $x_1, \ldots, x_n$. Note that this isn't even obviously checkable in polynomial time: one can give a $p$ with a small description which, once expanded, contains $2^n$ terms! An example of one such polynomial is $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} (x_i + c_i)$ for some constants $c_1, \ldots, c_n$.
However, there is a probabilistic algorithm which can efficiently check if $p$ is identically zero, assuming that evaluating $p$ at any single point is efficient (i.e., polynomial-time).
The algorithm, given (a description of) the degree-$d$ polynomial $p$, operates as follows.
- Sample a uniformly random prime $k \in [2^{\ell}]$, for a sufficiently large (but still polynomially bounded) bit-length $\ell$.
- Sample uniformly random $a_1, \ldots, a_n \leftarrow S$, where $S \subseteq \mathbb{Z}$ is a fixed set of size at least $10d$ (say $S = \{1, \ldots, 10d\}$).
- Check whether $p(a_1, \ldots, a_n) \equiv 0 \pmod{k}$. Output $1$ if this check passes, and $0$ otherwise.
Observations. Let $y = p(a_1, \ldots, a_n)$, computed over the integers.
- If $p$ is identically zero, then $y = 0$ for any choice of $a_1, \ldots, a_n$, so the check passes with probability $1$.
- Suppose that $y \neq 0$. We want to analyze the probability that $y \equiv 0 \pmod{k}$. This is equal to the probability that $k$ divides $y$. Suppose that $y$ has prime factors $q_1, \ldots, q_t$. Since $k$ is itself prime, we can upper bound the probability that $k$ divides $y$ by the probability that $k$ equals one of these prime numbers. By the Prime Number Theorem, the number of primes in $[2^{\ell}]$ is at least roughly $2^{\ell}/\ell$. Now, one can show that $t \leq \log_2 |y|$, since each prime factor of $y$ is at least $2$. This implies that the number of primes in $[2^{\ell}]$ not equal to any of the $q_i$ is at least roughly $2^{\ell}/\ell - \log_2 |y|$. So, with probability at least roughly $1 - \frac{\ell \log_2 |y|}{2^{\ell}}$, the randomly chosen $k$ will not be equal to any of $q_1, \ldots, q_t$. Taking $\ell$ large enough, this gives us $$\Pr_k[\, y \equiv 0 \pmod{k} \,] \leq \frac{1}{10} \quad \text{whenever } y \neq 0.$$
- Now suppose that $y = p(a_1, \ldots, a_n) = 0$ but $p$ is not the identically zero polynomial. This means that $(a_1, \ldots, a_n)$ is a root of the polynomial $p$. By the Schwartz-Zippel lemma, this implies that $$\Pr_{a_1, \ldots, a_n \leftarrow S}[\, p(a_1, \ldots, a_n) = 0 \,] \leq \frac{d}{|S|} \leq \frac{1}{10}.$$
All together, these observations imply that the algorithm outputs $1$ with probability $1$ when $p$ is identically zero, and, by a union bound over the two bad events above, outputs $1$ with probability at most $1/10 + 1/10 = 1/5$ when $p$ is not identically zero.
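Here is a minimal Python sketch of this zero-test. We assume the polynomial is given as a black-box procedure `p_mod(points, k)` returning $p$ evaluated at `points` modulo $k$; the prime bound `2**60`, the helper names, and the example polynomials are illustrative choices, not fixed by the lecture (the sketch uses `sympy.randprime` to sample the random prime).

```python
import random
from sympy import randprime  # any random-prime sampler would do

def is_probably_zero(p_mod, n_vars, degree):
    """One run of the zero-test: output True iff p(a_1,...,a_n) = 0 (mod k)."""
    k = randprime(2, 2 ** 60)                                         # random prime modulus
    points = [random.randint(1, 10 * degree) for _ in range(n_vars)]  # random point in {1,...,10d}^n
    return p_mod(points, k) == 0

# p(x1, x2) = (x1 + x2)^2 - x1^2 - 2*x1*x2 - x2^2: small description, identically zero.
zero_p = lambda a, k: ((a[0] + a[1]) ** 2 - a[0] ** 2 - 2 * a[0] * a[1] - a[1] ** 2) % k
print(is_probably_zero(zero_p, n_vars=2, degree=4))                          # always True

# p(x1, x2) = x1 * x2 is not identically zero.
print(is_probably_zero(lambda a, k: (a[0] * a[1]) % k, n_vars=2, degree=2))  # False with high probability
```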
Verifying Matrix Multiplications
Let $q$ be a prime number and let $\mathbb{Z}_q$ be the set of integers modulo $q$. Fix three $n \times n$ matrices $A, B, C \in \mathbb{Z}_q^{n \times n}$. We want to decide if $AB = C$.
The fastest known matrix multiplication algorithm runs in time roughly $O(n^{2.37})$ (and, I believe, cannot be run on a real computer because the hidden constant is enormous). There is also the trivial algorithm which takes $O(n^3)$ time; here, we're actually counting the number of operations, so this algorithm takes roughly $n^3$ multiplications and additions over $\mathbb{Z}_q$.
We can use randomness to get an $O(n^2)$-time algorithm to verify if $AB = C$. The algorithm operates as follows.
- Sample a uniformly random vector $x \leftarrow \mathbb{Z}_q^n$.
- Create the vector $y = Bx$.
- Check if $Ay = Cx$. Output $1$ if and only if this check passes.
Notice that computing $y = Bx$ takes $O(n^2)$ time. Similarly, computing $Ay$ takes $O(n^2)$ time, and $Cx$ also takes $O(n^2)$ time, so the algorithm runs in $O(n^2)$ time.
Now, if $AB = C$, then $ABx = Cx$ for any choice of $x$, and thus with probability $1$, if $AB = C$ then $Ay = A(Bx) = Cx$ and the algorithm outputs $1$. What if $AB \neq C$? This basically reduces to $n$ polynomial equality checks (or, equivalently, zero-checks). Let $d_i$ be the $i$-th row of $D$. Then, if we set $D = AB - C$, we have that $(ABx - Cx)_i = \langle d_i, x \rangle$. Recall that $x = (x_1, \ldots, x_n)$ is uniformly random over $\mathbb{Z}_q^n$, so this inner product looks like $d_{i,1}x_1 + d_{i,2}x_2 + \cdots + d_{i,n}x_n$, which looks exactly like evaluating a degree-$1$ polynomial, say $g_i(x_1, \ldots, x_n) = \sum_{j} d_{i,j} x_j$, at a random point $(x_1, \ldots, x_n)$.
Now, with $D = AB - C$, suppose that in row $i$, we have $d_i \neq 0$. Then, our random check is actually testing the following: $$\Pr_{x \leftarrow \mathbb{Z}_q^n}[\, \langle d_i, x \rangle = 0 \,] \leq \frac{1}{q}.$$ In particular, some row of $D$ is nonzero even if $AB$ and $C$ differ on just a single entry! So this shows that $$\Pr_x[\, \text{the algorithm outputs } 1 \mid AB \neq C \,] \leq \frac{1}{q}.$$
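Below is a minimal numpy sketch of this verification (the procedure is known as Freivalds' algorithm). The prime $q = 1{,}000{,}003$ and the matrix size $50$ are illustrative choices, not fixed by the lecture.

```python
import numpy as np

def verify_product(A, B, C, q):
    """One run of the check: sample x in Z_q^n and test whether A(Bx) = Cx (mod q)."""
    n = A.shape[0]
    x = np.random.randint(0, q, size=n, dtype=np.int64)  # uniformly random x in Z_q^n
    y = (B @ x) % q                                       # y = Bx            -- O(n^2) operations
    return np.array_equal((A @ y) % q, (C @ x) % q)       # compare Ay and Cx -- O(n^2) operations

q = 1_000_003
A = np.random.randint(0, q, size=(50, 50), dtype=np.int64)
B = np.random.randint(0, q, size=(50, 50), dtype=np.int64)
C = (A @ B) % q
print(verify_product(A, B, C, q))   # always True, since AB = C
C[0, 0] = (C[0, 0] + 1) % q         # corrupt a single entry
print(verify_product(A, B, C, q))   # False, except with probability 1/q
```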
One-Sided Error
Interestingly, the above algorithms all have stronger guarantees than just BPP gives. Finding the median always outputs the correct answer, but only runs in expected (rather than worst-case) linear time. Both polynomial identity testing and verifying matrix multiplications will answer “YES” with probability $1$ when the polynomial given as input is identically zero and when $AB = C$ for the given matrices, but both of these algorithms will output “YES” with small probability (not equal to $0$) when the input does not have the correct form.
Building on the latter two algorithms, we can define two “stronger” classes than BPP which have one-sided error.
Definition. The class $\mathsf{RTIME}(T(n))$ is the set of all languages $L$ that are decidable by a PTM $M$ running in time $O(T(n))$ such that for all $x \in \{0,1\}^*$, $$x \in L \implies \Pr[M(x) = 1] \geq \frac{2}{3}, \qquad x \notin L \implies \Pr[M(x) = 1] = 0.$$ We define the class RP as $$\mathsf{RP} = \bigcup_{c \geq 1} \mathsf{RTIME}(n^c).$$
Naturally, we have the co-class of RP, called coRP, defined as $\mathsf{coRP} = \{L : \overline{L} \in \mathsf{RP}\}$. Equivalently, it is also a one-sided error class with the opposite guarantees as RP.
Definition. The class $\mathsf{coRTIME}(T(n))$ is the set of all languages $L$ that are decidable by a PTM $M$ running in time $O(T(n))$ such that for all $x \in \{0,1\}^*$, $$x \in L \implies \Pr[M(x) = 1] = 1, \qquad x \notin L \implies \Pr[M(x) = 1] \leq \frac{1}{3}.$$ We define the class coRP as $$\mathsf{coRP} = \bigcup_{c \geq 1} \mathsf{coRTIME}(n^c).$$