
1. Turing Machines @ Automata by Jeff Ullman

1.1 Introduction

Questions:

  • How to show that certain tasks are impossible, or in some cases, possible but intractable?
    • Intractable means solvable but only by very slow algorithms
  • What can be solved by computer algorithms?

Turing machine (TM):

  • In some sense an ultimate automaton.
  • A formal computing model that can compute anything we can do with a computer, or with any other realistic model that we might think of as computing.
  • TMs define the class of recursively enumerable languages, which is thus the largest class of languages about which we can compute anything.
    • There is another smaller class called recursive languages that can be thought of as modeling algorithms–computer programs that answer a particular question and then finish.

1.2 DataPermalink

Data types:

  • int, float, string, etc.
  • But at another level, there is only one type: strings of bits.
    • It is more convenient to think of binary strings as integers.
  • Programs are also strings and thus strings of bits.
    • The fact that programs and data are at heart the same thing is what lets us build a powerful theory about what is not computable.

“The ith data”:

  • Integers 1, 2, 3, …
  • “Data” can also be mapped to 1, 2, 3, … (with some tricks from coding or cryptography)
    • Then it makes sense to talk about “the ith data”
  • E.g. “the ith string”:
    • Text strings (ASCII or Unicode) are binary strings, which in turn encode integers
      • A small glitch: If you think simply of binary integers, then
        • 101 → “the 5th string”
        • 0101 → “the 5th string”
        • 00101 → “the 5th string”
    • This can be fixed by prepending a 1 to the string before converting to an integer. Thus,
      • 101 → “the 13th string” (1101 = 13)
      • 0101 → “the 21st string” (10101 = 21)
      • 00101 → “the 37th string” (100101 = 37)
  • E.g. “the ith proof”:
    • A formal proof is a sequence of logical expressions.
      • We can encode mathematical expressions of any kind in Unicode.
      • Convert expression to a binary string and then an integer.
    • Problems:
      • A proof is a sequence of expressions, so we need a way to separate them.
      • Also, we need to indicate which expressions are given and which follow from previous expressions.
    • Quick-and-dirty way to introduce special symbols into binary strings:
      1. Given a binary string, precede each bit by 0. E.g.:
        • 101 becomes 010001.
      2. Use strings of two or more 1’s as the special symbols. E.g.:
        • Let 111 mean “start of expression”
        • Let 11 mean “end of expression”
    • E.g. 111 010001 11 111 00000101 11 (i.e. 111010001111110000010111) decodes to the two expressions 101 and 0011 (a runnable sketch follows this list)
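
To make both encodings concrete, here is a minimal Python sketch (function names are my own, not from the lecture); the asserts reproduce the examples above.

def string_to_index(s):
    # Prepend '1', then read the result as a binary integer.
    return int("1" + s, 2)

def index_to_string(i):
    # Inverse mapping: write i in binary and drop the leading '1'.
    return bin(i)[2:][1:]

def encode_expressions(exprs):
    # 111 = "start of expression", 11 = "end of expression"; each payload
    # bit is preceded by a 0, so runs of two or more 1's occur only in markers.
    return "".join("111" + "".join("0" + b for b in e) + "11" for e in exprs)

assert string_to_index("101") == 13 and string_to_index("00101") == 37
assert index_to_string(21) == "0101"
assert encode_expressions(["101", "0011"]) == "111010001111110000010111"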

1.3 Orders of infinity

  • There aren’t more “data” than there are integers.
  • While the number of “data” items is infinite, you may be aware that there are different orders of infinity, and the integers are of the smallest one.
    • E.g. there are more real numbers than there are integers, so there are more real numbers than there are “programs” (think of programs as data).
      • This immediately tells you that some real numbers cannot be computed by programs.

1.3.1 Finite sets

  • A finite set has a particular integer that is the count of the number of members–the cardinality.
    • E.g. card({a,b,c})=3
  • It is impossible to find a 1-to-1 mapping between a finite set and a proper subset of itself.

1.3.2 Infinite Sets

  • Formally, an infinite set is a set for which there is a 1-to-1 mapping between itself and a proper subset of itself.
    • E.g. the positive integers {1, 2, 3, …} is an infinite set.
      • There is a 1-to-1 mapping 1→2, 2→4, 3→6, … between this set and a proper subset (the set of even integers).

1.3.3 Countable Sets

  • A countable set is a set with a 1-to-1 mapping with the positive integers.
    • Hence, all countable sets are infinite.
  • E.g. set of all integers:
    • 0 → 1
    • −x → 2x
    • +x → 2x+1
    • Thus, the order is 0, −1, 1, −2, 2, −3, 3, …, i.e.
      • 1st integer is 0
      • 2nd integer is −1
      • etc.
  • E.g. set of binary strings, set of Java programs.
  • E.g. pairs of integers
    • Order the pairs of positive integers first by sum, then by first component: [1,1], [2,1], [1,2], [3,1], [2,2], [1,3], [4,1], [3,2], …, [1,4], [5,1], …
    • [i,j] → f(i,j) = (i+j)(i+j−1)/2 − i + 1 = (i+j−1)(i+j−2)/2 + j (a quick check follows this list)
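
A quick numerical check of this pairing (my own code, not the lecture's): it verifies that the two closed forms agree and regenerates the order listed above.

def pair_index(i, j):
    # Position of [i, j] when pairs are ordered by sum, then by increasing j.
    return (i + j) * (i + j - 1) // 2 - i + 1

assert all(pair_index(i, j) == (i + j - 1) * (i + j - 2) // 2 + j
           for i in range(1, 50) for j in range(1, 50))

pairs = sorted(((i, j) for i in range(1, 6) for j in range(1, 6)),
               key=lambda p: pair_index(*p))
print(pairs[:6])   # [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]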

1.3.4 Enumerations

  • An enumeration of a set is a 1-to-1 mapping between the set and the positive integers.
  • Thus, we have seen enumerations for strings, programs, proofs, and pairs of integers.

1.3.5 Set of all languages over some fixed alphabet

Question: Are the languages over {0,1} countable?

Answer: No. Proof by Diagonalization (a table of m strings × n languages):

Suppose we could enumerate all languages over {0,1} and talk about “the ith language.”

Consider the language:

L = {w | w is the ith binary string and w is not in the ith language}

Clearly, L is a language over {0,1}.

Thus, L is the jth language for some particular j.

Let x be the jth string. Is x in L?

  • If so, x is not in L by definition of L.
  • If not, then x is in L by definition of L.

We have a contradiction: x is neither in L nor not in L, so our sole assumption (that there was an enumeration of the languages) is wrong.

This is really bad; there are more languages than programs:

  • The set of all programs is countable
  • The set of all languages is not countable (a toy illustration of the diagonal trick follows)
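
The diagonal construction is mechanical. A toy Python illustration (the 4×4 acceptance table is made up) shows that flipping the diagonal yields a row that differs from every row of the table:

# A[i][j] = 1 iff "the ith TM accepts the jth string" (made-up finite table).
A = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Flip the diagonal: D differs from row i at position i.
D = [1 - A[i][i] for i in range(len(A))]
assert all(D[i] != row[i] for i, row in enumerate(A))
print(D)   # [0, 1, 0, 1]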

1.4 Turing-Machine

Turing-Machine Theory:

  • One important purpose of the theory of Turing machines is to prove that certain specific languages have no membership algorithm.
  • The first step is to prove certain languages about Turing machines themselves have no membership algorithm.
  • Reductions are used to prove more common questions undecidable.

Picture of a Turing Machine:

  • Finite number of states.
  • Infinite tape with cells containing tape symbols chosen from a finite alphabet.
    • Tape head: always points to one of the tape cells.
  • Action: based on the state and the tape symbol under the head, a TM can
    • change state
    • overwrite the current symbol
    • move the head one cell left or right

Why Turing Machines? Why not represent computation by C programs or something like that?

  • You can develop a theory about programs without using Turing machines, but it is easier to prove things about TM’s, because they are so simple.
  • And yet they are as powerful as any computer.
    • In fact, TMs are more powerful than any computer, since they have infinite memory
      • Yes, we could always buy more storage for a computer.
      • But the universe is finite so where are you going to get the atoms from which to build all of those discs?
      • However, once you accept that the universe is finite, the limitation doesn't seem to be affecting what we can compute in practice.
      • So we are not going to argue that a computer is weaker than a Turing machine in a meaningful way.

1.4.1 Turing-Machine Formalism

A TM is typically described by:

  1. A finite set of states, Q
  2. An input alphabet, Σ
    • The relation between a symbol and an alphabet is: symbol ∈ alphabet
  3. A tape alphabet, Γ
    • Σ ⊆ Γ
  4. A transition function, δ
  5. A start state, q0 ∈ Q
  6. A blank symbol, B ∈ Γ − Σ
    • All tape except for the input is blank initially.
    • B is never an input symbol.
  7. A set of final states, F ⊆ Q

Conventions:

  • a, b, … are input symbols.
  • …, X, Y, Z are tape symbols.
  • …, w, x, y, z are strings of input symbols.
  • α, β, … are strings of tape symbols.

Transition Function δ(q,Z)=(p,Y,D):

  • Arguments:
    1. Current state q ∈ Q
    2. Current tape symbol Z ∈ Γ
  • Return value (when δ(q,Z) is defined):
    • p is a state
    • Y is the new tape symbol (to replace the current symbol on the tape head)
    • D is a direction, L or R

TM Example:

  • Description:
    • This TM scans its input (all binary bits) left to right, looking for a 1.
    • If it finds 1, it changes it to a 0, goes to final state f, and halts.
    • If it reaches a blank, it changes it to a 1 and moves left.
  • Formalism:
    • States Q={q(start),f(final)}.
    • Input symbols Σ={0,1}.
    • Tape symbols Γ={0,1,B}.
    • δ(q,0)=(q,0,R).
    • δ(q,1)=(f,0,R).
    • δ(q,B)=(q,1,L).
  • Example tape: BB00BB; the head starts at the first 0 (head position in brackets):
    • BB[0]0BB, δ(q,0) = (q,0,R), move right
    • BB0[0]BB, δ(q,0) = (q,0,R), move right
    • BB00[B]B, δ(q,B) = (q,1,L), write 1, move left
    • BB0[0]1B, δ(q,0) = (q,0,R), move right
    • BB00[1]B, δ(q,1) = (f,0,R), write 0, move right
    • BB000[B], δ(f,B) undefined, halt (a runnable sketch of this machine follows)
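
A minimal Python simulation of this machine (the helper is my own; the tape is a dictionary, so it is blank everywhere outside the input):

from collections import defaultdict

def run_tm(delta, start, input_str, blank="B", max_steps=100):
    # Simulate a single-tape TM until no move is defined (halt).
    tape = defaultdict(lambda: blank, enumerate(input_str))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in delta:            # undefined move: halt
            return state, tape
        state, tape[head], direction = delta[key]
        head += 1 if direction == "R" else -1
    raise RuntimeError("did not halt within max_steps")

delta = {
    ("q", "0"): ("q", "0", "R"),
    ("q", "1"): ("f", "0", "R"),
    ("q", "B"): ("q", "1", "L"),
}
state, tape = run_tm(delta, "q", "00")
print(state)                                # f
print("".join(tape[i] for i in range(3)))   # 000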

Instantaneous Descriptions of a Turing Machine:

  • Initially, a TM has a tape consisting of a string of input symbols surrounded by an infinity of blanks in both directions.
  • The TM is in the start state, and the head is at the leftmost input symbol.
  • TM ID's (in my view, a more fitting name would be “TM state ID”):
    • An ID is a string αqβ, where
      • α = string from the leftmost nonblank to the tape head (exclusive)
      • β = string from the tape head (inclusive) to the rightmost nonblank.
        • I.e. the symbol to the right of q is the one being scanned.
      • If an ID is in the form of αq, it is scanning a B.
    • As for PDA's (pushdown automata), we may use symbols
      • IDi ⊢ IDj to represent “IDi becomes IDj in one move” and
      • IDi ⊢* IDj to represent “IDi becomes IDj in zero or more moves.”
    • Example: The moves of the previous TM are q00 ⊢ 0q0 ⊢ 00q ⊢ 0q01 ⊢ 00q1 ⊢ 000f
  • Formal Definition of Moves:
    1. If δ(q,Z) = (p,Y,R), then
      • αqZβ ⊢ αYpβ
      • If Z is the blank B, then αq ⊢ αYp
    2. If δ(q,Z) = (p,Y,L), then
      • For any X, αXqZβ ⊢ αpXYβ
      • In addition, qZβ ⊢ pBYβ

1.4.2 Languages of a TM

There are actually two ways to define the language of a TM:

  • A TM defines a language by final state.
    • L(M) = {w | q0w ⊢* I, where I is an ID with a final state}
  • Or, a TM can accept a language by halting.
    • H(M) = {w | q0w ⊢* I, and there is no move possible from ID I}

Equivalence of Accepting and Halting:

  1. If L = L(M), then there is a TM M′ such that L = H(M′).
  2. If L = H(M), then there is a TM M′ such that L = L(M′).

Proof:

Final State => Halting:

  • Modify M to become M′ as follows:
    1. For each final state of M, remove any moves, so M′ halts in that state.
    2. Avoid having M′ accidentally halt.
      • Introduce a new state s, which runs to the right forever; that is, δ(s,X) = (s,X,R) for all symbols X.
      • If q is not a final state, and δ(q,X) is undefined, let δ(q,X) = (s,X,R).

Halting => Final State:

  • Modify M to become M′ as follows:
    1. Introduce a new state f, the only final state of M′.
    2. f has no moves.
    3. If δ(q,X) is undefined for any state q and symbol X, define it by δ(q,X) = (f,X,R).

Recursively Enumerable Languages:

  • We now see that the classes of languages defined by TM’s using final state and halting are the same.
  • This class of languages (that can be defined in both those ways) is called the recursively enumerable languages.
    • Why this name? The term actually predates the Turing machine and refers to another notion of computation of functions.

Recursive Languages:

  • A proper subset of Recursively Enumerable Languages.
  • An algorithm formally is a TM accepting (a language) by final state that halts on any input regardless of whether that input is accepted or not.
    • Note the logic here: an algorithm is a TM, not a language.
  • If L=L(M) for some TM M that is an algorithm, we say L is a recursive language.
    • In other words, the language that is accepted by final state by some algorithm is a recursive language.
    • Why this name? Again, don’t ask; it is a term with a history.
  • Example: Every CFL (Context-free language) is a recursive language.
    • By the CYK algorithm.

2. Decidability @ Automata by Jeff Ullman

Central Ideas:

  • TMs can be enumerated, so we can talk about “the ith TM”.
    • Thus possible to diagonalize over TMs, showing a language that cannot be the language of any TM.
  • Establish the principle that a problem is really a language.
    • Therefore some specific problems do not have TMs.

Binary-Strings from TM’s:

  • We shall restrict ourselves to TM’s with input alphabet {0,1}.
  • Assign positive integers to the three classes of elements involved in moves:
    1. States: q1 (start state), q2 (final state), q3, etc.
    2. Symbols: X1 (0), X2 (1), X3 (blank), X4, etc.
    3. Directions: D1 (L) and D2 (R).
  • Suppose δ(qi,Xj)=(qk,Xl,Dm).
  • Represent this rule by the string 0^i 1 0^j 1 0^k 1 0^l 1 0^m. (0^n denotes n consecutive 0's)
    • Key point: since the integers i, j, … are all > 0, there cannot be two consecutive 1's within these strings.
  • Represent a TM by concatenating the codes for each of its moves, separated by 11 as punctuation.
    • That is: Code1 11 Code2 11 Code3 11 …
    • Note: if binary string i cannot be parsed as a TM, assume the ith TM accepts nothing. (A sketch of this encoding follows.)
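
A small Python sketch of this move encoding (helper names are mine), applied to the example TM of section 1.4.1 with q = q1, f = q2, 0 = X1, 1 = X2, B = X3, L = D1, R = D2:

def encode_move(i, j, k, l, m):
    # delta(q_i, X_j) = (q_k, X_l, D_m)  ->  0^i 1 0^j 1 0^k 1 0^l 1 0^m
    return "1".join("0" * n for n in (i, j, k, l, m))

def encode_tm(moves):
    # Concatenate the move codes, separated by 11.
    return "11".join(encode_move(*mv) for mv in moves)

moves = [
    (1, 1, 1, 1, 2),   # delta(q,0) = (q,0,R)
    (1, 2, 2, 1, 2),   # delta(q,1) = (f,0,R)
    (1, 3, 1, 2, 1),   # delta(q,B) = (q,1,L)
]
code = encode_tm(moves)
print(code)                # 01010101001101001001010011010001010010
assert "111" not in code   # a valid TM code never contains 111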

Diagonalization:

  • Table of Acceptance (denoted by A):
    • TM i = 1, 2, 3, … × String j = 1, 2, 3, …
    • Aij=0 means the ith TM does not accept the jth string.
    • Aij=1 otherwise.
  • Construct a 0/1 sequence D = d1 d2 …, where di = 1 − Aii (the complement of the diagonal entry)
    • D cannot be a row in A.
    • Question: Let’s suppose w = 10101010…. What does it mean if w appears in the ith row of the table A?
    • Answer: It means the ith TM accepts exactly the 1st, 3rd, 5th, … binary strings.
  • Let’s give a name to this language–the diagonalization language Ld={w|w is the ith string, and the ith TM does not accept w}.
  • We have shown that Ld is not a recursively enumerable language; i.e., it has no TM.
    • There are no more TMs than integers.
    • Ld exists but we cannot always tell whether a given binary string w is in Ld.

Problems:

  • Informally, a “problem” is a yes/no (the output) question about an infinite set of possible instances (the input).
    • Example: “Does graph G have a Hamilton cycle (a cycle that touches each node exactly once)?”
    • Each undirected graph is an instance of the “Hamilton-cycle problem.”
  • Formally, a problem is a language.
    • Each string encodes some instance.
    • The string is in the language if and only if the answer to this instance of the problem is “yes.”
  • Example: A Problem About Turing Machines
    • We can think of the language Ld as a problem.
    • If the strings of table A are the binary strings of the TMs themselves, this problem becomes “Does this TM not accept its own code?”

Decidable Problems:

  • A problem is decidable if there is an algorithm to answer it.
    • Recall: An “algorithm,” formally, is a TM that halts on all inputs, accepted or not.
    • Put another way, “decidable problem” = “recursive language.”
  • Otherwise, the problem is undecidable.

Bullseye Picture:

  • Innermost set: Decidable problems = Recursive languages
  • Middle ring: Recursively enumerable languages
    • Languages accepted by TMs with no guarantee that they will halt on inputs they never accept
    • Recursive languages ⊊ Recursively enumerable languages
  • Outermost ring: Not recursively enumerable languages, e.g. Ld
    • Languages that have no TM at all.
    • The complement of the set of recursively enumerable languages

Examples: Real-world Undecidable Problems

  • Can a particular line of code in a program ever be executed?
  • Is a given context-free grammar ambiguous?
  • Do two given CFG’s generate the same language?

The Universal Language:

  • An example of a recursively enumerable, but not recursive, language
  • We call it Lu, the language of a universal Turing machine (the “universal TM language”).
  • UTM inputs include:
    • a binary string representing some TM M, and
    • a binary string w for M
  • UTM accepts ⟨M,w⟩ if and only if M accepts w.
  • E.g. JVM
    • JVM takes a coded Java program and input for the program and executes the program on the input.

Designing the UTM:

  • Inputs are of the form: Code-for-M 111 w
    • Note: A valid TM code never has 111, so we can split M from w.
    • The UTM must accept its input if and only if M is a valid TM code and M accepts w.
  • UTM is a multi-tape machine:
    • Tape 1 holds the input Code-for-M 111 w
      • Tape 1 is never changed, i.e. never overwritten.
    • Tape 2 represents the current tape of M during the simulation of M with input w
    • Tape 3 holds the state of M.
  • Step 1: The UTM checks that M is a valid code for a TM.
    • E.g., all moves have five components, no two moves have the same state/symbol as first two components.
    • If M is not valid, its language is empty, so the UTM immediately halts without accepting.
  • Step 2: Assuming the code for M is valid, the UTM next examines M to see how many of its own tape cells it needs to represent one symbol of M.
    • How to do this: we discover the longest block of 0s representing a tape symbol and add one cell to that for a marker (e.g. #) between symbols of M’s tape.
    • Thus if, say, X7 is the highest-numbered symbol, then we'll use 8 squares to represent one symbol of M.
      • Symbol Xi will be represented by i 0's and 7−i blanks, followed by the marker.
      • For example, here's how we would represent X5: 00000BB#.
  • Step 3: Initialize Tape 2 to represent the tape of M with input w, and initialize Tape 3 to hold the start state (always q1 so it is represented by a single 0).
  • Step 4: Simulate M.
    • Look for a move on Tape 1 that matches the state on Tape 3 and the tape symbol under the head on Tape 2.
      • If we cannot find one then apparently M halts without accepting w so UTM does so as well.
    • If found, change the symbol and move the head on Tape 2 and change the State on Tape 3.
    • If M accepts (i.e. halts), the UTM also accepts (i.e. halts).

Proof That Lu is Recursively Enumerable (RE), but not Recursive:

  • We designed a TM for Lu, so it is surely RE.
  • Proof by contradiction:
    • Suppose Lu were recursive, which means we could design a UTM U that always halted.
    • Then we could also design an algorithm for Ld, as follows.
      • Given input w, we can decide if w ∈ Ld by the following steps.
        1. Check that w is a valid TM code.
          • If not, then w's language is empty, so w ∈ Ld.
        2. If valid, use the hypothetical algorithm to decide whether w111w is in Lu.
        3. If so, then w ∉ Ld; else w ∈ Ld.
    • But we already know there is no algorithm for Ld.
    • Thus, our assumption that there was an algorithm for Lu is wrong.
    • Lu is RE, but not recursive.

This proof deserves a careful reading and a summary:

  • What is a language?
    • The language of TM M is LM = {tape contents (excluding the blank regions at both ends) | starting from the tape's start position, the TM runs all the way to a final state, i.e. halts}
    • What is a tape, in essence? A string.
    • We say “TM M accepts a language”; equally we can say “TM M accepts a string”, and all the strings M accepts make up M's language.
  • What is an algorithm?
    • An algorithm is a TM which always halts on any input, regardless of whether that input is accepted or not.
      • What does “always halts” mean? You can read it as “always produces an output”:
        • accept → output 1
        • reject → output 0
      • Clearly, a TM that can fall into an infinite loop cannot be an algorithm.
  • What is a problem?
    • A problem is a language.
  • A TM accepts a language => an algorithm accepts a problem.
    • But we don't usually say it that way; the common phrasing is “an algorithm solves a problem”.
    • Conversely, “a problem is solved by an algorithm” amounts to “we can find an always-halting TM, i.e. an algorithm, for the language that the problem is”:
      • Problems for which an algorithm exists => decidable problems, i.e. recursive languages
      • Problems for which a TM exists (but not necessarily an algorithm) => recursively enumerable languages
        • Problems for which a TM exists but certainly no algorithm => e.g. the universal language
      • Problems for which not even a TM exists => not recursively enumerable languages
  • The dual nature of w
    • If the binary string of TM M is w, then
      1. w can denote the TM M
      2. w can denote a plain string
    • So we can have both readings at once: “w accepts w” and “w is accepted by w”.
  • Two readings of Ld
    • First it is a language, and a language can be seen as a set; but a problem is also a language, so Ld can likewise be abstracted into a problem statement.
    • If, as seems to be the default, we restrict the strings of the acceptance table A to the binary strings of TMs (so the table is effectively w × w), then:
      • Ld = {w | TM w does not accept string w}
        • You can read it as a set of strings or as a set of TMs; both work, because of the dual nature of w.
        • The original definition was Ld = {w | w is the ith string, and the ith TM does not accept w}; convince yourself that it can be read this way.
      • Ld is the problem “Does this TM not accept its own code?”
  • If w is not a valid TM code:
    • Somewhat shamelessly, we regard such a w as a special kind of TM
    • w's language is empty
      • => w accepts nothing
        • => w cannot accept w
          • => w ∈ Ld
  • What does it mean that Lu has no algorithm?
    • Given an ⟨M,w⟩, there is no algorithm that determines whether M accepts w.
    • Going back to the “w accepts w” logic: we can write ⟨w,w⟩ as w111w, so for this input no algorithm can determine whether w accepts w.

3. Extensions and properties of Turing machines @ Automata by Jeff Ullman

17.1 Programming Tricks

Programming Trick: Multiple Tracks

  • Enables us to leave markers on the tape so a TM can find its way back to an important place.
  • If there are k tracks => treat each symbol as a vector of size k
    • E.g. k = 3: (0,0,0) represents symbol 0; (B,B,B) represents blank; (X,Y,Z) represents the symbol [X,Y,Z]

Programming Trick: Marking

  • k−1 tracks for data; 1 track for marks
  • Almost all tape cells hold blank in this mark track, but several hold special symbols (marks) that allow the TM to find particular places on the tape.
  • E.g. the cells (B,W), (X,Y), (B,Z), written as (mark, data), represent an unmarked W, a marked Y and an unmarked Z (X is a mark)

Programming Trick: Caching in the State

  • Treat state as a vector.
  • First component is the “control state,” i.e. the original state
  • Other components are used as a cache to hold values the TM needs to remember.

Examples to be continued; a small sketch of these tricks follows.
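
A tiny Python sketch of how these tricks look in data-structure form (entirely my own illustration): a multi-track cell is a tuple, a mark lives on its own track, and a cached symbol rides along in the state vector.

BLANK = "B"

# Two tracks per cell, written (mark, data): an unmarked W, a marked Y
# (the mark is X), and an unmarked Z, matching the example above.
tape = [(BLANK, "W"), ("X", "Y"), (BLANK, "Z")]

def find_marked(tape):
    # Scan for the cell whose mark track is non-blank.
    for pos, (mark, data) in enumerate(tape):
        if mark != BLANK:
            return pos, data
    return None

# Caching in the state: the state is a vector (control state, cached symbol).
state = ("q_scan", None)
pos, symbol = find_marked(tape)
state = (state[0], symbol)   # remember the marked symbol in the state
print(pos, state)            # 1 ('q_scan', 'Y')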

17.2 Restrictions

Semi-infinite Tape: to be continued.

17.3 Extensions

Multitape TM:

  • To simulate a k-tape TM, use 2×k tracks (the 2×k tracks are still viewed as one tape; a k-tape TM has k heads)
  • For every 2 tracks simulating 1 tape:
    • 1 track for tape data
    • 1 track for tape head (use mark X to indicate the head position)
  • To be continued.

Nondeterministic TM:

  • Allow TM to have more than 1 choice of move at each step for any <state, symbol> pair.
    • Each choice is a <state,symbol,direction> triple, as for the deterministic TM.
  • Simulating an NTM by a DTM: to be continued.

How a TM can simulate a store (a storage system that allows us to associate any value with any name):

  • Very large stores => a significant factor in the big-data world, like Google's Bigtable.
  • To be continued.

17.4 Closure Properties of Recursive and RE Languages

To be continued.

4. Specific undecidable problems

Rice’s theorem tells us that almost every question we can ask about recursively enumerable languages is undecidable.

Properties of Languages:

  • Any set of languages is a property of languages.
    • Example: The infiniteness property is the set of infinite languages.
    • If a language L “has property infiniteness”, it means “L ∈ property infiniteness”
    • In what follows, we’ll focus on properties of RE languages, because we can’t represent other languages by TM’s.
  • We shall think of a property as a problem about Turing machines.
    • We can take property P = {L | L has property P}. Viewed this way, a property is a “language” of languages; going further, a property is a problem.
      • This problem can be stated as: Given a language, does it have property P?
      • And since a property is a problem, we can say “a property is decidable/undecidable”
    • Moreover, since L can be produced by a TM, we further define: For any property P, we can define a language LP, the set of binary strings for TMs M such that L(M) has property P.
      • In short: L(M) ∈ P ⟺ ⟨M⟩ ∈ LP
      • We can view LP as the problem: Given a code for a TM, does it define a language with property P?
  • There are two (trivial) properties P for which LP is decidable.
    1. The always-false property, which contains no RE languages.
      • E.g. “this language is not RE.”
      • How do we decide this property? I.e. the algorithm for this property is: Given an input w, ignore it and say no (reject).
      • Note: even the empty language is an RE language.
    2. The always-true property, which contains every RE language.
      • E.g. “this language is RE.”
      • How do we decide this property? I.e. the algorithm for this property is: Given an input w, ignore it and say yes (accept).
  • Rice’s Theorem: For every other (i.e. non-trivial) property P, LP is undecidable.

Reductions:

  • A reduction from language L to language L′ is an algorithm (a TM that always halts) that takes a string w and converts it to a string x, with the property that: x is in L′ if and only if w is in L.
  • The value of having such a reduction is that it tells us L is no harder than L′, at least as far as decidability is concerned.
  • TM’s as Transducers
    • We have regarded TM’s as acceptors of strings.
    • But we could just as well visualize TM’s as having an output tape, where a string is written prior to the TM halting.
  • If we reduce L to L′, and L′ is decidable, then the algorithm for L′ + the algorithm of the reduction shows that L is also decidable.
    • Normally used in the contrapositive.
    • If we know L is not decidable, then L′ cannot be decidable.
  • More powerful forms of reduction: if there is an algorithm for L′, then there is an algorithm for L.
    • E.g. we reduced Ld to Lu
  • More in NP-completeness discussion on Karp vs. Cook reductions
    • The simple reduction is called a Karp reduction.
    • More general kinds of reductions are called Cook reductions.

Proof of Rice’s Theorem

  • Proof Skeleton
    • We shall show that for every nontrivial property P of the RE languages, LP is undecidable.
    • We show how to reduce Lu to LP.
    • Since we know Lu is undecidable, it follows that LP is also undecidable.
  • The Reduction
    • The input to Lu is a TM M and an input w for M. Our reduction algorithm produces the code for a TM M′.
    • The design guarantees: L(M′) has property P if and only if M accepts w.
      • “L(M′) has property P” also means “M′ accepts a language with property P”
      • That is, ⟨M′⟩ ∈ LP if and only if ⟨M,w⟩ ∈ Lu.
    • M′ has two tapes, used for:
      1. Simulating another TM ML on M′'s own input, say x
        • The transducer (the reduction that builds M′) does not deal with or see x
      2. Simulating M on w.
        • Note: neither M, ML, nor w is input to M′.
    • Assume that the empty language ∅ does not have property P.
      • If it does, consider the complement property, say Q; then ∅ does not have property Q.
      • If we could prove LQ undecidable, then LP must be undecidable: if LP were a recursive language, then so would be LQ, since the recursive languages are closed under complementation.
    • Let L be any language with property P, and let ML be a TM that accepts L.
  • Design of M′
    1. On the second tape, M′ writes w and then simulates M on w.
    2. If M accepts w, then M′ simulates ML on the input x on the first tape.
    3. M′ accepts its input x if and only if ML accepts x, i.e. x ∈ L.
      • If M does not accept w, M′ never gets to the stage where it simulates ML, and therefore M′ accepts nothing.
      • In this case, M′ defines the empty language, which does not have property P.
    4. Suppose M accepts w.
      • Then M′ simulates ML and therefore accepts x if and only if x is in L.
      • That is, L(M′) = L, L(M′) has property P, and ⟨M′⟩ ∈ LP.
    5. Suppose M does not accept w.
      • Then M′ never starts the simulation of ML, and never accepts its input x.
      • Thus, L(M′) = ∅, and L(M′) does not have property P.
      • That is, ⟨M′⟩ ∉ LP.
    6. Thus, the algorithm that converts M and w to M′ is a reduction of Lu to LP.
      • Thus, LP is undecidable.


5. Turing Machines @ Erickson §6

6.1 Why Bother?

Admittedly, Turing machines are a terrible model for thinking about fast computation; simple operations that take constant time in the standard random-access model can require arbitrarily many steps on a Turing machine. Worse, seemingly minor variations in the precise definition of “Turing machine” can have significant impact on problem complexity. As a simple example (which will make more sense later), we can reverse a string of n bits in O(n) time using a two-tape Turing machine, but the same task provably requires Ω(n2) time on a single-tape machine.

But here we are not interested in finding fast algorithms, or indeed in finding algorithms at all, but rather in proving that some problems cannot be solved by any computational means. Such a bold claim requires a formal definition of “computation” that is simple enough to support formal argument, but still powerful enough to describe arbitrary algorithms. Turing machines are ideal for this purpose. In particular, Turing machines are powerful enough to simulate other Turing machines, while still simple enough to let us build up this self-simulation from scratch, unlike more complex but efficient models like the standard random-access machine.

(Arguably, self-simulation is even simpler in Church’s λ-calculus, or in Schönfinkel and Curry’s combinator calculus, which is one of many reasons those models are more common in the design and analysis of programming languages than Turing machines. But those models are much more abstract; in particular, they are harder to show equivalent to standard iterative models of computation.)

6.2 Formal Definitions

  • Three distinct special states: {start, accept, reject} ⊆ Q.
  • A transition function δ : (Q − {accept, reject}) × Γ → Q × Γ × {−1, +1}.

7. Turing Machines @ Computational Complexity by Mike Rosulek

Languages and UTM (Lecture 1 & 2)

Language:

  • A language is a subset of {0,1}* (in fact, any subset of {0,1}* is a language)
  • A language L is a set of “yes-instances”
    • x is a “yes-instance” if x ∈ L
  • Language S is Turing-recognizable (“recursively enumerable / r.e.”) if ∃ TM M such that ∀x:
    • x ∈ S ⟹ M accepts x
    • x ∉ S ⟹ M rejects x or runs forever
  • Language S is Turing-decidable (“recursive”) if ∃ TM M such that ∀x:
    • x ∈ S ⟹ M accepts x
    • x ∉ S ⟹ M rejects x
  • An algorithm formally is a TM which
    1. halts on any input regardless of whether that input is accepted or not, and
    2. accepts (a language) by final state
  • In other words, language S is Turing-decidable (“recursive”) if ∃ an algorithm for S.
  • If M is a TM, define L(M) = {x | M accepts x}

Programming conventions:

  • Programming a TM for real is cruel!
  • Describe a TM “program” in terms of tape modifications & head movements
  • “Mark” cells on the tape (e.g., a → a´)

E.g. TM algorithm for Palindromes={x|x=reverse(x)}:

  1. “Mark” the first char (e.g., overwrite it with a marked version), and remember that char in the internal state
  2. If the char to the right is “marked” or blank, then accept; else scan right until a blank or a “marked” char
  3. If the char just before it (the rightmost unmarked char) ≠ the remembered char, reject; else mark it
  4. Scan left to the leftmost unmarked char; if there are no more unmarked chars, accept; else repeat from step #1 (a Python rendering follows)
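
A direct Python rendering of these four steps (my own conventions: a cell is a pair [char, marked?]); it accepts exactly the palindromes:

def palindrome_tm(x):
    # Step 1: mark the leftmost unmarked char and remember it; steps 2/3:
    # find the rightmost unmarked char and compare; step 4: repeat.
    tape = [[c, False] for c in x]
    left, right = 0, len(tape) - 1
    while left < right:
        remembered = tape[left][0]
        tape[left][1] = True                 # mark the left end
        if tape[right][0] != remembered:     # mismatch: reject
            return False
        tape[right][1] = True                # mark the right end
        left, right = left + 1, right - 1
    return True                              # no unmarked chars left: accept

assert palindrome_tm("0110") and palindrome_tm("010") and palindrome_tm("")
assert not palindrome_tm("01")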

Universal Machines:

  • TMs can be encoded as strings (“code is data”)
    • Convention: every string encodes some TM
    • ⟨M⟩: encoding of a TM M
    • ⟨M,x⟩: encoding of a TM M and a string x
  • Lacc = {⟨M,x⟩ | M is a TM that accepts x} is Turing-recognizable (RE)
    • Suppose there is such a TM U that accepts ⟨M,x⟩, i.e. U(⟨M,x⟩) = yes.
    • U accepts ⟨M,x⟩ ⟺ ⟨M,x⟩ ∈ Lacc.
    • In other words, U recognizes Lacc, i.e. L(U) = Lacc
  • TM U recognizes Lacc, which means it can simulate any TM M! Thus we call it a “universal TM”

Design of Universal TM:

  • On input ⟨M,x⟩, use 3 tapes:
    • one for description of M
    • one for M’s work tape contents
    • one for M’s current state
  • Transitions: ⟨state, char, newstate, newchar, direction⟩
  • Legal to say “simulate execution of M on input x” in our TM pseudocode!
    • If M halts on x, the simulation will also halt
    • “Simulate execution of M on input x for t steps” is also possible (always halts)
    • Can simulate t steps of a TM in O(t log t) steps

Diagonalization and Reduction (Lecture 2 & 3)

Diagonalization:

  • Lacc = {⟨M,x⟩ | M is a TM that accepts x} is not Turing-decidable (“recursive”)
    • Likewise, Lhalt = {⟨M,x⟩ | M is a TM that halts on x}
  • This Lacc is exactly Jeff Ullman's Lu from earlier.

(Turing) Reductions:

  • Motivation:
    • Let’s not use diagonalization technique every time to prove something undecidable!
    • Instead, “reduce” one problem to another.
  • What does “Alg decides lang” mean?
    • Language L = {x | x is/satisfies blah blah blah} is a collection of yes-instances.
      • An x that is/satisfies “blah blah blah” is a yes-instance.
    • An algorithm is in nature a TM, so we can say “an algorithm M”.
    • If we say “M decides L”, it means:
      • L(M)=L
      • Imagine M as a function,
        • The datatype of M's parameter is that of the elements of L. E.g.
          • if L = {x | …}, the elements are x's; therefore the function signature is M(x);
          • if L = {⟨x,y⟩ | …}, the elements are pairs ⟨x,y⟩; therefore the function signature is M(x,y).
        • M(x) = yes if x is/satisfies “blah blah blah”
        • M(x) = no otherwise
    • If we say “go find an algorithm for language”, it means:
      • go find an M such that L(M)=L
      • go find an M such that M(x) = yes if x is/satisfies “blah blah blah”
  • Definition:
    • We say L1 ≤T L2 (“L1 Turing-reduces to L2”) if there is an algorithm that decides L1 using a (hypothetical) algorithm that decides L2.
  • Inference:
    • CASE 1: implication
      • Ground Truth: L2 is decidable
      • Goal: To prove L1 is also decidable
      • Method: Construct an algorithm M1 for L1 which satisfies L1 ≤T L2
        • L2 is decidable, so there must exist an algorithm M2 for L2
        • Call M2 inside M1
        • In this way we prove that L1 ≤T L2 holds.
    • CASE 2: contrapositive
      • Ground Truth: L1 is undecidable
      • Goal: To prove L2 is also undecidable
      • Method: Assume L2 is decidable. Under this assumption, show we can construct an algorithm M1 for L1 which satisfies L1 ≤T L2, i.e. L1 ≤T L2 holds.
        • This would imply that L1 is decidable.
        • However, we already know that L1 is undecidable.
        • Therefore the assumption is invalid.
  • E.g. show that Lempty = {⟨M⟩ | M is a TM and L(M) = ∅} is undecidable.
    • Choose Lacc = {⟨M,x⟩ | M is a TM that accepts x} as L1
    • Lempty is L2.
      • Assume there exists an algorithm Mempty.
    • Construct an algorithm Macc using Mempty:
      • Signature: Macc(⟨M,x⟩)
      • For every single pair of inputs ⟨Mi, xj⟩:
        • Construct a TM M*ij(z) = {ignore z; return Mi(xj)}
          • By the semantics of Lacc, Mi either accepts xj or not.
          • CASE 1: If Mi accepts xj, Mi(xj) = yes.
            • Therefore M*ij(z) = {ignore z; return yes}.
            • I.e. M*ij accepts every z.
            • I.e. M*ij accepts everything.
            • I.e. L(M*ij) = Ω (the universe)
            • I.e. ⟨M*ij⟩ ∉ Lempty.
            • I.e. Mempty(⟨M*ij⟩) = no.
          • CASE 2: If Mi rejects xj, Mi(xj) = no.
            • Therefore M*ij(z) = {ignore z; return no}.
            • I.e. M*ij rejects every z.
            • I.e. M*ij accepts nothing.
            • I.e. L(M*ij) = ∅
            • I.e. ⟨M*ij⟩ ∈ Lempty.
            • I.e. Mempty(⟨M*ij⟩) = yes.
        • If Mempty(⟨M*ij⟩) = yes
          • I.e. ⟨M*ij⟩ ∈ Lempty
          • Tracing back through CASE 2, we have: Mi rejects xj
          • So our Macc(⟨Mi,xj⟩) should return no
        • If Mempty(⟨M*ij⟩) = no
          • I.e. ⟨M*ij⟩ ∉ Lempty
          • Tracing back through CASE 1, we have: Mi accepts xj
          • So our Macc(⟨Mi,xj⟩) should return yes
    • Now we have shown Lacc ≤T Lempty. We assumed Lempty is decidable, so Lacc would also be decidable, which contradicts the known truth. Therefore the assumption is invalid and Lempty is undecidable.
    • The hardest part is the step “Construct an algorithm Macc using Mempty”; think about Macc and Mempty together. The general recipe is:
      • From L1's input, build a temporary object, and use this temporary object as L2's input.
        • The temporary object must match M2's signature
        • If the temporary object is a TM, like M*ij above, it should act opportunistically when handling its own input z
          • If it can simply return Mi(xj) or return !Mi(xj), that is best
          • If not, consider a form like: if Mi(xj) = yes, return function(z)
      • Then decide L1's output from L2's output.
// Reduce L_acc to L_empty
M_acc(<M_i,x_j>) {
	M*_ij = TuringMachine(z) {
		return M_i(x_j)
	}
	
	return !M_empty(M*_ij)
}

// General form of reduction algorithm for L_acc
M_acc(<M_i,x_j>) {
	M*_ij = TuringMachine(z) {
		if(M_i(x_j) = yes) {
			return f_1(z)
		} else {
			return f_2(z)
		}
	}
	
	return f_3(M_2(M*_ij))
}

Rice’s Theorem (Lecture 4)

Rice’s Theorem. {⟨M⟩ | L(M) has property P} is undecidable if P is non-trivial.

  • Non-trivial means:
    • ∃ MY such that L(MY) has property P;
    • also ∃ MN such that L(MN) does not have property P.
    • I.e. the question “does it have property P?” is neither always true nor always false.

In other words, given the encoding ⟨M⟩ of M:

  • Can’t decide whether L(M) is empty
  • Can’t decide whether some special x ∈ L(M)
  • Can’t decide whether L(M) is finite
  • etc.

Define LP = {⟨M⟩ | L(M) has property P}.

Claim. Lacc ≤T LP.

Proof.

  • By “non-trivial”:
    • Suppose ∅ does not have property P.
    • ∃ MY such that L(MY) has property P.
  • For every single pair of inputs ⟨Mi, xj⟩, construct a TM M*ij(z), such that on input z:
    • If Mi(xj) = yes
      • return MY(z)
    • else return no
  • If ⟨M*ij⟩ ∈ LP, Macc(⟨Mi,xj⟩) returns yes; else no
  • The logic:
    • If Mi accepts xj, M*ij is equivalent to MY, and then L(M*ij) has property P.
    • If Mi rejects xj, M*ij is equivalent to ∅ (it accepts nothing), and then L(M*ij) does not have property P.

Kolmogorov Complexity (or, “optimal compression is hard!”) (Lecture 5 & 6) @ Algorithmic Information Theory and Kolmogorov Complexity

Problem:

  • Suppose compressing x yields y, and decompressing y yields x
  • Suppose there is a decompression algorithm U such that U(y) = x
  • KU(x) = min{ |y| : U(y) = x }
    • How small can x be compressed?
    • Here |y| denotes the length of a binary string y
  • In other words, the complexity of x is defined as the length of the shortest description of x, if each binary string y is considered as a description of U(y) (a toy sketch follows this list)
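
KU is not computable in general, but for a fixed toy decompressor and a bounded search the definition can be evaluated directly. A Python sketch (U below is made up purely for illustration):

from itertools import product

def U(y):
    # Toy decompressor: "1" + s decodes to s; "0" + s decodes to s repeated
    # four times. Any (partial) decompression algorithm would do here.
    if y.startswith("1"):
        return y[1:]
    if y.startswith("0"):
        return y[1:] * 4
    return ""

def K_U(x, max_len=12):
    # K_U(x) = min{ |y| : U(y) = x }, found by searching y in length order.
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if U("".join(bits)) == x:
                return n
    raise ValueError("no description found within the length bound")

assert K_U("0000") == 2          # "0" + "0"  ->  "0" * 4
assert K_U("01010101") == 3      # "0" + "01" ->  "01" * 4
assert K_U("01") == 3            # no shortcut here: "1" + "01"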

Optimal decompression algorithm:

  • The definition of KU depends on U.
  • For the trivial decompression algorithm U(y)=y we have KU(x)=|x|.
  • One can try to find better decompression algorithms, where “better” means “giving smaller complexities”

Definition 1. An algorithm U is asymptotically not worse than an algorithm V if KU(x) ≤ KV(x) + C for some constant C and for all x.

Theorem 1. There exists a decompression algorithm U which is asymptotically not worse than any other algorithm V.

Such an algorithm is described as asymptotically optimal.

  • The complexity KU with respect to an asymptotically optimal U is called Kolmogorov complexity.
  • Assume that some asymptotically optimal decompression algorithm U is fixed, the Kolmogorov complexity of a string x is denoted by K(x) (=KU(x)).
    • The complexity K(x) can be interpreted as the amount of information in x or the “compressed size” of x.

The construction of optimal decompression algorithm:

  • The idea of the construction is used in the so-called “self-extracting archives”. Assume that we want to send a compressed version of some file to our friend, but we are not sure he has the decompression program. What to do? Of course, we can send the program together with the compressed file. Or we can append the compressed file to the end of the program and get an executable file which will be applied to its own contents during the execution.
  • To be added.

Basic properties of Kolmogorov complexity:

  • To be added.

Algorithmic properties of K

  • To be added, together with the Berry paradox.

Complexity and incompleteness:

  • Not yet understood.

Algorithmic properties of K (continued):

  • Not yet understood.

An encodings-free definition of complexity:

  • Not yet understood.

Axioms of complexity:

  • Not yet understood.

Kolmogorov Complexity (or, “optimal compression is hard!”) (Lecture 5 & 6) @ Wikipedia

Definition

If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g. 7 for ASCII).

We could, alternatively, choose an encoding for Turing machines M. If M is a Turing machine which, on input w, outputs string x, then the concatenated string ⟨M⟩w is a description of x.

Any string s has at least one description, namely the program:

function GenerateFixedString()
    return s

If a description of s, d(s), is of minimal length (i.e. it uses the fewest bits), it is called a minimal description of s. Thus, the length of d(s) (i.e. the number of bits in the description) is the Kolmogorov complexity of s, written K(s). Symbolically, K(s)=|d(s)|.

Invariance theorem

Informal treatment

Theorem. Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.

Proof: Any description D in L can be converted into a description in the optimal language by first describing L as a computer program P (part 1), and then using the original description D as input to that program (part 2). The total length of this new description D′ is (approximately): |D′| = |P| + |D|.

The length of P is a constant that doesn’t depend on D. So, there is at most a constant overhead, regardless of the object described. Therefore, the optimal language is universal up to this additive constant.

A more formal treatment

Theorem. If K1 and K2 are the complexity functions relative to Turing complete description languages L1 and L2, then there is a constant c – which depends only on the languages L1 and L2 chosen – such that

∀s : −c ≤ K1(s) − K2(s) ≤ c.

Proof: By symmetry, it suffices to prove that there is some constant c such that for all strings s, K1(s) ≤ K2(s) + c.

Now, suppose there is a program in the language L1 which acts as an interpreter for L2:

function InterpretL2(string p)
    return p()

Running InterpretL2 on input p returns the result of running p.

Thus, if P is a program in L2 which is a minimal description of s, then InterpretL2(P) returns the string s. The length of this description of s is the sum of

  • The length of the program InterpretL2, which we can take to be the constant c.
  • The length of P, which by definition is K2(s).

This proves the desired upper bound.

Basic results

In the following discussion, let K(s) be the complexity of the string s.

Theorem. There is a constant c such that ∀s : K(s) ≤ |s| + c.

Theorem. There exist strings of arbitrarily large Kolmogorov complexity. Formally: for each n ∈ ℕ, there is a string s with K(s) ≥ n.

Proof: Otherwise all of the infinitely many possible finite strings could be generated by the finitely many programs with a complexity below n bits.

Theorem. K is not a computable function. In other words, there is no program which takes a string s as input and produces the integer K(s) as output.

Chain rule for Kolmogorov complexity:

K(X,Y) = K(X) + K(Y|X) + O(log(K(X,Y)))

It states that the shortest program that reproduces X and Y is no more than a logarithmic term larger than a program to reproduce X plus a program to reproduce Y given X. Using this statement, one can define an analogue of mutual information for Kolmogorov complexity.

Compression

A string s is compressible by a number c if it has a description whose length does not exceed |s| − c bits. This is equivalent to saying that K(s) ≤ |s| − c. Otherwise, s is incompressible by c.

A string incompressible by 1 is said to be simply incompressible. By the pigeonhole principle, which applies because every compressed string maps to only one uncompressed string, incompressible strings must exist, since there are 2^n bit strings of length n, but only 2^n − 1 shorter strings.

There are 2^n bitstrings of length n. The number of descriptions of length not exceeding n − c is given by the geometric series:

1 + 2 + 2^2 + … + 2^(n−c) = 2^(n−c+1) − 1

There remain at least 2^n − 2^(n−c+1) + 1 bitstrings of length n that are incompressible by c. To determine the probability, divide by 2^n. (A numeric sanity check follows.)
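
A numeric sanity check of this counting argument in Python (n and c chosen arbitrarily):

n, c = 10, 3

total = 2 ** n                                   # bitstrings of length n
descriptions = sum(2 ** k for k in range(n - c + 1))
assert descriptions == 2 ** (n - c + 1) - 1      # the geometric series

incompressible = total - descriptions            # each y describes <= 1 string
assert incompressible == 2 ** n - 2 ** (n - c + 1) + 1
print(incompressible / total)                    # 769/1024, about 0.75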

Chaitin’s incompleteness theorem

We know that, in the set of all possible strings, most strings are complex in the sense that they cannot be described in any significantly “compressed” way. However, it turns out that the fact that a specific string is complex cannot be formally proven, if the complexity of the string is above a certain threshold.

Theorem. There exists a constant L (which only depends on the particular axiomatic system S and the choice of description language) such that there does not exist a string s for which the statement

K(s) ≥ L (as formalized in S)

can be proven within the axiomatic system S.

Proof by contradiction using Berry’s paradox. (Omitted.)

Time/space complexity classes: P & NP @ class (Lecture 7)

Resource bounds:

  • M has (worst case) running time T if ∀ string x, M halts after at most T(|x|) steps.
  • M has (worst case) space usage S if ∀ string x, M writes on at most S(|x|) tape cells.

Basic Complexity Class:

  • DTIME(f(n)) = {L | L is decided by a (deterministic) TM with running time O(f(n))}
    • D for “deterministic”
    • Formally, L ∈ DTIME(f(n)) if there is a TM M and a constant c such that
      1. M decides L, and
      2. M runs in time c·f;
        • i.e., for all x (of length at least 1), M(x) halts in at most c·f(|x|) steps.
    • E.g. DTIME(n^2) = set of all problems that can be solved in quadratic time
  • DSPACE(f(n)) = {L | L is decided by a TM that uses space O(f(n))}
  • Language / decision problem = set of strings (yes-instances)
  • Complexity class = set of languages

Standard Complexity Classes:

  • P = {problems that can be solved (by a deterministic TM) in poly time}
    • P = ∪k=1,2,… DTIME(n^k)
      • Viewed as a property, P can be summarized as “deterministic poly time”
      • “deterministic” = “deterministically solvable in time” = “solvable in time by a deterministic TM”
  • PSPACE = {problems that can be solved using poly space}
    • PSPACE = ∪k=1,2,… DSPACE(n^k)
  • EXP = {problems that can be solved in exponential time}
    • EXP = ∪k=1,2,… DTIME(2^(n^k))
  • L = {problems that can be solved using log space}
    • L = DSPACE(log n)

Translating from “Complexity Theory Speak”:

  • Is X ∈ PSPACE?
    • Can problem X be solved using polynomial space?
  • Is PSPACE ⊆ P?
    • Can every problem solvable in polynomial space also be solved in polynomial time?
  • This is true: P ⊆ PSPACE

Relationships between Complexity Classes:

  • ∀f(n), DTIME(f(n)) ⊆ DSPACE(f(n))
    • This follows from the observation that a TM cannot write on more than a constant number of cells per move.
  • If f(n) = O(g(n)), then DTIME(f(n)) ⊆ DTIME(g(n))

Complementation @ Complement Classes and the Polynomial Time Hierarchy:

  • First, notation: the universe can be denoted Ω, i.e. {0,1}*
  • The complement of a decision problem L, denoted coL, is the set of “No”-instances of L.
    • Generally, we can take coL = L̄ = Ω − L
    • Strictly speaking, L ∪ coL = WFL ⊆ Ω, where WFL is the set of well-formed strings describing “Yes” and “No” instances. That is, coL = WFL − L.
      • Usually, though, we simply stipulate WFL = Ω
    • An example: every positive integer x > 1 is either composite or prime
      • If we stipulate Ω = WFL = {x | x is a positive integer and x > 1}
        • L = {x | x is prime}
        • coL = L̄ = {x | x is not prime} = {x | x is composite}
      • If we only stipulate Ω = WFL = {x | x is a positive integer}
        • coL = L̄ = {x | x is not prime} ≠ {x | x is composite}
        • because 1 is neither prime nor composite
  • The complement of a complexity class is the set of complements of languages in the class.
    • C = {L | L has complexity C}
    • coC = {coL | L ∈ C}
    • Note that coC has nothing to do with C̄:
      • C̄ = {L | L does NOT have complexity C}
      • coC = {L | L's complement has complexity C}
        • “L's complement has complexity C” says nothing about whether L has complexity C or not
  • Theorem 1. C1 ⊆ C2 ⟹ coC1 ⊆ coC2
  • Theorem 2. C1 = C2 ⟹ coC1 = coC2
  • Closure under complementation means: C = coC
    • We say such a class C is “closed under complementation”.
  • Theorem 3. If C is a deterministic time or space complexity class, then C=coC.
    • E.g.
      • coTIME(f(n)) = TIME(f(n))
      • coP = P
        • In words: L can be solved in poly time ⟺ coL can be solved in poly time
      • coPSPACE = PSPACE
    • Why? For any language L in a class C defined by a deterministic TM M, just switch the accept/reject behavior and you get a TM coM that decides coL
      • i.e. coM accepts exactly the strings that M rejects, within the same time or space bound.

NP:

  • NP can be read as “nondeterministic polynomial time”
    • “nondeterministic” = “nondeterministically solvable in time” = “solvable in time by a nondeterministic TM”
  • Definition 1. A problem is assigned to the NP class if it is solvable in polynomial time by a nondeterministic TM.
    • NP = {problems that can be solved by a nondeterministic TM in poly time}
  • Definition 2. NP = set of decision problems L, where
    • L = {x | ∃w : M(x,w) = 1}
    • M halts in polynomial time, as a function of |x| alone.
    • x: instance
    • w: proof / witness / solution
    • Simply put: we cannot directly determine whether x ∈ L (that is the nondeterminism); we can only judge whether x ∈ L by whether some (x,w) satisfies L's condition
  • An example is sketched below
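
As a concrete instance (my example, not from the lecture): SUBSET-SUM is in NP under Definition 2, because a poly-time verifier M(x,w) just checks a proposed subset:

def verify_subset_sum(x, w):
    # M(x, w): x = (numbers, target) is the instance, w = a list of indices
    # (the witness). Runs in time polynomial in |x|.
    numbers, target = x
    if len(set(w)) != len(w) or any(not 0 <= i < len(numbers) for i in w):
        return False                      # w must be a valid set of indices
    return sum(numbers[i] for i in w) == target

x = ([3, 7, 1, 8], 11)
assert verify_subset_sum(x, [0, 3])       # 3 + 8 = 11: a valid witness w
assert not verify_subset_sum(x, [1, 2])   # 7 + 1 = 8: not a witness

Here x is a yes-instance if and only if some witness w makes the verifier output true, which is exactly the ∃w pattern of Definition 2.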

coNP

  • Definition. coNP = set of decision problems L, where
    • L = {x | ∀w : M(x,w) = 1}
    • Equivalently, L̄ = {x | ∃w : M(x,w) = 0}
    • Equivalently, L̄ = {x | ∃w : M′(x,w) = 1}, where M′ flips M's answer
    • M halts in polynomial time, as a function of |x| alone.
  • Examples to be added

P vs NP vs coNP

Theorem 1. P ⊆ NP

Proof: Take any L ∈ P; then ∃ a poly-time TM M such that L = {x | M(x) = 1}.

Define M′(x,w) = {ignore w; return M(x)}. Therefore L = {x | ∃w : M′(x,w) = 1} ∈ NP

Theorem 2. P ⊆ coNP

Proof: Ditto.

Closure Properties of NP, coNP

  • If A, B ∈ NP, then
    • A ∪ B ∈ NP
    • A ∩ B ∈ NP
  • If A, B ∈ coNP, then
    • A ∪ B ∈ coNP
    • A ∩ B ∈ coNP

Karp reductions, NP hardness/completeness (Lecture 8)

Karp Reduction: Define A ≤P B (“A Karp-reduces to B”) to mean:

∃ poly-time function f : ∀x : x ∈ A ⟺ f(x) ∈ B

≤P is transitive: A ≤P B ≤P C ⟹ A ≤P C.

L is NP-hard if ∀A ∈ NP : A ≤P L, which means:

  • L is at least as hard as EVERY problem in NP
  • If you could solve L in poly-time then P=NP

L is NP-complete if:

  • L is NP-hard
  • L ∈ NP

which means:

  • L is (one of) the hardest problems in NP
  • It's possible to prove P = NP by solving L in poly-time, because L ∈ NP

Theorem. If A ≤P B and A is NP-hard, then B is NP-hard.

Proof: ∀L ∈ NP : L ≤P A ≤P B ⟹ L ≤P B

  • If A ≤P B and B ∈ P ⟹ A ∈ P
  • If A ≤P B and B ∈ NP ⟹ A ∈ NP
  • Suppose A is NP-complete; then A ∈ P ⟺ P = NP

“Canonical” NP-complete Problem X = {⟨M, x, T⟩ | ∃w : M accepts (x,w) in |T| steps}.

Claim. X ∈ NP

Proof: Can be checked in poly time as a function of the length of ⟨M, x, T⟩ (by a universal TM).

Claim. ∀A ∈ NP : A ≤P X

Proof: A ∈ NP, so A = {x | ∃w : MA(x,w) = 1}, where the running time of MA is p(|x|).

Define 1^p(|x|), a string of p(|x|) ones. Consider f(x) = ⟨MA, x, 1^p(|x|)⟩.

∀x : x ∈ A ⟺ f(x) ∈ X. Therefore A ≤P X.

Cook-Levin Theorem & Natural NP-complete problems (Lecture 9 & 10)

Cook-Levin Theorem. SAT = {φ | φ is a satisfiable boolean formula} is NP-complete

How to show SAT ≤P 3SAT:

  • For a long clause in SAT, say x1 ∨ x3 ∨ x4 ∨ x2 ∨ x5,
  • Introduce s and break it into (x1 ∨ x3 ∨ s) ∧ (¬s ∨ x4 ∨ x2 ∨ x5).
    • If s is TRUE, then x4 ∨ x2 ∨ x5 must be TRUE
    • If s is FALSE, then x1 ∨ x3 must be TRUE
    • which means: one of x1 ∨ x3 and x4 ∨ x2 ∨ x5 must be true
  • Further break it into: (x1 ∨ x3 ∨ s) ∧ (¬s ∨ x4 ∨ t) ∧ (¬t ∨ x2 ∨ x5) (a sketch of this splitting follows)
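
The splitting is mechanical; a Python sketch (my own encoding: a literal is an int, negation is the minus sign, fresh variables are numbered upward):

def split_clause(clause, next_var):
    # (a b c d e) -> (a b s) (!s c t) (!t d e), chaining fresh variables.
    out = []
    while len(clause) > 3:
        s = next_var
        next_var += 1
        out.append(clause[:2] + [s])    # first two literals, plus s
        clause = [-s] + clause[2:]      # !s "carries" the remaining literals
    out.append(clause)
    return out, next_var

clauses, _ = split_clause([1, 3, 4, 2, 5], next_var=6)
print(clauses)   # [[1, 3, 6], [-6, 4, 7], [-7, 2, 5]]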

NP in terms of nondeterministic computation (Lecture 11)

NP={problems that can be solved by a nondeterministic TM in poly time}.

Nondeterministic TM:

  • In state q, reading char c, transition δ(q,c) is a set of possible / legal actions.
    • Thus we can draw a graph of configurations for a Nondet TM on a given input x.
    • Note that the configuration graph is not general, i.e. not for all inputs x; rather, for each specific x we can draw one configuration graph
    • The configuration graph may also be called a computation tree
  • Nondeterministic TM is just imaginary, not a realistic model for computation. Just convenient for computational theory.

Nondeterministic TM acceptance:

  • In the computation tree, every internal (non-leaf) node is a configuration
    • A configuration can have many outgoing edges in the configuration graph.
    • We call each available outgoing edge a choice
  • Define a computation thread on x: a path starting at (q_start, x, …)
    • So each computation thread can be seen as a sequence of choices
  • Define an accepting thread: a computation thread that eventually reaches (q_accept, …, …)
  • NTM accepts x ⟺ ∃ an accepting thread on x
  • NTM rejects x ⟺ ∄ an accepting thread on x ⟺ all threads reject x
  • Running time of an NTM on x: max # of steps among all threads
  • Space usage of an NTM on x: max tape usage among all threads

Pitfalls of NTM:

  • Nondeterministic “algorithm” defined in terms of local behavior within each thread
  • Acceptance/rejection defined in terms of global/collective behavior of all threads
  • Threads can’t “communicate” with each other

NTIME, NSPACE & NP

  • NTIME(f(n)) = {L | L is accepted by an NTM with running time O(f(n))}
  • NSPACE(f(n)) = {L | L is accepted by an NTM that uses space O(f(n))}
  • NP = ∪k=1,2,… NTIME(n^k)

Claim: for NP, the witness-checking definition and the NTM definition are equivalent.

Proof:

Hierarchy theorems (Lecture 12 & 13)

Complexity Separations–the holy grail of computational complexity:

  • Containments: P ⊆ NP, NP ⊆ PSPACE, DTIME(n) ⊆ DSPACE(n), etc.
  • Separations: P ≠ NP? P ≠ PSPACE? NP ≠ coNP?

Deterministic Hierarchy:

  • DTIME(n) ⊊ DTIME(n^2) ⊊ DTIME(n^3) ⊊ …
  • DSPACE(n) ⊊ DSPACE(n^2) ⊊ DSPACE(n^3) ⊊ …

How to prove DTIME(n) ⊊ DTIME(n^2)?

  • Better Diagonalization

Deterministic Time Hierarchy Theorem:

  • f(n) log f(n) = o(g(n)) ⟹ DTIME(f(n)) ⊊ DTIME(g(n))
  • Thus, P ⊊ EXP

Deterministic Space Hierarchy Theorem:

  • Ditto
  • Thus, L ⊊ PSPACE

Non-deterministic Hierarchy:

  • NTIME(n) ⊊ NTIME(n^2) ⊊ NTIME(n^3) ⊊ …
  • NSPACE(n) ⊊ NSPACE(n^2) ⊊ NSPACE(n^3) ⊊ …

How to prove NTIME(n) ⊊ NTIME(n^2)?

  • Delayed/lazy Diagonalization

Non-deterministic Time Hierarchy Theorem:

  • f(n+1) = o(g(n)) ⟹ NTIME(f(n)) ⊊ NTIME(g(n))
  • Thus, NP ⊊ NEXP

Non-deterministic Space Hierarchy Theorem:

  • Ditto
  • Thus, NL ⊊ NPSPACE

Ladner’s theorem & NP-intermediate problems (Lecture 14)

Questions:

  • Most natural problems are in P or NP-complete.
  • Any problem between P and NP-complete?

Ladner’s theorem:

  • If P ≠ NP, then we can construct an M such that:
    • L(M) ∈ NP
    • L(M) ∉ P
    • L(M) is not NP-complete

Claim: all finite languages are in P.

Proof: A finite language is a finite list of all yes-instances. Build an algorithm that compares the input to this list; its running time is bounded.

Claim: if L = 3SAT − A where A is finite, then L is NP-complete.

Proof: Note that 3SAT − A does not mean “infinite 3SAT”; it simply means “3SAT with a finite set removed”, which is why L is also called a “finite modification of 3SAT” (here “modification” means exclusion).

3SAT ≤P L:

  • If x ∈ A (say A involves only the three literals x1, x2, x3 and only two clauses), convert x to an equivalent instance in L
  • If x ∉ A, then x ∈ L ⟺ x ∈ 3SAT, by definition.

The proof of Ladner’s theorem is omitted.

Polynomial hierarchy (Lecture 15)

Question:

  • NP problem: {x | ∃w : M(x,w) = 1}
  • coNP problem: {x | ∀w : M(x,w) = 1}
    • M runs in poly(|x|) time
  • What about problems like {x | ∃w1 ∀w2 : M(x,w1,w2) = 1}?

Polynomial Hierarchy:

  • Define a Σk problem: {x | ∃w1 ∀w2 … Q wk : M(x,w1,w2,…,wk) = 1} (k alternating quantifiers, starting with ∃)
  • Define a Πk problem: {x | ∀w1 ∃w2 … Q wk : M(x,w1,w2,…,wk) = 1} (k alternating quantifiers, starting with ∀)
    • Generally, the M here should be a nondeterministic TM
    • M runs in poly(|x|) time
  • E.g.
    • Σ0=P
    • Σ1=NP
    • Π0=coP=P
    • Π1=coNP
    • Πk=coΣk
  • The # of quantifiers doesn't matter (alternations do). E.g. L = {x | ∀w1 ∀w2 ∃w3 ∃w4 ∃w5 : M(x,w1,w2,w3,w4,w5) = 1}
    • Obviously, L ∈ Π5
    • But also L = {x | ∀(w1,w2) ∃(w3,w4,w5) : M(x,w) = 1}, so L ∈ Π2

Properties (for all k):

  • Σk ⊆ Σk+1 ∩ Πk+1
  • Σk ⊆ PSPACE
  • Σk = coΠk
  • Πk ⊆ Σk+1 ∩ Πk+1
  • Πk ⊆ PSPACE
  • Πk = coΣk

Proof of Σk ⊆ Σk+1 ∩ Πk+1:

Suppose L = {x | ∃w1 ∀w2 … Q wk : M(x,w) = 1} ∈ Σk.

Then L = {x | ∃w1 ∀w2 … Q wk Q′ wjunk : M(x,w) = 1} ∈ Σk+1 and L = {x | ∀wjunk ∃w1 ∀w2 … Q wk : M(x,w) = 1} ∈ Πk+1 (M ignores wjunk)

Polynomial Hierarchy:

  • Define PH = ∪k=0,1,… Σk = ∪k=0,1,… Πk
  • Conjecture: for all k > 0: Σk ≠ Πk.
  • Since Σk, Πk ⊆ PSPACE for all k > 0, we have PH ⊆ PSPACE. Each Σk or Πk is called a level in the hierarchy.

Theorem: If Σk = Πk for some k, then PH = Σk (we say “PH collapses to level k”).

Proof: (1) If Σk = Πk, then Σk+1 = Σk.

Take any L ∈ Σk+1.

L = {x | ∃w1 ∀w2 … Q wk+1 : M(x,w) = 1}. M runs in poly time.

Using the same M, define L′ = {(x,w1) | ∀w2 ∃w3 … Q wk+1 : M(x,w) = 1}. Therefore L′ ∈ Πk.

Since Σk = Πk, L′ can be rewritten in Σk form: L′ = {(x,w1) | ∃w2′ ∀w3′ … Q wk+1′ : M′(x,w1,w′) = 1} for some poly-time M′.

Reconstruct L using L′:

L = {x | ∃w1 ∃w2′ ∀w3′ … Q wk+1′ : M′(x,w1,w′) = 1} = {x | ∃(w1,w2′) ∀w3′ … Q wk+1′ : M′(x,w1,w′) = 1} ∈ Σk

(2) Given Σk = Πk = Σk+1: Πk+1 = coΣk+1 = coΣk = Πk = Σk = Σk+1

Oracle computations and a characterization of PH (Lecture 16 & 17)

Let A be some decision problem and M be a class of Turing machines. Then M^A is defined to be the class of machines obtained from M by allowing instances of A to be solved in one step.

  • We call the “one-step” (not necessarily O(1), but with a guaranteed time bound) algorithm for A an oracle.
  • E.g. we can declare an oracle to solve 3SAT in poly time.

Similarly, if M is a class of Turing machines and C is a complexity class, then M^C = ∪A∈C M^A. If L is a complete problem for C, and the machines in M are powerful enough to carry out polynomial-time computations, then M^C = M^L.

Oracle complexity classes:

  • Let A be a decision problem, then
    • P^A = {decision problems solvable by a poly-time A-OTM}
      • P^A = {L(M^A) | M is a poly-time OTM}
    • NP^A = {decision problems solvable by a non-det poly-time A-OTM}
  • Let C be a complexity class, then
    • P^C = ∪L∈C P^L
      • P^C = {problems solvable by a poly-time OTM with some problem from C as its oracle}
    • NP^C = ∪L∈C NP^L
      • Different threads of an NP computation can ask different oracle queries for L ∈ C

Note: Overloaded Notation!

  • M^L: a special TM M with an oracle for a specific decision problem L
  • NP^L: the complexity class NP with an oracle for a specific decision problem L
  • NP^C: the complexity class NP with an oracle for some decision problem from the class C

Claim: If L is a C-complete problem, then P^L = P^C, NP^L = NP^C, etc.

Proof: By Karp reduction.

Note: we do indeed have P^C = ∪L∈C P^L, but there L is not required to be C-complete.

  • P^P = P
  • NP^P = NP
  • P^NP = P^coNP
  • coNP ⊆ P^NP

PH new characterization:

  • Σ0 = Π0 = P
  • Σk = NP^Σk−1 = NP^Πk−1
    • Σ0 = P
    • Σ1 = NP^P = NP
    • Σ2 = NP^NP
    • Σ3 = NP^(NP^NP)
  • Πk = coΣk

Claim: Σk = NP^Πk−1

Proof: By induction on k.

(1) [Σk ⊆ NP^Πk−1]

Take any L ∈ Σk, so L = {x | ∃w1 ∀w2 … Q wk : M(x,w) = 1}. M runs in poly time.

Goal: show x ∈ L ⟺ x is accepted by an NPTM M′ with a Πk−1 oracle, i.e. x ∈ L(M′).

Using the same M, define L′ = {(x,w1) | ∀w2 ∃w3 … Q wk : M(x,w) = 1}. Therefore L′ ∈ Πk−1.

Choose L′ as the oracle. Define M′(x):

  1. Non-deterministically guess w1
    • Each different guess of w1 corresponds to a different computation thread
  2. Call the oracle to check whether (x,w1) ∈ L′.
  3. Accept x if the oracle says yes.
    • In effect we enumerate all values of w1 and query (x,w1) ∈? L′
    • If some query returns YES, then ∃ an accepting thread ⟹ the NTM accepts x
    • If all queries return NO, then ∄ an accepting thread ⟹ the NTM rejects x

M′ nondeterministically accepts x ⟺ ∃ an accepting thread on x ⟺ ∃w1 : (x,w1) ∈ L′ ⟺ ∃w1 ∀w2 … Q wk : M(x,w) = 1 ⟺ x ∈ L

(2) [Σk ⊇ NP^{Πk−1}]

Suppose there is an NPTM M′ using L′ = {x | ∀w2 ∃w3 … Q wk : M(x,w)=1} ∈ Πk−1 as its oracle.

Goal: show x ∈ L(M′) ⟺ ∃y1 : (x,y1) ∈ L″ for some Πk−1 set L″.

For simplicity, assume that each thread of M′ makes only 1 query to the oracle. Let y1 be an existentially quantified variable ranging over the computation threads of M′.

Suppose on input x there is an accepting thread of M′, say y, that makes the query q ∈? L′.

CASE 1: on accepting thread y, the oracle answer is YES.

x ∈ L(M′) ⟺ ∃y1 : (x,y1) ∈ L′ (i.e. we can simply take L″ = L′).

CASE 2: on accepting thread y, the oracle answer is NO.

x ∈ L(M′) ⟺ ∃y1 : (x,y1) ∈ coL′

Because coL′ ∈ Σk−1, we get {x | ∃y1 : (x,y1) ∈ coL′} ∈ Σk−1 (the leading ∃y1 merges with coL′’s leading ∃w2). Since Σk−1 ⊆ Σk, we can still conclude that {x | ∃y1 : (x,y1) ∈ coL′} ∈ Σk.

Space complexity (in terms of configuration graphs) (Lecture 18)Permalink

  • PSPACE = ⋃_{k=1,2,…} DSPACE(n^k)
  • NPSPACE = ⋃_{k=1,2,…} NSPACE(n^k)
  • L = DSPACE(log n)
  • NL = NSPACE(log n)

Configuration graphs:

  • Two tapes:
    • Read-only input tape
    • Read-write work tape (space usage is measured on this tape)
  • Transition: (q, w, i, j) → (q′, w′, i′, j′)
    • q: state
    • w: content of the work tape
      • Here |w| is the TM’s space usage
    • i: input-tape head position
    • j: work-tape head position
  • Possible # of configurations
    • Suppose M uses f(n) space on inputs of length n
    • C_M(n) ≤ |Q| · n · f(n) · |Γ|^{f(n)} = 2^{O(f(n))}, as long as f(n) ≥ log n
    • Note that since the input is fixed and the input tape is read-only, we do not need to consider all possible length-n strings that could be written on the input tape.
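
To get a feel for the bound, here is a quick numeric check in Python (toy parameters, not from the lecture): with f(n) = log n, the bound is polynomial in n, which is exactly why log-space machines have only polynomially many configurations.

from math import log2

def config_bound(Q, Gamma, n, f):
    # C_M(n) <= |Q| * n * f(n) * |Gamma|^f(n)
    return Q * n * f * Gamma ** f

n = 1024
f = int(log2(n))                                 # f(n) = log n
print(config_bound(Q=10, Gamma=2, n=n, f=f))     # 10 * 1024 * 10 * 1024: poly(n)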

Inclusions

  • We already had DTIME(f(n)) ⊆ DSPACE(f(n))
  • DSPACE(f(n)) ⊆ DTIME(2^{O(f(n))})
  • NSPACE(f(n)) ⊆ NTIME(2^{O(f(n))})

Proof:

Let L ∈ DSPACE(f(n)). Then there is a TM M that decides L using f(n) space. Show there is a TM M′ that decides L in 2^{O(f(n))} time.

Construct M this way:

M'(x):
	run (original) M on x for C_M(n) steps
	if M accepted x
		return YES
	else 
		return NO
  • M′ runs in O(C_M(n)) = 2^{O(f(n))} time
  • If M hasn’t halted within C_M(n) steps, it must have repeated a configuration, so it is in an infinite loop and will never accept; M′ can safely answer NO

PSPACE-completeness (Lecture 18 & 19)Permalink

Canonical PSPACE-complete problem:

  • X = {⟨M, x, 1^s⟩ | M is a DTM that accepts x using ≤ s space}

To prove that X is PSPACE-complete:

  • Show that X ∈ PSPACE (simulate M on x, rejecting if it exceeds s space)
  • Show that X is PSPACE-hard: if L ∈ PSPACE, then L is decided by some TM M that uses s(n) space, and L ≤P X via f(x) = ⟨M, x, 1^{s(|x|)}⟩
    • x ∈ L ⟺ f(x) ∈ X

TQBF

  • TQBF is PSPACE-complete
    • Proof to be added
  • TQBF is also NPSPACE-complete
  • So PSPACE = NPSPACE

Geography Game

  • What does “Player 1 has a winning strategy” mean?
    • A winning strategy means Player 1 cannot play arbitrarily: there ∃ winning moves, but Player 1 must actually find and play them
    • For Player 2, on the other hand, every move loses, so we quantify over ∀ moves of Player 2
  • Remaining details omitted

Regex Universality

Connectivity problems / Savitch’s theorem (Lecture 20)Permalink

  • L = DSPACE(log n)
  • NL = NSPACE(log n)
  • L ⊆ NL
  • NL ⊆ NPSPACE

NL-completeness

  • Karp reductions don’t make sense for defining NL-completeness: since NL ⊆ P (see the corollary below), a poly-time reduction is already powerful enough to solve the problem by itself
  • Define log-space reduction A ≤L B:
    • ∃ a log-space computable function f such that x ∈ A ⟺ f(x) ∈ B
    • What does “log-space computable function” mean?
      • f requires O(log n) memory to be computed
        • This restriction does NOT apply to the size of the output
      • The TM computing f has:
        • a read-only input tape
        • a unidirectional output tape (“write-once”)
        • an O(log n)-length read/write work tape
          • so the TM cannot “cheat” and use the output tape as storage

CONN={(G,s,t)|G is a directed graph with a directed path from s to t}

  • Assume G is represented by an adjacency matrix
  • Vertices are named 1, 2, …, n
    • It takes O(log n) bits to write a single vertex name
    • Log-space is only enough to store O(1) vertex names

Claim: CONN is NL-complete

Proof:

(1) CONN ∈ NL

Idea: start from s and nondeterministically guess a path to t

CONN(G, s, t):
	_curr = s
	
	for i = 1 to n:
		guess _next from {1,2,...,n}
		
		if no _curr → _next edge:
			reject
		if _next == t:
			accept
			
		_curr = _next
	
	reject
  • Scan the adjacency-matrix cells one by one to detect a _curr → _next edge
  • 2 variables _curr and _next ⟹ 2·log n = O(log n) space (a deterministic simulation of the guessing is sketched below)
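
For intuition, here is a brute-force deterministic simulation of the nondeterministic guessing in Python (illustrative only: a real NTM explores all n^n guess sequences “in parallel”, while this loop enumerates them, taking exponential time rather than log space).

from itertools import product

def conn_nondet_sim(adj, s, t):
    n = len(adj)
    for guesses in product(range(n), repeat=n):   # one thread per guess sequence
        curr = s
        for nxt in guesses:
            if not adj[curr][nxt]:   # no curr -> next edge: this thread rejects
                break
            if nxt == t:             # reached t: this thread accepts
                return True
            curr = nxt
    return False                     # no accepting thread

adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # path 0 -> 1 -> 2
print(conn_nondet_sim(adj, 0, 2))         # True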

(2) CONN is NL-hard

Reduction strategy: Let M be an NL machine. On input x, output ⟨G, s, t⟩ so that M accepts x ⟺ ∃ an s → t path in G.

Idea:

  • Let G be the configuration graph of M on x
  • s be the start configuration
  • t be the (unique) accepting configuration

Each configuration can be represented using O(log n) bits, and the adjacency matrix can be generated in log-space as follows (the matrix itself takes O(n²) space, but that is the size of the output, which is not subject to the log-space bound):

for each configuration i:
	for each configuration j:
		Output 1 if there is a legal transition from i to j, and 0 otherwise
		(if i or j is not a legal state, simply output 0)

The algorithm requires O(log n) space for i and j, and to check for a legal transition.

Corollary: NL ⊆ P

Proof: Take any A ∈ NL. Because CONN is NL-complete, there must exist a log-space function f such that for every input x to A, f(x) = ⟨G, s, t⟩ and x ∈ A ⟺ ∃ an s → t path in G.

We can construct an algorithm for A using poly time:

A(x):
	run BFS on f(x) to see whether an s → t path exists

So A ∈ P.
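
A minimal Python sketch of the poly-time decider (assuming the reduction f is available as a callable returning (G, s, t) with G an adjacency list; all names are illustrative):

from collections import deque

def decide_A(x, f):
    G, s, t = f(x)                  # f is the log-space reduction to CONN
    seen, queue = {s}, deque([s])
    while queue:                    # plain BFS: O(|V| + |E|) time
        u = queue.popleft()
        if u == t:
            return True             # found an s -> t path
        for v in G[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False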

Summary:

  • L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE ⊆ EXP
  • By the hierarchy theorems (and Savitch’s theorem, below) we know NL ⊊ PSPACE and P ⊊ EXP. But we cannot prove that any single inclusion above is strict.

Claim: CONN ∈ DSPACE(log² n)

Proof: Define a recursive algorithm Path(a,b,i) that asks “is there an a → b path of length ≤ 2^i?”

Path(a,b,i):
	if i == 0:
		check (a,b) edge in adjacency matrix
	else:
		for each vertex v
			if Path(a,v,i-1) && Path(v,b,i-1)
				accept
		
		reject
  • Call Path(s, t, ⌈log n⌉) to decide ⟨G, s, t⟩
  • Space usage
    • depth of recursion: log n
    • stack frame of each call: space to write down a, b, i, v ⟹ O(log n)
    • Total: log n × O(log n) = O(log² n)
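
The same recursion in runnable Python (adjacency-matrix input; note the i == 0 base case also accepts a == b, i.e. paths of length 0, which the pseudocode leaves implicit):

import math

def path(adj, a, b, i):
    # is there an a -> b path of length <= 2^i?
    if i == 0:
        return a == b or bool(adj[a][b])
    return any(path(adj, a, v, i - 1) and path(adj, v, b, i - 1)
               for v in range(len(adj)))

def conn(adj, s, t):
    return path(adj, s, t, math.ceil(math.log2(len(adj))))

adj = [[0,1,0,0], [0,0,1,0], [0,0,0,1], [0,0,0,0]]
print(conn(adj, 0, 3))   # True: 0 -> 1 -> 2 -> 3 has length 3 <= 2^2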

Savitch’s theorem: NSPACE(s) ⊆ DSPACE(s²) (for s > log n)

  • Implies that “nondeterminism saves at most a quadratic amount of space”

Proof: to be continued.

Corollary 7: If s(n) ≥ log n is space constructible, then NSPACE(s(n)) = coNSPACE(s(n)).

Corollary 8: NL = coNL.

See Katz §7.

Counting complexity / #P / #P-completeness / Parsimonious Reductions (Lecture 21)Permalink

Complexity of Counting:

  • So far in the class: decision problems
  • This unit: counting problems

Counting complexity classes FP, #P

  • For this lecture, “function” means f : {0,1}* → ℕ (ℕ for the natural numbers)
  • FP = {functions computable in deterministic poly time}
  • If M is a nondet TM, define: #M(x) = number of accepting threads of M on input x
    • Note that this is a function, not a decision problem
    • This function is exactly a counting problem
    • In practice M corresponds to a concrete problem, e.g. #SAT: given a boolean formula ϕ, determine the # of satisfying assignments of ϕ
  • #P = {#M | M is a nondet poly-time TM}
    • #P = {functions that count the # of accepting threads of some NTM}
    • A function f : {0,1}* → ℕ is in #P if ∃ an NTM M running in poly time such that f(x) = #M(x)

Claim: FP ⊆ #P

Proof idea: given f ∈ FP, design a poly-time NTM M such that #M(x) = f(x).

M(x):
	compute f(x) deterministically, in poly time
	nondet guess a positive integer i
	accept if i <= f(x)

In this way, the threads guessing i = 1, 2, …, f(x) all accept, so there are exactly f(x) accepting threads.

Optimization: the guessing space may be far larger than necessary, e.g. f(x) = 10 while i is guessed from 1, …, 99999. You can instead restrict the guessed i to integers with at most as many bits as f(x). (A toy check follows below.)
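
A toy sanity check in Python (f below is a made-up FP function): enumerating all k-bit guesses i and counting those with 1 ≤ i ≤ f(x) yields exactly f(x) accepting threads.

from itertools import product

def f(x):                                # some FP function (made up)
    return x.count("1") + 1

def accepting_threads(x, k=8):
    count = 0
    for bits in product("01", repeat=k):     # one thread per k-bit guess
        i = int("".join(bits), 2)
        count += (1 <= i <= f(x))
    return count

x = "1011"
print(accepting_threads(x), f(x))        # 4 4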

Claim: #P is closed under addition & multiplication

Proof: Equivalent to proving f, g ∈ #P ⟹ f+g ∈ #P and f·g ∈ #P.

Let

  • f(x) = # accepting threads of M_f on x
  • g(x) = # accepting threads of M_g on x

(1) f+g ∈ #P

Define M_{f+g}:

M_f_plus_g(x):
	nondet guess bit b
	if b == 0:
		return M_f(x)
	else:
		return M_g(x)

This amounts to hanging the roots of the two computation trees M_f(x) and M_g(x) under one new root, forming a single new computation tree.

(2) f·g ∈ #P

Define M_{f·g}:

M_fg(x):
	run M_f(x)
	if M_f(x) reached an accepting thread
		return M_g(x)

This amounts to attaching a copy of M_g(x)’s computation tree to every accepting leaf of M_f(x)’s computation tree, so accepting threads multiply.

Claim: if FP = #P then P = NP

Proof: We already know #SAT ∈ #P, so if FP = #P, then #SAT ∈ FP.

The NP-complete problem SAT can then be solved in poly time: run #SAT(x) and check whether the result is > 0.

#P-hardness

  • Two non-equivalent definitions, both commonly used:
    • Def. 1: f is #P-hard if a poly-time algorithm for f implies #P = FP (and thus P = NP)
    • Def. 2: f is #P-hard if ∀g ∈ #P, g ≤₁ f via a parsimonious reduction

Parsimonious Reduction

  • Let g & f be counting problems (functions)
  • g ≤₁ f if there is a poly-time computable function h such that ∀x, f(h(x)) = g(x)

Claim: if g ≤₁ f and f ∈ FP, then g ∈ FP

Claim: The counting version of any NP-complete problem is #P-complete, i.e. if M is an NTM where L(M) is NP-complete, then #M is #P-complete.

Proof: Use #P-hardness definition 1.

If #M ∈ FP, I can solve L(M), an NP-complete problem, in poly time by checking whether #M(x) > 0.

Claim: #SAT is #P-complete

Proof: Use #P-hardness definition 2.

Given an arbitrary poly-time NTM M and input x, the Cook-Levin construction yields a formula ϕ whose satisfying assignments correspond one-to-one to the witnesses w with M(x,w) = 1.

Let f(x) = #{w : M(x,w)=1}. Then

  • f ∈ #P, and every #P function has this form
  • f(x) = #SAT(ϕ)

⟹ #SAT is #P-hard.

To be continued; a lot of material from Katz §23 still needs to be merged in here.

#P-completeness of the permanent (Lecture 22)Permalink

Omitted for now.

Randomized complexity classes / Amplification (Lecture 23)Permalink

Two ways to define a randomized TM (the following combines the slides, Katz §12, and Trevisan §5.1):

  • [“Lazy”] A deterministic TM with two transition functions
    • At each step, one of the two is applied at random
  • [“Upfront”] A deterministic TM with an additional read-only “random tape”
    • Each cell of this random tape is initialized randomly (with 0/1)
      • Only a finite portion of this random tape can actually be used
  • It is easiest to view a randomized TM M(x;r) as taking two inputs:
    • For a fixed x, M(x;r) is the deterministic result under random choices r
    • Then what is r?
      • [“Lazy”] The ith bit of r determines which transition function is used at the ith step
      • [“Upfront”] r is the value written on M’s random tape
      • In other words, fixing x, we can enumerate r = r^(1), r^(2), …:
        • for some r^(i), M(x;r^(i)) = 1
        • for some r^(j), M(x;r^(j)) = 0
        • This yields a frequency #{r^(i) : M(x;r^(i))=1} / #{all r}, which is exactly the acceptance probability
        • Given a new x, the split into r^(i) and r^(j) changes, and so does the probability
          • That is, for the same value r^(i), M(x;r^(i)) = M(x′;r^(i)) need not hold
    • How long is r? I.e. |r| = ?
      • |r| is polynomial in the input length |x|
    • With x fixed and r viewed as a variable, M(x) is a distribution induced by uniform r (see the sketch below)
      • You can also view M(x) as a nondet TM, where each r^(i) corresponds to one computation thread
  • A randomized TM M(x;r) runs in poly time (poly in |x|)
    • This is easy to see, since running time = # of steps = |r| = p(|x|)
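
A minimal “upfront” sketch in Python (M, p, and the acceptance rule are invented for illustration): the machine is a deterministic function of (x, r), and the acceptance probability is literally the fraction of r’s that accept.

from itertools import product

def M(x, r):
    # toy deterministic rule: accept iff r, read as a binary
    # integer, is smaller than the number of 1s in x
    return int(r, 2) < x.count("1")

def acceptance_prob(x, p):
    # enumerate all r in {0,1}^p(|x|); one computation thread each
    rs = ["".join(bits) for bits in product("01", repeat=p(len(x)))]
    return sum(M(x, r) for r in rs) / len(rs)

print(acceptance_prob("1101", lambda n: n))   # 3/16 = 0.1875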

PPT = probabilistic, polynomial-time

  • An algorithm A is probabilistic polynomial time (PPT) if it uses randomness (i.e., flips coins) and its running time is bounded by some polynomial in the input size.
  • Alternatively, “expected poly time” means ∃ a polynomial p(·) such that E[# of steps] = p(|x|)

RP = Randomized Poly-time

  • RP = languages L of the form:
    • x ∈ L ⟹ Pr[M(x)=1] ≥ 1/2 (M is a PPT TM, hereafter)
    • x ∉ L ⟹ Pr[M(x)=0] = 1
      • Note: M(x) for x ∈ L and M(x) for x ∉ L are two different distributions, so Pr[M(x)=1] for x ∈ L and Pr[M(x)=0] for x ∉ L are unrelated; in particular they need not sum to 1. Written out in full, the definition reads:
        • x ∈ L ⟹ Pr[M(x)=1] ≥ 1/2 (and hence Pr[M(x)=0] ≤ 1/2)
        • x ∉ L ⟹ Pr[M(x)=0] = 1 (and hence Pr[M(x)=1] = 0)
  • Stated the other way around: L ∈ RP if ∃ a PPT TM M such that:
    • x ∈ L ⟹ Pr[M(x)=1] ≥ 1/2
    • x ∉ L ⟹ Pr[M(x)=0] = 1
  • Trevisan §5.1’s formulation is easier to grasp: L ∈ RP ⟺ ∃ a PPT TM M and a polynomial p(·) such that:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≥ 1/2
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] = 0
  • RP has one-sided error.

Aside: we can define P analogously:

  • L ∈ P ⟺ ∃ a PPT TM M and a polynomial p(·) such that:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] = 1
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] = 0

Claim: the 1/2 in Pr[M(x)=1] ≥ 1/2 is not fundamental to this definition.

Explanation: I can always “amplify” M to a new PPT machine M′:

def M'(x):
	run M(x) independently t times
	accept if any trial accepts
x ∈ L ⟹ Pr[M′(x)=1] = 1 − Pr[M′(x)=0] ≥ 1 − (1/2)^t = 1 − 2^{−t}
x ∉ L ⟹ Pr[M′(x)=0] = 1^t = 1

Note that Pr[M′(x)=0] for x ∈ L and Pr[M′(x)=0] for x ∉ L are different quantities.

So the 1/2 in Pr[M(x)=1] ≥ 1/2 can in fact be any 1/t. The larger significance of amplification is that across repeated trials M(x)^(1), M(x)^(2), …, M(x)^(n):

  • As soon as one trial gives M(x)^(i) = 1, I can conclude x ∈ L with 100% certainty
    • We call this “no false positives”: whenever the verdict is positive (i.e. x ∈ L), it is a true positive (i.e. it matches the ground truth)
    • Symmetrically, coRP has “no false negatives”
  • A single M(x)^(i) = 0 does not allow us to conclude x ∉ L

Note: when I wrote the two paragraphs above, I had not yet realized that Trevisan §5.1’s formulation is easier to grasp; my M(x)^(i) is exactly M(x;r^(i)).

Note 2: Trevisan §5.1’s formulation also makes it easy to see what “error” means:

  • The behavior we would like is that of P:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] = 1
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] = 0
  • If that is unattainable, suppose instead:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≥ a
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≤ b
  • Then the errors can be computed as:
    • (x ∈ L)-side error = 1 − a
    • (x ∉ L)-side error = b − 0 = b
  • This makes RP’s one-sided error easy to see
  • BPP below has 2-sided error, with the two sides equal, i.e. a + b = 1
    • So we can loosely speak of “the error of BPP” as 1 − a, without specifying a side

BPP = Bounded-error Probabilistic Poly-time

  • BPP = languages L of the form:
    • x ∈ L ⟹ Pr[M(x)=1] ≥ 2/3
    • x ∉ L ⟹ Pr[M(x)=0] ≥ 2/3
  • Trevisan §5.1’s formulation is easier to grasp: L ∈ BPP ⟺ ∃ a PPT TM M and a polynomial p(·) such that:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≥ 2/3
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≤ 1/3
  • BPP has 2-sided error

How to amplify a BPP algorithm?

def M'(x):
	run M(x) independently t times
	take the majority output

Chernoff Bound: for x ∈ L, how do we calculate Pr[M′(x)=1]?

  • IID
    • We always say “independent and identically distributed random variables”; re-parsing it as “identically distributed, independent r.v.s” may make it easier to digest
    • Imagine this experiment: in a single run X^(i), we flip t coins, obtaining the t outcomes X1^(i), X2^(i), …, Xt^(i)
    • Performing n runs, we obtain t groups of values: {X1^(1),…,X1^(n)}, {X2^(1),…,X2^(n)}, …, {Xt^(1),…,Xt^(n)}
    • These t groups form t distributions, i.e. t r.v.s X1, X2, …, Xt
      • The t r.v.s are identical: in the coin-flipping example, all of them are Bernoulli; you cannot have X1 Bernoulli and X2 Gaussian
      • The t r.v.s are independent: clearly, the first flip does not influence the second

  • Suppose X1, X2, …, Xt are IID and Pr[Xi=1] = p (this is how they are “identical”); then the Chernoff bound states that Pr[ΣXi > (1+ϵ)tp] < exp(−tp·ϵ²/2).
    • Put simply, Pr[total # of heads observed ≫ total # of heads expected] is exponentially small
    • This is essentially a statement about tail quantiles (the original notes included a figure here, omitted)

Now we compute the amplification for BPP:

  • Let Xi be the event of “obtaining the wrong answer in the ith trial of M(x)”, so Pr[Xi] ≤ 1/3
  • Let ϵ = 1/2 and tp = t/3

Pr[majority gives wrong answer] = Pr[ΣXi > t/2] = Pr[ΣXi > (1+ϵ)tp] < exp(−tp·ϵ²/2) = exp(−t/24)

  • Let t = 24|x|, and then
    • x ∈ L ⟹ Pr[M′(x)=1] ≥ 1 − e^{−|x|}
    • x ∉ L ⟹ Pr[M′(x)=0] ≥ 1 − e^{−|x|}
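
An empirical check of the majority-vote amplification in Python (toy base error 1/3; all names invented): the measured error should fall off roughly like exp(−t/24).

import random

def noisy_answer(truth, err=1/3):
    # base BPP-style algorithm: wrong with probability 1/3
    return truth if random.random() > err else 1 - truth

def majority_of_trials(truth, t):
    votes = sum(noisy_answer(truth) for _ in range(t))
    return int(votes > t / 2)

def empirical_error(t, runs=10_000):
    return sum(majority_of_trials(1, t) != 1 for _ in range(runs)) / runs

for t in (1, 5, 25, 125):
    print(t, empirical_error(t))     # error shrinks rapidly with t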

Katz §12 is excellent and well worth reading.

Relations between randomized complexity classes (Lecture 24)Permalink

PP = Probabilistic Poly-time

  • L ∈ PP ⟺ ∃ a PPT TM M and a polynomial p(·) such that:
    • x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] > 1/2
    • x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[M(x;r)=1] ≤ 1/2
  • PP is unrealistic
  • PP cannot be amplified: the advantage over 1/2 may be as small as 2^{−p(|x|)}, so majority voting would need exponentially many trials

Lemma. L ∈ BPP ⟺ ∃ a PPT TM M with error ≤ 1/|r|

I won’t spell out the proof; amplification certainly reaches this error bound. This lemma mainly serves the proof of the Sipser-Lautemann theorem.

Sipser-Lautemann Theorem: BPP ⊆ Σ2 ∩ Π2

Proof: BPP is closed under complement, i.e. BPP = coBPP.

So once we prove BPP ⊆ Σ2, we immediately get BPP = coBPP ⊆ coΣ2 = Π2, and we are done.

It remains to prove BPP ⊆ Σ2.

Omitted for now.

Non-uniformity in terms of circuits & advice / P/poly (Lecture 25)Permalink

Question: Does it help to have a different algorithm for each input length?

Non-uniformity

  • TM is a “uniform” model of computation: a single TM handles inputs of all lengths.
  • A circuit only handles inputs of one fixed length.

Circuits

  • A circuit C_n has an n-bit input and is constructed from AND gates, OR gates and NOT gates.
  • A circuit C_n computes a function f_C : {0,1}^n → {0,1}
  • Define SIZE(C_n) = # of gates in C_n, and DEPTH(C_n) = max path length from an input to the output
  • Define a circuit family C = {C_1, C_2, …}
    • C_i has an i-bit input
    • C accepts x if C_{|x|}(x) = 1
    • L(C) = {x | C accepts x}
      • i.e. L is decided by C ⟺ (x ∈ L ⟺ C_{|x|}(x) = 1)
    • The size of C is f(n) if ∀C_i ∈ C, SIZE(C_i) ≤ f(i)

PSIZE

  • SIZE(f(n)) = {decision problems accepted by a circuit family of size f(n)}
  • PSIZE = ⋃_{c>0} SIZE(n^c)

Claim: Every language (even an undecidable one) is in SIZE(O(2^n))

Proof: Let f be the characteristic function of L on inputs x = x1x2…xn of length n.

Use the identity f(x1x2…xn) = (x1 ∧ f(1x2…xn)) ∨ (¬x1 ∧ f(0x2…xn)) to recursively construct a circuit for f.

The size of the circuit is s(n) = 3 + 2s(n−1) with base case s(1) = 1, which solves to s(n) = 2·2^n − 3 = O(2^n).

Note on technique: the n here is always |x|, so writing x ∈ L implicitly carries |x| = n along with it.
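
A quick check of the recurrence against the closed form (a toy verification, not part of the notes):

def s_rec(n):
    # s(n) = 3 + 2*s(n-1), s(1) = 1
    return 1 if n == 1 else 3 + 2 * s_rec(n - 1)

assert all(s_rec(n) == 2 * 2**n - 3 for n in range(1, 15))
print(s_rec(10), 2 * 2**10 - 3)      # 2045 2045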

Relationship between P, NP, PSIZE

  • P ⊆ PSIZE
  • Possibly NP ⊆ PSIZE

Theorem (Cook-Levin) Take any poly-time TM M. M’s behavior on inputs of length n can be written as a poly-size circuit. I.e. P ⊆ PSIZE

Note: the strictness P ⊊ PSIZE requires a separate proof (e.g. by the claim above, PSIZE even contains undecidable languages).

Theorem (Karp-Lipton-Sipser) If NP ⊆ PSIZE, then PH = Σ2

P/f(n): poly-time with f(n)-bounded advice

Definition: L ∈ P/f(n) ⟺ ∃ a poly-time TM M such that ∀n, ∃ an advice string a_n with |a_n| = f(n), and for every x with |x| = n: M(x, a_n) = 1 ⟺ x ∈ L

Note: the quantifier order is ∀n ∃a_n, not ∀x ∃a. That is, all x with |x| = n share the same advice a_n, rather than each x getting its own.

You can picture an advice pool A = {a_1, a_2, …} where |a_n| = f(n) and, for |x| = n, x ∈ L ⟺ M(x, a_n) = 1 (see the sketch after the list below).

  • P/1 = problems solvable with a 1-bit advice string
  • P/n² = problems solvable with an n²-bit advice string
  • P/poly = problems solvable with a p(n)-bit advice string, for some polynomial p
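
A minimal Python sketch of the advice mechanism (the table and the toy M are invented; real advice strings need not be computable at all): one string per input length, shared by every x of that length.

ADVICE = {1: "0", 2: "10", 3: "011"}     # a_n with |a_n| = f(n) = n here

def decide(x):
    a = ADVICE[len(x)]               # same advice for every x of this length
    # toy poly-time M(x, a): accept iff x and a agree in an odd # of positions
    return sum(b == c for b, c in zip(x, a)) % 2 == 1

print(decide("101"), decide("110"))  # True True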

Claim: P/poly=PSIZE

Proof:

(1) PSIZE ⊆ P/poly

Take any L ∈ PSIZE. Then ∃ a circuit family C = {C_1, C_2, …} such that x ∈ L ⟺ C_{|x|}(x) = 1.

Let a_{|x|} be the description of C_{|x|}. Define the TM:

def M(x, a_n):
	interpret a_n as a circuit
	evaluate it on x

M runs in poly time. Therefore x ∈ L ⟺ M(x, a_{|x|}) = 1, i.e. L ∈ P/poly, i.e. PSIZE ⊆ P/poly.

(2) P/poly ⊆ PSIZE

Take any L ∈ P/poly. Then ∃ a poly-time TM M and a sequence of advice strings A = {a_1, a_2, …} such that x ∈ L ⟺ M(x, a_{|x|}) = 1.

By the Cook-Levin theorem above, any poly-time TM can be written as a poly-size circuit, so define C_n to be the circuit that simulates M with the advice a_n hard-wired in. Thus L ∈ PSIZE, i.e. P/poly ⊆ PSIZE.

Karp-Lipton theorem / Meyer’s theorem (Lecture 26)Permalink

Karp-Lipton theorem If NP ⊆ P/poly, then PH = Σ2

Idea: First use SAT ∈ NP ⟹ SAT ∈ P/poly.

Then use Π2SAT ∈ Π2. Note that Π2SAT is Π2-complete, so proving Π2SAT ∈ Σ2 gives Π2 ⊆ Σ2 and hence PH = Σ2.

Proof: If SAT ∈ P/poly, then ∃ a circuit family C = {C_1, C_2, …} of poly size such that ϕ is satisfiable ⟺ C_{|ϕ|}(ϕ) = 1

Π2SAT = {ϕ | ∀y ∃z : ϕ(y,z) = 1}

ϕ ∈ Π2SAT ⟺ ∀y ∃z : ϕ(y,z) = 1 ⟺ ∀y : ϕ(y,·) is satisfiable ⟺ ∀y : C_{|ϕ(y,·)|}(ϕ(y,·)) = 1 ⟹ ∃C ∀y : C(ϕ(y,·)) = 1

(For the converse direction of the last step, the guessed circuit is required, via self-reducibility, to output an explicit satisfying assignment z, which can then be verified by evaluating ϕ(y,z) directly; so the ∃C ∀y form still characterizes Π2SAT.)

So Π2SAT ∈ Σ2, i.e. PH = Σ2.

Meyer’s theorem If EXP ⊆ P/poly, then EXP = Σ2

Note: Weird consequences:

  • EXP is closed under complement. Therefore if EXP = Σ2, then also EXP = coEXP = Π2 ⟹ PH = Σ2
  • Already known that P ⊊ EXP by the time hierarchy theorem ⟹ P ⊊ Σ2 ⟹ P ≠ NP (since P = NP would collapse PH to P, forcing P = Σ2)

Proof: Given L ∈ EXP decided by TM M, define

  • ST(x,t): the state M is in after t steps on input x
  • HD(x,t): the index of M’s tape head after t steps on input x
  • TP(x,t,i): the character in cell i of M’s tape after t steps on input x
    • t, i are numbers ≤ 2^{p(n)} (because of EXP), so p(n) bits are required to write them down
    • Suppose the maximum step number is t_max

M accepts x ⟺ ST(x, t_max) = q_accept.

We can imagine ST, HD and TP as 3 circuits. They must be consistent with M’s transition function and tape cells, and this consistency can be checked in poly time.

M accepts x ⟺ ∃ (poly-size circuits) ST, HD, TP : ∀ (poly-size bitstrings) t, i : ST(x,t), HD(x,t) and TP(x,t,i) are consistent with M, and ST(x, t_max) = q_accept

This is exactly a Σ2 statement, and Σ2 ⊆ PSPACE ⊆ EXP, so EXP = Σ2.

