LR Parsing #1: Intuition
Clue 1️⃣: Reduction to Root == Pruning off HandlesPermalink
Knuth 写于 1965 年的里程碑著作 On the Translation of Languages from Left to Right 提供了一个非常好的 intuition:假设你已经有了一个 parse tree,你把它倒过来,那么 parsing 的过程就是从 leaves 开始一路 reduce 到 root 的过程。
比如这么一个语法:
S -> AD
A -> aC
B -> bcd
C -> BE
D -> ε
E -> e
然后我们针对
可以看出:
- 我们一定是先 reduce
,可以想象成把 三个 leaves 全部 prune 掉 - 然后 reduce
,可以想像成把 这个 leaf 给 prune 掉 - 你一定是要等到
都被 prune 掉之后,才能 reduce
由此我们引出 handle 的概念:
Informal Definition: The handle of a parse tree is the leftmost set of adjacent leaves forming a complete branch. In other words, if
flowchart TD
Xj(("$$X_{j}$$")) --> Y(($$Y$$))
Xj1(("$$X_{j+1}$$")) --> Y
D(($$\cdots$$)) --> Y
Xk(("$$X_{k}$$")) --> Y
exists in the parse tree, for some
于是有:
是 上的 handle- 当
被 prune 掉之后, 是 上的 handle - 当
都被 prune 掉之后, 才能成为 上的 handle
Knuth 还额外补充道:
The reader may easily verify, in fact, that “handle pruning” always produces, in reverse, the derivation obtained by replacing the rightmost intermediate character at each step, and this may be regarded as an alternative way to define the concept of handle. During the pruning process, all leaves to the right of the handle are terminals, if we begin with all terminal leaves.
Bottom-up parsing 也可以被视为一个 “识别 handle、确认 handle” 的过程。
Clue 2️⃣: Array Buffer for Pruning => StackPermalink
假设我们已经知道了哪些 string 是 handle、以及何时做 reduction,我们要依此写一个 parser。还是考虑上面
- parser 先接收了
,虽然我们知道 是 handle,但目前时机不对,我们只能暂存 - parser 等到接收了完整的
之后才能做第一个 reduction ,这个 我们也需要暂存 - parser 最后接收了
,可以做 reduction ,这个 也需要暂存 - parser 已经接收了所有的 input,但此时我们手头上有的是
,好消息是可以继续做 reduction
你会发现:这是个天然适用 stack 的场景!
- 暂时不能做 reduction 的,就 push in
- 发现能做 reduction 时,就把 handle 给 pop out
- 做完 reduction 之后,把结果再 push in 进来,供后续的 reduction 使用
stack == ['a'] # 1. Consume 'a'
stack == ['a', 'b'] # 2. Consume 'b'
stack == ['a', 'b', 'c'] # 3. Consume 'c'
stack == ['a', 'b', 'c', 'd'] # 4. Consume 'd'
stack == ['a', 'B'] # 5. Reduce 'bcd |> B'
stack == ['a', 'B', 'e'] # 6. Consume 'e'
stack == ['a', 'B', 'E'] # 7. Reduce 'e |> E'
这个过程就完美模拟了上面提到的 “pruning off”.
Clue 3️⃣: Stack Contents == Viable PrefixesPermalink
有了 stack,我们就顺势引入 viable prefix 的概念:
Informal Definition: A viable prefix is a string that can appear in the stack during parsing. On the other hand, the stack can only contain viable prefix during parsing.
Clue 4️⃣: Viable Prefixes form a regular language => LR Parsers are DFAsPermalink
Knuth’s breakthrough came from several connected realizations:
- He recognized that the parser’s stack wasn’t just storage: it represented the parser’s “state of knowledge” about what had been seen and what was expected.
- Then you could use a finite automaton to control the parsing process, where each state represented a particular configuration of partial parsing progress.
Formal Defition FYIPermalink
Extra Reference: UW CSE 401
HandlePermalink
Definition: Given a grammar
is the production used for reduction is a location within the string
Note that:
- Some textbooks specify
. Not a big deal since is known. - If
can uniquely determine the production (like there is only one with as RHS), we can informally write the handle as .
Theorem: A grammar
In other words, if
Viable PrefixPermalink
Definition: Each intermediate string that appears in a rightmost derivation is a right sentential form.
我强烈不建议使用 Dragon Book 上的 viable prefix 定义!这里我们改用 Parsing Theory Volume II: LR(k) and LL(k) Parsing 上的定义。
Definition: Given a grammar
Note that:
- Any viable prefix is a prefix of a complete viable prefix.
- A complete viable prefix is a prefix of a right sentential form.
留下评论