LR Parsing #8: LALR(1) as Approximation of LR(1)
Reference:
1. Intuition & ExamplePermalink
这里需要正式定义一下 core:
Definition: The core of an
注意与 kernel item 区分。
所以如果两个
举个例子:
S' -> S
S -> XX
X -> aX
X -> b
“State#1” in red font represents the accept state. States with the same non-white backgraound color are mergeable by
State#3 + State#6、State#4 + State#7、State#8 + State#9 可以合并,于是得到:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
i00["$$S' \to \cdot S, \Finv$$"] ~~~
i01["$$S \to \cdot XX, \Finv$$"] ~~~
i02["$$X \to \cdot aX, \lbrace a, b \rbrace$$"] ~~~
i03["$$X \to \cdot b, \lbrace a, b \rbrace$$"]
end
subgraph I1["State#1"]
direction TB
i10["$$S' \to S \cdot, \Finv$$"]
end
subgraph I2["State#2"]
direction TB
i20["$$S \to X \cdot X, \Finv$$"] ~~~
i21["$$X \to \cdot aX, \Finv$$"] ~~~
i22["$$X \to \cdot b, \Finv$$"]
end
subgraph I36["State#36"]
direction TB
i30["$$X \to a \cdot X, \lbrace a, b, \Finv\rbrace$$"] ~~~
i31["$$X \to \cdot aX, \lbrace a, b, \Finv\rbrace$$"] ~~~
i32["$$X \to \cdot b, \lbrace a, b, \Finv\rbrace$$"]
end
subgraph I47["State#47"]
direction TB
i40["$$X \to b \cdot, \lbrace a, b, \Finv\rbrace$$"]
end
subgraph I5["State#5"]
direction TB
i50["$$S \to XX \cdot, \Finv$$"]
end
subgraph I89["State#89"]
direction TB
i80["$$X \to aX \cdot, \lbrace a, b, \Finv\rbrace$$"]
end
I0 -->|$$a$$| I36
I0 -->|$$b$$| I47
I0 -->|$$S$$| I1
I0 -->|$$X$$| I2
I2 -->|$$a$$| I36
I2 -->|$$b$$| I47
I2 -->|$$X$$| I5
I36 -->|$$a$$| I36
I36 -->|$$b$$| I47
I36 -->|$$X$$| I89
style I1 color:#f66
style I36 fill:#E8DAEF
style I47 fill:#D4E6F1
style I89 fill:#D4EFDF
合并后是有可能直接变成
2. Conflicts Emerged after Merging / Partially ?Permalink
合并 state 必然会增加 conflict 的 possibility:
- conflicting 的两个 items 如果是分开的两个 states 中的,就没事
- 一合并就必然出事
Theorem: When a conflict arises after state merging, we say the grammar is not
按照 LR Parsing #7: LR(0) vs SLR(1) vs LR(1) - 4. The Expressive Power Perspective 的思路,有:
| Grammar | Possibility of Conflicts | so a conflict-free grammar must be … structually | Expressive Power |
|---|---|---|---|
| 🎲🎲🎲🎲 High | 🚫🚫🚫🚫 most restrictive | 👑 Low | |
| 🎲🎲🎲 | 🚫🚫🚫 | 👑👑 | |
| 🎲🎲 | 🚫🚫 | 👑👑👑 | |
| 🎲 Low | 🚫 least restrictive | 👑👑👑👑 High |
为了避免出现 conflict,我有一个 rough 的 idea:只合并那些不会有 conflict 的 states;有 conflict 的 states 我们不合并就好了。这么一来就得到了一种 half-
学界的工作做得比我这个 rough idea 更细致,比如
3. Parsing Table ConstructionPermalink
Same with
4. State-Merging AlgorithmsPermalink
4.0 老算法改造Permalink
之前的算法 有这么几个问题:
职责上稍微有点不清晰:- 它要负责创建新的 closure
- 同时它又是
填表的依据
没有一个具体的、高效的实现手段
我们可以这么改造一下:
4.1 Brute-ForcePermalink
即在
但是如果是做练习题,这个方法还行,而且
- 假设我们构建好了
的 - For each core
, find all item sets having that core, and compute their union, say where - 假设 step 2 的结果是
:- 如果
,那么 , 也应该有相同的 core,所以你肯定能找到一个 . 于是我们可以删除 ,然后添加- Section 1 就是个很好的例子
都要改成
- 如果
与 的方法一致
4.2 A simple algorithm: step-by-step mergingPermalink
First described by Anderson et al. in 1973.
Notes on 📌:
- 你只可能找出 only one such
,如果有多个的话,它们理应已经被 merge 了 - 这里你可能需要一个
map(core, item_set),查找比较方便
Cons:
- still generates almost all
states (现实中 大几千个 states 被合并成 几百个 states 的情况是很常见的,> 10:1 ratio) 需要 revisit 的频率很高,还是有很多重复的计算
4.3 The Channel Algorithm (used by yacc)Permalink
Described in YACC: Yet Another Compiler-Compiler by Stephen C. Johnson; detailed by Aho, Sethi and Ullman in the Dragon Book.
4.3.0 Intuition: A channel is a passage in the the LR(0) NFA that carries over lookaheads (among items)Permalink
With this example grammar:
// A non-LR(0) grammar for differences of numbers
S -> E
E -> E - T
E -> T
T -> n
T -> (E)
the book Parsing Techniques constructed a NFA for it:

and lookaheads are carried over by two types of channels within the NFA:

is like a placeholder for lookaheads- dotted lines represent “propagated” channel
- dashed lines represent “spontaneous” channel
But in reality you don’t need to run the Channel Algorithm on NFAs like this. Actually it’s easier to work on DFAs.
The skeleton of the Channel Algorithm is like:
- Compute all
kernel items - Compute the lookaheads (by channels) for those kernel items, making them
kernel items - Expand those
kernel items into item sets by
4.3.1 Compute Kernel ItemsPermalink
Definition:
- Kernel items: the initial item
, plus all items whose dots are not at the left end - Non-kernel items: all other items with their dots at the left end, except for
An item set may have
You can use the
4.3.2 Lookahead Determining AlgorithmPermalink
首先我们要定义两种不同的 channel,或者说两种不同的 lookahead-attachment (to kernels) 的形式。
我强烈不建议参考 Dragon Book 的 Example 4.61 下的两个 bullet points,那根本就不是 formal definition,是针对 Example 4.61 的特殊情况的讨论。而且也不要试图去 interpret,因为你很难确定它讲的 “regardless of
参考 Dragon Book 的 Algorithm 4.62. Let:
be a symbol . be an item set be the set of kernel items of be a symbol is the result for “spontaneously generated lookaheads”, where: is a kernel item of is a kernel item of is a lookahead symbol- one such entry means: lookahead
is spontaneously generated by (or more specially ) for
is the result for “propagated lookaheads” where:- one such entry means: lookaheads propagate from
to
- one such entry means: lookaheads propagate from
Textbooks use \lozenge). I don’t like escaping it all the time in Markdown so I prefer
You don’t need to keep
Why the dummy lookahead
如果我们是
然后找到了 ,那么 是 “propagated” 还是 “spontaneous” 要看 “ 是否决定了 的值”。
举个例子:
- 比如说按
,我们可能会有 ,然后假设这个 一路传到了 unchanged - 此时我们能说 “
决定了 的值” 吗?不能,因为还要继续拆:- 如果
is not nullable,那么 ,与 无关,应该算 “spontaneous” - 如果
is nullable,那么 ,与 有关;如果最终我们得到了 (in ),那么这应该算 “propagated”
- 如果
这里就涉及了一个问题:在
, 不管你怎么操作都不会得出一个- 换言之
只可能是 “propagated” - 注意这里要考虑到
实际包含了 3 步: to get- shift
like to get
- 换言之
- on the other hand,如果你拿到一个 lookahead
,那它的值肯定不是被 决定的,所以一定是 “spontaneous”
Special Case:
4.3 Construct Channel GraphPermalink
假设我们有:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I["State $$\;I$$"]
direction TB
i0["$$\phi_i = [A \to \alpha \cdot \beta, \ell_i]$$"]
end
subgraph J["State $$\;J$$"]
direction TB
j0["$$\phi_j = [B \to \gamma X \cdot \delta, \ell_j = ?]$$"]
end
I -->|$$X$$| J
已知
- 若
, 则 - 若
, 则
所以我们所有的 kernel items 构成一个 graph:
- vertex 形如
- edge 即是
可以理解成:- a visitor that generates lookahead
when it reaches vertex - a initializer of all
values- 因为这个 graph 本质就是我们的
DFA,所有的 edges 都已知,所以我们一开始就可以把 “spontaneous” lookaheads 都 assign 给对应的 kernel items
- 因为这个 graph 本质就是我们的
- a visitor that generates lookahead
构成 source vertex
所以剩下的工作就是根据
- 你可能先处理了
,有 - 但后续可能又有
,有 ,那这个新来的 的内容你也要更新给
Dragon Book 是用这么一个 table 来记录这个 fixed-point 的计算过程的:

其中 INIT 就是 PASS 就根据
4.4 The Relations Algorithm (Omitted)Permalink
Designed by DeRemer and Pennello (Efficient computation of LALR(1) lookahead sets).
See Section 9.7.1.3, Parsing Techniques.
4.5 LALR-by-SLR Technique (Omitted)Permalink
See Section 9.7.1.4, Parsing Techniques.
留下评论