LR Parsing #5: Intuition Revisited
1. Questionable DFA
迄今为止,我们这个系列的思路是这样的:
- LR Parsing #1: Intuition:
- stack 上的是 viable prefix
- viable prefixes 构成 regular language
- regular langunage 于是有 DFA
- LR Parsing #2: Structural Encoding of LR(0) Parsing DFA
- 如何构建这个 DFA
- LR Parsing #4: Runtime Encoding of LR(0)/SLR(1) Parsing DFA (How to Construct the Parsing Tables)
- 这个 DFA 是如何运行的
- LR Parsing #3: Simulation of the Parsing DFA (Configuration / Shift-Reduce / Structure of Parsing Table)
- Parser 程序如何 drive 这个 DFA 运行
但我们回看一下 LR Parsing #4: Runtime Encoding of LR(0)/SLR(1) Parsing DFA (How to Construct the Parsing Tables) 中的这个例子:
S' -> S // production 1
S -> a S // production 2
S -> a // production 3
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A["$$S' \to \cdot S$$"]
B["$$S \to \cdot aS$$"]
C["$$S \to \cdot a$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
subgraph I2["State#2"]
direction TB
H["$$S \to a S \cdot $$"]
end
subgraph I3["State#3"]
direction TB
I["$$S' \to S \cdot$$"]
end
I0 -->|$$a$$| I1 -->|$$S$$| I2
I0 -->|$$S$$| I3
I1 -->|$$a$$| I1
style I3 color:#f66
“State#3” in red font represents the accept state.
Is this a real DFA?
2. Parser 程序其实模拟的是 DPDA $\approx$ DFA + Stack
比如 shift 就可以表示成:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A["$$S' \to \cdot S$$"]
B["$$S \to \cdot aS$$"]
C["$$S \to \cdot a$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
I0 -->|"$$a,(\_,I_0) \big/ (\_,I_0)(a,I_1)$$"| I1
I1 -->|"$$a,(a,I_1) \big/ (a,I_1)(a,I_1)$$"| I1
注意 $I_0 \to I_1$ 的这个 transition,标准写法应该是 $a,(\text{_}, I_0) \big/ (a,I_1)(\text{_}, I_0)$,stack 的 top 朝左,稍微有点别扭,于是我改成和 configuration 一样的写法。
比如 reduce $a \rhd S$ 就可以表示成:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I3["State#3"]
direction TB
I["$$S' \to S \cdot$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
subgraph I2["State#2"]
direction TB
H["$$S \to a S \cdot $$"]
end
subgraph Reduce_a_to_S["Temporary State for $$\,a \rhd S$$"]
direction TB
X["."]
end
I1 -->|"$$\varepsilon, (a, I_1) \big / \varepsilon$$"| Reduce_a_to_S
Reduce_a_to_S -->|"$$\varepsilon, (a, I_1) \big / (a, I_1)(S, I_2)$$"| I2
Reduce_a_to_S -->|"$$\varepsilon, (\_, I_0) \big / (\_, I_0)(S, I_3)$$"| I3
style I3 color:#f66
如果是 reduce $aS \rhd S$,我们还需要更多的 temporary states,整个 DPDA 会变得很大。
这也就是 parser 程序的智慧所在:我们不需要一个完整定义的、复杂的 DPDA,我们可以用一个简单的 DFA + Stack 等效地实现 DPDA 的效果。
Related and interesting as well: How to simulate backreferences, lookaheads, and lookbehinds in finite state automata?
3. $LR$ Parsing DFA 的确能识别 Viable Prefixes,但需要一点改动
这个改动就是把 $LR$ Parsing DFA 的 states 全部设置成 accept states:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A["$$S' \to \cdot S$$"]
B["$$S \to \cdot aS$$"]
C["$$S \to \cdot a$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
subgraph I2["State#2"]
direction TB
H["$$S \to a S \cdot $$"]
end
subgraph I3["State#3"]
direction TB
I["$$S' \to S \cdot$$"]
end
I0 -->|$$a$$| I1 -->|$$S$$| I2
I0 -->|$$S$$| I3
I1 -->|$$a$$| I1
style I0 color:#f66
style I1 color:#f66
style I2 color:#f66
style I2 color:#f66
style I3 color:#f66
Any state in red font is an accept state.
特别地,$\varepsilon$ 是一个合法的 viable prefix.
留下评论