LR Parsing #7: LR(0) vs SLR(1) vs LR(1)
- Reference: SLR and LR(1) Parsing
- Online Tool: JSMachines LR(1) Parser Generator
1. The DFA Perspective
1.1 Comparison
This is a classic example in compiler theory that clearly separates $LR(1)$ from $SLR(1)$ and $LR(0)$:
S' -> S // production 1
S -> L = R // production 2
S -> R // production 3
L -> * R // production 4
L -> id // production 5
R -> L // production 6
or more technically:
// Start rule
S : L '=' R # Assignment
| R # ExpressionOnly
;
// Left-hand side
L : '*' L # Dereference
| ID # Identifier
;
// Right-hand side
R : L # RHS
;
// Lexer rules
ID : [a-zA-Z_][a-zA-Z_0-9]* ;
You can think of the $\ast$ above as the dereference operator on pointers in C.
Consider input $w = \overline{\text{id} = \text{id}}$. When the first $\text{id}$ is reduced to $L$, we’ll have the following configuration:
\[\big[ (\_, \mathcal{S_0}), (L, \mathcal{S_2}) \big] \; \mid \; = \text{id}\]There is a dilemma for $LR(0)$ parsing:
- you can further reduce $L \rhd R$ (by production $6$)
- or shift the $=$ into the stack (by production $2$)
$SLR(1)$ tries to fix this problem but in vain, because:
- $\operatorname{FOLLOW}(L) = \lbrace =, \Finv\rbrace$ because:
- $S \Rightarrow L = R$
- $S \Rightarrow R \Rightarrow L$
- Therefore $SLR(1)$ will still advice reducing $L \rhd R$ on reading $=$ (by production $6$)
- Meanwhile shifting the $=$ into the stack is still possible (by production $2$)
$LR(1)$ solves this problem by having the following states:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A_0["$$S' \to \cdot S, \Finv$$"] ~~~
B_0["$$S \to \cdot L = R, \Finv$$"] ~~~
C_0["$$S \to \cdot R, \Finv$$"] ~~~
D_0["$$L \to \cdot \ast R, \Finv$$"] ~~~
E_0["$$L \to \cdot id, \Finv$$"] ~~~
F_0["$$R \to \cdot L, \Finv$$"]
end
subgraph I2["State#2"]
direction TB
D["$$S \to L \cdot = R, \Finv$$"] ~~~
E["$$R \to L \cdot, \Finv$$"]
end
subgraph I6["State#6"]
direction TB
A_6["$$S \to L = \cdot R, \Finv$$"] ~~~
B_6["$$R \to \cdot L, \Finv$$"] ~~~
C_6["$$L \to L = \cdot \ast R, \Finv$$"] ~~~
D_6["$$L\to \cdot id, \Finv$$"]
end
I0 -->|$$L$$| I2
I2 -->|$$=$$| I6
- When in
State#2
, $LR(1)$ parser will surely shift $=$ intoState#6
- Reduction $L \rhd R$ is only possible on reading a $\Finv$
1.2 General Form of Shift-Reduce Conflicts?
Discussion in this section is totally based on my experience and thus quite informal.
It looks like a general form of shift-reduce conflicts is like this:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I3["State#3"]
direction TB
D["$$X \to \underline\alpha a \cdot b \underline\beta$$"] ~~~
E["$$Y \to \underline\alpha a \cdot $$"]
end
subgraph I5["State#5"]
direction TB
A_6["$$X \to \underline\alpha a b \cdot \underline\beta$$"]
end
I3 -->|$$b$$| I5
where:
- a shifting item and a reducing item (on the same non-terminal) exist in the same state
- it’s possible that $X = Y$
If we define:
- $\operatorname{LA}(Y, I)$ as the lookahead(s) of variable $Y$ in state $I$
- $\operatorname{LA}(Y)$ as the union of lookahead(s) of variable $Y$ across all states
we can consider that, following Section 2.2:
- In $LR(0)$, $\operatorname{LA}(Y, I_3) = \Sigma \Longrightarrow \mathbb{P}(\text{conflict}) = \mathbb{P}(b \in \Sigma) = 100\%$ (inevitable)
- In $SLR(1)$, $\operatorname{LA}(Y, I_3) = \operatorname{FOLLOW}(Y) \Longrightarrow \mathbb{P}(\text{conflict}) = \mathbb{P}(b \in \operatorname{FOLLOW}(Y))$ is lower
- In $LR(1)$, $\operatorname{LA}(Y, I_3) \subseteq \operatorname{FOLLOW}(Y) \Longrightarrow \mathbb{P}(\text{conflict}) = \mathbb{P}(b \in \operatorname{LA}(Y, I_3))$ is even lower
2. The Parse Tree Perspective
2.1 Comparison
Consider the same input $w = \overline{\text{id} = \text{id}}$ with the examplar grammar above.
With $LR(1)$ parsing, the following parse tree is feasible:
---
config:
look: handDrawn
---
flowchart TB
S_0[$$S' \to \cdot S, \Finv$$]
S_1[$$S \to \cdot L = R, \Finv$$]
L[$$L \to \cdot id, =$$]
R[$$R \to \cdot L, \Finv$$]
L2[$$L \to \cdot id, \Finv$$]
subgraph Input["input"]
direction LR
i0["$$id$$"]
i1["$$=$$"]
i2["$$id$$"]
i3["$$\Finv$$"]
end
S_0 ~~~ S_1 --> S_0
S_1 ~~~ S_0
S_1 ~~~ L --> S_1
L ~~~ S_1
S_1 ~~~ R --> S_1
R ~~~ S_1
L ~~~ i0 --> L
i0 ~~~ L
L ~~~ i1 -.✅.-> L
i1 ~~~ L
S_1 ~~~ i1 --> S_1
i1 ~~~ S_1
R ~~~ L2 --> R
L2 ~~~ R
L2 ~~~ i2 --> L2
i2 ~~~ L2
L2 ~~~ i3 -.✅.-> L2
i3 ~~~ L2
R ~~~ i3 -.✅.-> R
i3 ~~~ R
S_1 ~~~ i3 -.✅.-> S_1
i3 ~~~ S_1
Meanwhile the following parse tree is precluded due to the lookahead mismatch:
---
config:
look: handDrawn
---
flowchart TB
S_0[$$S' \to \cdot S, \Finv$$]
S_1[$$S \to \cdot R, \Finv$$]
L[$$L \to \cdot id, \Finv$$]
R[$$R \to \cdot L, \Finv$$]
subgraph Input["input"]
direction LR
i0["$$id$$"]
i1["$$=$$"]
i2["$$id$$"]
i3["$$\Finv$$"]
end
S_0 ~~~ S_1 --> S_0
S_1 ~~~ S_0
S_1 ~~~ R --> S_1
R ~~~ S_1
R ~~~ L --> R
L ~~~ R
L ~~~ i0 --> L
i0 ~~~ L
L ~~~ i1 -.❌.-> L
i1 ~~~ L
The lookahead in $LR(1)$ plays a protective role in the parsing that can be considered as:
- a sentinel that guards against invalid reduction, or wrong shapes of the parse tree
- a postcondition that ensures the rest of the input is still parsable
Compared to $LR(1)$:
- $LR(0)$ parsing cannot preclude the second parse tree above from construction, because it has no such protection mechanism at all.
- $SLR(1)$ parsing cannot either because its protection mechanism is weak.
2.2 $\operatorname{FOLLOW}(L)$ vs $\operatorname{LA}(L)$
$SLR(1)$ parsing with the above example will allow the following parse tree, where $\operatorname{FOLLOW}(L)$ can be considered as a set of “lookaheads”:
---
config:
look: handDrawn
---
flowchart TB
S_0[$$S' \to \cdot S$$]
S_1[$$S \to \cdot R$$]
L[$$L \to \cdot id, \lbrace \Finv, = \rbrace$$]
R[$$R \to \cdot L$$]
subgraph Input["input"]
direction LR
i0["$$id$$"]
i1["$$=$$"]
i2["$$id$$"]
i3["$$\Finv$$"]
end
S_0 ~~~ S_1 --> S_0
S_1 ~~~ S_0
S_1 ~~~ R --> S_1
R ~~~ S_1
R ~~~ L --> R
L ~~~ R
L ~~~ i0 --> L
i0 ~~~ L
L ~~~ i1 -.✅.-> L
i1 ~~~ L
It looks like $\operatorname{FOLLOW}(L) = \lbrace \Finv, = \rbrace$ in $SLR(1)$ is the superset of $L’s$ lookahead $=$ (of the same state) in $LR(1)$. Can we make such a conclusion here? Yes!
Lemma: $\operatorname{FOLLOW}(A) = \bigcup \operatorname{FIRST}(\beta a) \text{ for all context } [B \to \alpha \cdot A \beta, a]$. $\blacksquare$
In this way, we can consider $\forall A \in V$:
- $LR(0)$ as having the widest range of lookaheads, i.e. $\operatorname{LA}(A) = \Sigma$
- $SLR(1)$ as having a narrower range of lookaheads, i.e. $\operatorname{LA}(A) = \operatorname{FOLLOW}(A)$
- $LR(1)$ as having the narrowest range of lookaheads, i.e. $\operatorname{LA}(A) \subseteq \operatorname{FOLLOW}(A)$
3. Digression: Propagation of Lookaheads
从 $\operatorname{GOTO}^{(1)}$ 的定义可以看出,lookahead 对 transition 没有任何的影响,它更像是 transition 中 carry-on 的信息:transition 你得把它带上,但是不需要用它。
lookahead 的作用更集中在 transition chain 的头和尾:initializing closure 和 determining reduction. 而 initializing 的源头又在于 $[S’ \to \cdot S, \Finv]$,所以你可以把 lookaheads 理解成 $[S’ \to \cdot S, \Finv]$ 埋好的 sentinels.
4. The Expressive Power Perspective
Grammar | Granularity of Parsing Table Construction Rules | Possibility of Conflicts | so a conflict-free grammar must be … structually | Expressive Power |
---|---|---|---|---|
$LR(0)$ | Coarse | 🔴 High | most restrictive | 🟢 Low |
$SLR(1)$ | Medium | 🟡 Medium | medium restrictive | 🟡 Medium |
$LR(1)$ | Fine | 🟢 Low | least restrictive | 🔴 High |
5. Beyond $LR(1)$
First of all, $LR(1)$ cannot fix ambiguity.
You definitely can design a unambiguous grammar that has reduce-reduce conflicts even with $LR(1)$ parsing. (See LR Parsing #4)
The following grammar is unambiguous and have a shift-reduce conflicts.
S' -> S
S -> A a a
A -> b
A -> b a
Consider input $w = baa$. When the first $b$ is shifted in, we’ll have the following configuration:
\[\big[ (\_, \mathcal{S_0}), (b, \mathcal{S_3}) \big] \; \mid \; a a\]There is a dilemma even for $LR(1)$ parsing:
- you can choose to reduce $b \rhd a$
- or shift the $a$ into the stack
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A_0["$$S' \to \cdot S, \Finv$$"] ~~~
B_0["$$S \to \cdot Aaa, \Finv$$"] ~~~
C_0["$$A \to \cdot b, a$$"] ~~~
D_0["$$A \to \cdot ba, a$$"]
end
subgraph I3["State#3"]
direction TB
D["$$A \to b \cdot , a$$"] ~~~
E["$$A \to b \cdot a, a$$"]
end
subgraph I5["State#5"]
direction TB
A_6["$$A \to ba \cdot, a$$"]
end
I0 -->|$$b$$| I3
I3 -->|$$a$$| I5
留下评论