LR Parsing #4: Runtime Encoding of LR(0)/SLR(1) Parsing DFA (How to Construct the Parsing Tables)
1. $LR(0)$ Parsing Table Construction
Recall LR Parsing #2: Structural Encoding of LR(0) Parsing DFA and compute the Canonical Collection of Set of $LR(0)$ Items $C$ for the augmented grammar $G’ = (V, \Sigma, S’, P)$:
\[C = \operatorname{CC}(G') = \lbrace I_0, I_1, \dots, I_n \rbrace\]Number the productions from $1$ to $m$ where $m = \vert P \vert$.
Make two tables, $T_{\operatorname{ACTION}}$ and $T_{\operatorname{GOTO}}$, as in Knuth’s Parsing Table. Or you can think them as two functions:
\[\begin{align} \operatorname{ACTION}:& \; C \times (\Sigma \cup \lbrace \Finv \rbrace) \to \lbrace \text{"shift"}, \text{"reduce 1"}, \dots \text{"reduce m"}, \text{"accept"} \rbrace \newline \operatorname{GOTO}:& \; C \times (V \cup \Sigma) \to C \end{align}\]where $\Finv$ is the EOF symbol, and $\operatorname{GOTO}$ function is exactly the same one we talked about in LR Parsing #2: Structural Encoding of LR(0) Parsing DFA.
$\forall I_i \in C$, fill the $LR(0)$ parsing table following the rules below:
Rule | Condition | Table Construction |
---|---|---|
1 | if $\exists a \in \Sigma$ such that $\operatorname{GOTO}(I_i, a) = I_j$ | mark $T_{\operatorname{ACTION}}[i, a] = \text{“shift”}$ |
mark $T_{\operatorname{GOTO}}[i, a] = j$ | ||
2 | if $\exists A \in V$ such that $\operatorname{GOTO}(I_i, A) = I_j$ | mark $T_{\operatorname{GOTO}}[i, A] = j$ |
3 | if $\exists [S’ \to S \cdot] \in I_i$ | mark $T_{\operatorname{ACTION}}[i, \Finv] = \text{“accept”}$ |
4 | if $\exists [A \to \alpha \cdot] \in I_i$, $A \neq S’$, and $A \to \alpha$ is production $k$ | $\forall e \in \Sigma \cup \lbrace \Finv \rbrace$, mark $T_{\operatorname{ACTION}}[i, e] = \text{“reduce k”}$ |
All entries not marked by the procedure can be considered $\text{“error”}$, and indicate that the input string is not accepted by the grammar.
We can see that $\operatorname{GOTO}$ function and $T_{\operatorname{GOTO}}$ table are essentially the same thing.
Note that for Rule 3:
- If you augment grammar $G$ into $G’$ with $S’ \to S$, then $[S’ \to S \cdot]$ is the accepting position
- If you augment grammar $G$ into $G’$ with $S’ \to S \Finv$, then $[S’ \to S \cdot \Finv]$ is the accepting position
- These two notations mean the same thing; the only difference is how they look on the surface.
- However the latter makes $T_{\operatorname{ACTION}}[i, \Finv]$ easier to understand (since $[S’ \to S \cdot \Finv]$ is a position explicitly expecting $\Finv$)
Note that for Rule 4:
- You can treat Rule 3 as a special case of Rule 4
- I.e. $\text{“accept”}$ is exactly a $\text{“reduce”}$ to $S’$, right before $\Finv$
Note that for Rule 1, modern parsing tables would combine the two table entries into one:
Rule | Condition | Table Construction |
---|---|---|
1 | if $\exists a \in \Sigma$ such that $\operatorname{GOTO}(I_i, a) = I_j$ | mark $T_{\operatorname{ACTION}}[i, a] = \text{“shift j”}$ |
and in this way, the nature of the two tables have changed accordingly:
\[\begin{align} T_{\operatorname{ACTION}}:& \; C \times (\Sigma \cup \lbrace \Finv \rbrace) \to \lbrace \text{"shift 0"}, \dots, \text{"shift n"}, \text{"reduce 1"}, \dots \text{"reduce m"}, \text{"accept"} \rbrace \newline T_{\operatorname{GOTO}}:& \; C \times V \to C \end{align}\]2. $SLR(1)$ Parsing Table Construction
Note that $SLR$ and $SLR(1)$ are interchangable, and $SLR(k>1)$ does exist but is out of our discussion here.
$SLR(1)$ differs with $LR(0)$ on Rule 4. For $SLR(1)$, we have:
Rule | Condition | Table Construction |
---|---|---|
4 | if $\exists [A \to \alpha \cdot] \in I_i$, $A \neq S’$, and $A \to \alpha$ is production $k$ | $\forall e \in \operatorname{FOLLOW}(A)$, mark $T_{\operatorname{ACTION}}[i, e] = \text{“reduce k”}$ |
3. Shift-Reduce Conflicts
$T_{\operatorname{GOTO}}$ is impossible to have conflicts.
$T_{\operatorname{ACTION}}$ may have conflicts, the most common being shift-reduce conflicts, i.e. one entry $T_{\operatorname{ACTION}}(I_i, a)$ may have two values, a $\text{“shift”}$ and a $\text{“reduce k”}$.
It’s impossible to have shift-shift conflicts in $T_{\operatorname{ACTION}}$.
Reduce-reduce conflicts are possible but kinda rare. E.g. when you grammar has both $A \to a$ and $B \to a$, how do you reduce $a \rhd ?$
Theorem: If any conflicting actions result from $LR(0)$ parsing table construction, we say the grammar is not $LR(0)$. $\blacksquare$
Theorem: If any conflicting actions result from $SLR(1)$ parsing table construction, we say the grammar is not $SLR(1)$. $\blacksquare$
4. $LR(0)$ vs $SLR(1)$
Theorem: Every $LR(0)$ grammar is $SLR(1)$. $\blacksquare$
The following grammar is $SLR(1)$ but not $LR(0)$:
S' -> S // production 1
S -> a S // production 2
S -> a // production 3
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A["$$S' \to \cdot S$$"]
B["$$S \to \cdot aS$$"]
C["$$S \to \cdot a$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
subgraph I2["State#2"]
direction TB
H["$$S \to a S \cdot $$"]
end
subgraph I3["State#3"]
direction TB
I["$$S' \to S \cdot$$"]
end
I0 -->|$$a$$| I1 -->|$$S$$| I2
I0 -->|$$S$$| I3
I1 -->|$$a$$| I1
style I3 color:#f66
“State#3” in red font represents the accept state.
If we construct a $LR(0)$ parsing table for the grammar, we’ll get:
State# | ACTION | GOTO | ||
---|---|---|---|---|
$a$ | $\Finv$ | $a$ | $S$ | |
$0$ | shift | $1$ | $3$ | |
$1$ | shift / reduce 3 | reduce 3 | $1$ | $2$ |
$2$ | reduce 2 | reduce 2 | ||
$3$ | accept |
The conflict at $T_{\operatorname{ACTION}}(I_1, a)$ is a dilemma at such a configuration:
\[\big[ (\_, I_0), (a, I_1) \big] \mid a\]- you can either shift the input $a$ (according to $[S \to \cdot a S]$)
- or reduce the current $a$ on stack top (according to $[S \to \cdot a]$)
If we construct a $SLR(1)$ parsing table for the grammar, we’ll get:
State# | ACTION | GOTO | ||
---|---|---|---|---|
$a$ | $\Finv$ | $a$ | $S$ | |
$0$ | shift | $1$ | $3$ | |
$1$ | shift | reduce 3 | $1$ | $2$ |
$2$ | reduce 2 | reduce 2 | ||
$3$ | accept |
since $\operatorname{FOLLOW}(S) = \lbrace \Finv \rbrace$ and therefore for the $I_0$ row, only $T_{\operatorname{ACTION}}(I_1, \Finv)$ is marked reduce.
It means at the same configuration:
\[\big[ (\_, I_0), (a, I_1) \big] \mid a\]$SLR(1)$ will prefer shifting in a new $a$ over reducing the current $a$ on stack top.
5. Digression: Different Representation of DFA and Parsing Table
ComVis will represent our DFA as:
---
config:
layout: elk
look: handDrawn
---
flowchart LR
subgraph I0["State#0"]
direction TB
A["$$S' \to \cdot S$$"]
B["$$S \to \cdot aS$$"]
C["$$S \to \cdot a$$"]
end
subgraph I1["State#1"]
direction TB
D["$$S \to a \cdot S$$"]
E["$$S \to a \cdot$$"]
F["$$S \to \cdot aS$$"]
G["$$S \to \cdot a$$"]
end
subgraph I2["State#2"]
direction TB
H["$$S \to a S \cdot $$"]
end
subgraph I3["State#3"]
direction TB
I["$$S' \to S \cdot \Finv$$"]
end
subgraph I4["State#4"]
direction TB
J["$$S' \to S \Finv \cdot$$"]
end
I0 -->|$$a$$| I1 -->|$$S$$| I2
I0 -->|$$S$$| I3 -->|$$\Finv$$| I4
I1 -->|$$a$$| I1
style I4 color:#f66
and the corresponding $SLR(1)$ parsing table would be like:
State# | ACTION | GOTO | |
---|---|---|---|
$a$ | $\Finv$ | $S$ | |
$0$ | shift 1 | $3$ | |
$1$ | shift 1 | reduce 3 | $2$ |
$2$ | reduce 2 | reduce 2 | |
$3$ | shift 4 | ||
$4$ | accept | accept |
留下评论