Top-Down Parsers: Recursive Descent, Predictive, and More

June 26, 2025 (2 min read)

1. “Recursive Descent” and “Predictive” are just 2 implementation mechanisms of top-down parsers

Let:

$P$ be the set of all predictive parsers
$Q$ be the set of all recursive descent parsers
$Δ$ be the set of all top-down parsers

Informally, we have the following relationships:

$P \subset Δ$
$Q \subset Δ$
$P \cap Q \neq \emptyset$
$P ⊄ Q$
$Q ⊄ P$

Informally, we can define them as follows:

$$
\begin{aligned}
P &= \{\, p \in Δ \mid p \text{ deterministically picks a production for derivation at each step} \,\} \\
P^{∁} &= \{\, p \in Δ \mid p \text{ nondeterministically picks a production, backtracking if it fails} \,\} \\
Q &= \{\, p \in Δ \mid p \text{ calls procedure } A() \text{, which recursively calls } X_{1}(), \dots, X_{n}() \text{, if production } A \to X_{1} \dots X_{n} \text{ is picked} \,\} \\
Q^{∁} &= \{\, p \in Δ \mid p \text{ maintains a stack explicitly, rather than implicitly via recursive calls} \,\}
\end{aligned}
$$

Examples:

$P \cap Q$: e.g. a vanilla recursive-descent $LL(1)$ parser
$P \cap Q^{∁}$: e.g. an $LL(1)$ parser using an explicit stack
$P^{∁} \cap Q$: e.g. a PEG parser that handles prioritized choice $A \to e_{1} / e_{2} / \dots / e_{n}$
- $A()$ calls $e_{1}()$; if that fails, it backtracks and calls $e_{2}()$.
- This repeats until some $e_{i}()$ succeeds (or all alternatives fail).
$P^{∁} \cap Q^{∁}$: e.g. a PEG parser like the one above, but using an explicit stack
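To make the $P^{∁} \cap Q$ case concrete, here is a minimal Python sketch of recursive descent with prioritized choice. The grammar $A \to e_{1} / e_{2} / e_{3}$ with e1 = "a" "b" "c", e2 = "a" "b", e3 = "a" and the helper `match_seq` are hypothetical and exist only for illustration. Each function returns the end position on success or `None` on failure, so backtracking amounts to retrying from the saved position.

```python
from typing import Optional, Sequence


def match_seq(text: str, pos: int, parts: Sequence[str]) -> Optional[int]:
    """Match each literal in `parts` in order; return the end position or None."""
    for part in parts:
        if not text.startswith(part, pos):
            return None
        pos += len(part)
    return pos


def e1(text: str, pos: int) -> Optional[int]:  # e1 = "a" "b" "c"
    return match_seq(text, pos, ["a", "b", "c"])


def e2(text: str, pos: int) -> Optional[int]:  # e2 = "a" "b"
    return match_seq(text, pos, ["a", "b"])


def e3(text: str, pos: int) -> Optional[int]:  # e3 = "a"
    return match_seq(text, pos, ["a"])


def A(text: str, pos: int) -> Optional[int]:
    """A -> e1 / e2 / e3 (prioritized choice)."""
    for alternative in (e1, e2, e3):
        # Every alternative starts from the same saved `pos`: a failed
        # attempt is "backtracked" simply by not advancing `pos`.
        end = alternative(text, pos)
        if end is not None:
            return end  # first success wins; later alternatives are skipped
    return None
```

For example, `A("abx", 0)` fails inside `e1` after matching `ab`. It then backtracks to position 0 and succeeds with `e2`, returning `2`.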

2. $LL(k) \overset{?}{=} P$

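Ideally, you can think of $LL(k)$ as equal to $P$.

In the real world, though, you may run into $LL(k)$ variants. For example, the main framework is an $LL(1)$ parser, but one production $A \to α \mid β$ is decided by a special rule. Such a parser is certainly not $LL(1)$ in the strict sense, but it still counts as predictive. In that case $LL(k) \subset P$.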
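A minimal sketch of such a variant follows. The grammar is hypothetical: both alternatives of `A` start with `'a'`, so an $LL(1)$ table would have a conflict at $M[A, a]$. The assumed special rule peeks at a second token for this one production only. The parser never backtracks, so it is still predictive even though it is not strictly $LL(1)$.

```python
from typing import Optional


class Parser:
    """Mostly LL(1) recursive descent, with one special rule for A.

    Hypothetical grammar, for illustration only:
        S : A ';' ;
        A : 'a' 'b'    (alpha)
          | 'a' 'c' ;  (beta)
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self, k: int = 1) -> Optional[str]:
        """Return the k-th upcoming token (1-based), or None past the end."""
        i = self.pos + k - 1
        return self.tokens[i] if i < len(self.tokens) else None

    def match(self, expected: str) -> str:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1
        return expected

    def S(self) -> list:
        tree = ['S', self.A()]
        self.match(';')
        return tree

    def A(self) -> list:
        # Both alternatives start with 'a', so M[A, 'a'] would be an LL(1)
        # conflict. Special rule: peek at a 2nd token for this production
        # only. No backtracking happens, so the parser stays predictive.
        if self.peek(1) == 'a' and self.peek(2) == 'b':
            return ['A', self.match('a'), self.match('b')]
        if self.peek(1) == 'a' and self.peek(2) == 'c':
            return ['A', self.match('a'), self.match('c')]
        raise SyntaxError(f"cannot predict A at position {self.pos}")

    def parse(self) -> list:
        tree = self.S()
        if self.peek() is not None:
            raise SyntaxError(f"trailing input at position {self.pos}")
        return tree
```

Every other production in a real grammar would keep using a single token of lookahead. Only `A` pays for the extra peek.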

3. $LL(k)$: “Table-Driven” vs. “Hardcoded”

For $LL(0)$, you don't need a PPT (predictive parsing table) at all; you just write the code following the productions. For example:

S : 'a' X;
X : 'x';

# match(t) is an assumed helper: consume the next token if it equals t, else raise
def S():
    match('a')
    X()

def X():
    match('x')

For $LL(1)$, each cell of the PPT (predictive parsing table) contains at most one production, so the parser has no real need to consult the table; hardcoding it as if-else does the job. For example:

S : 'a' X
  | 'b' Y ;
X : 'x' ;
Y : 'y' ;

Let the PPT be $M$, with $M[S, a] = (S \to a X)$ and $M[S, b] = (S \to b Y)$. The program can nevertheless be written as:

def S():
    # hardcoded equivalent of looking up M[S, lookahead()]
    if lookahead() == 'a':
        match('a')
        X()
    elif lookahead() == 'b':
        match('b')
        Y()
    else:
        raise SyntaxError(f"unexpected token {lookahead()!r}")
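The snippets above leave `lookahead()` and `match()` undefined. Below is a minimal runnable sketch. It assumes the input is a pre-tokenized list of strings and that each function returns a small nested-list parse tree. Both assumptions, and the class name `LL1Parser`, are for illustration only.

```python
from typing import Optional


class LL1Parser:
    """Hardcoded (table-free) LL(1) parser for:
        S : 'a' X | 'b' Y ;
        X : 'x' ;
        Y : 'y' ;
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def lookahead(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected: str) -> str:
        tok = self.lookahead()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def S(self) -> list:
        # This if-else *is* the PPT row M[S, .], hardcoded:
        #   M[S, 'a'] = S -> a X,   M[S, 'b'] = S -> b Y
        if self.lookahead() == 'a':
            return ['S', self.match('a'), self.X()]
        elif self.lookahead() == 'b':
            return ['S', self.match('b'), self.Y()]
        else:
            raise SyntaxError(f"unexpected token {self.lookahead()!r}")

    def X(self) -> list:
        return ['X', self.match('x')]

    def Y(self) -> list:
        return ['Y', self.match('y')]

    def parse(self) -> list:
        tree = self.S()
        if self.lookahead() is not None:
            raise SyntaxError(f"trailing input at position {self.pos}")
        return tree
```

For example, `LL1Parser(['a', 'x']).parse()` returns `['S', 'a', ['X', 'x']]`.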

Even for $LL(2)$ you can still use if-else, but the code becomes less elegant and less efficient. At that point it makes sense to put a PPT into the program.
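A table does not require an explicit stack. Here is a minimal sketch of a recursive-descent parser (still in $Q$) that consults a PPT keyed by up to two lookahead tokens. The grammar, the `TABLE` layout and the function names are all hypothetical. The table maps a nonterminal plus a lookahead prefix of length at most $k = 2$ to a right-hand side. A real $LL(2)$ table would be computed from $FIRST_k$/$FOLLOW_k$ sets rather than written by hand.

```python
# Hypothetical grammar needing 2 tokens of lookahead at S:
#   S : 'a' 'b' X | 'a' 'c' Y | 'b' Y ;
#   X : 'x' ;
#   Y : 'y' ;
# PPT: (nonterminal, lookahead prefix) -> right-hand side.
# A prefix shorter than K is used where one token already decides.
TABLE: dict[tuple[str, tuple[str, ...]], list[str]] = {
    ('S', ('a', 'b')): ['a', 'b', 'X'],
    ('S', ('a', 'c')): ['a', 'c', 'Y'],
    ('S', ('b',)): ['b', 'Y'],
    ('X', ('x',)): ['x'],
    ('Y', ('y',)): ['y'],
}
NONTERMINALS = {'S', 'X', 'Y'}
K = 2


def parse(tokens: list[str]) -> list:
    pos = 0

    def predict(nt: str) -> list[str]:
        # Try the longest lookahead prefix first (K tokens, then fewer).
        for n in range(K, 0, -1):
            rhs = TABLE.get((nt, tuple(tokens[pos:pos + n])))
            if rhs is not None:
                return rhs
        raise SyntaxError(f"no production for {nt} at position {pos}")

    def parse_symbol(sym: str) -> object:
        nonlocal pos
        if sym not in NONTERMINALS:  # terminal: must match exactly
            if pos >= len(tokens) or tokens[pos] != sym:
                raise SyntaxError(f"expected {sym!r} at position {pos}")
            pos += 1
            return sym
        rhs = predict(sym)  # table lookup, no backtracking
        # Recursion (the call stack) replaces an explicit parse stack.
        return [sym] + [parse_symbol(s) for s in rhs]

    tree = parse_symbol('S')
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree
```

For example, `parse(['a', 'c', 'y'])` returns `['S', 'a', 'c', ['Y', 'y']]`. The control flow is recursive, but every decision comes from `TABLE`.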

Note that $P \cap Q^{∁}$ is usually implemented with a PPT, so much so that some sources use “table-driven” to mean stack + table by default. I don't like this.
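For contrast, here is a minimal sketch of the $P \cap Q^{∁}$ style. It uses the same grammar as the $LL(1)$ example (`S : 'a' X | 'b' Y ;`) and parses it with an explicit stack and the table $M$. The end-of-input marker `END` and the choice to return the list of applied productions are assumptions made for illustration.

```python
END = '<EOF>'  # hypothetical end-of-input marker

# PPT for  S : 'a' X | 'b' Y ;  X : 'x' ;  Y : 'y' ;
M: dict[tuple[str, str], list[str]] = {
    ('S', 'a'): ['a', 'X'],
    ('S', 'b'): ['b', 'Y'],
    ('X', 'x'): ['x'],
    ('Y', 'y'): ['y'],
}
NONTERMINALS = {'S', 'X', 'Y'}


def parse(tokens: list[str], start: str = 'S') -> list[tuple[str, list[str]]]:
    """Return the leftmost derivation as a list of applied productions."""
    tokens = tokens + [END]
    stack = [END, start]  # the top of the stack is the end of the list
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top not in NONTERMINALS:  # terminal or END: must match the input
            if top != tok:
                raise SyntaxError(f"expected {top!r}, got {tok!r} at {pos}")
            pos += 1
            continue
        rhs = M.get((top, tok))
        if rhs is None:
            raise SyntaxError(f"no entry M[{top}, {tok!r}]")
        derivation.append((top, rhs))
        stack.extend(reversed(rhs))  # push so the leftmost symbol ends on top
    return derivation
```

For example, `parse(['a', 'x'])` returns `[('S', ['a', 'X']), ('X', ['x'])]`. No procedure per nonterminal exists here; the `while` loop and `stack` do all the work.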

4. PEG: “Parser Combinator”

Discussions of PEG parsers often bring up the concept of a “parser combinator”. “Combinator” is itself a concept from lambda calculus, but here we can simply understand it as follows: a combinator is a higher-order function that

takes one or more functions as input,
combines them using only function application (no free variables or external state),
returns a new function as a result

In a PEG parser, this mainly shows up in certain helper functions, such as this parse_one_or_more():

def parse_one_or_more(symbol) -> list[dict]:
    # <symbol>+ is one <symbol> followed by <symbol>*; parse_zero_or_more is a companion helper
    first_result = symbol()
    rest_results = parse_zero_or_more(symbol)
    return [first_result] + rest_results

Then parsing <program> ::= <statement>+ from the grammar can be written as:

def statement():
    pass

def program():
    return parse_one_or_more(statement)

Unix pipes can also be understood as a kind of combinator: new_cmd ::= cmd_1 | cmd_2
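The combinator idea is easier to see when each helper returns a new parser instead of parsing directly. Below is a minimal sketch under the assumption that a parser is a function `(text, pos) -> (value, new_pos)` or `None` on failure. The names `literal`, `seq`, `choice`, `zero_or_more`, `one_or_more` and the toy statement grammar are hypothetical; they are not taken from any specific library.

```python
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, pos) to (value, new_pos) on success, or None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG prioritized choice e1 / e2 / ...: the first success wins."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            result = p(text, pos)  # each alternative restarts at the same pos
            if result is not None:
                return result
        return None
    return parse


def zero_or_more(p: Parser) -> Parser:
    def parse(text: str, pos: int) -> Result:
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def one_or_more(p: Parser) -> Parser:
    rest = zero_or_more(p)

    def parse(text: str, pos: int) -> Result:
        first = p(text, pos)
        if first is None:
            return None
        value, pos = first
        values, pos = rest(text, pos)
        return [value] + values, pos
    return parse


# Hypothetical toy grammar:
#   <statement> ::= "x;" / "y;"
#   <program>   ::= <statement>+
statement = choice(literal("x;"), literal("y;"))
program = one_or_more(statement)
```

Here `program("x;y;x;", 0)` returns `(["x;", "y;", "x;"], 6)`. `program` is never written by hand; it is assembled entirely from smaller parsers.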

5. Hybrids of $LL(k)$ + PEG

I can perfectly well build a “mongrel” parser that fits no single category, for example:

It can use $LL(0)$ or $LL(1)$ as its base,
but for certain productions we use prioritized choice $S ::= e_{1} / e_{2} / \dots / e_{n}$.

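This way, the parser has a predictive component (from $LL(0)$ or $LL(1)$) as well as a backtracking component (from PEG's prioritized choice). In theory this parser is $\notin P$ and also $\notin P^{∁}$ under the informal definitions above (so those two definitions do not strictly partition $Δ$), yet it is still $\in Δ$.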
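Here is a minimal sketch of such a hybrid. The grammar is hypothetical. Statements are chosen predictively with one token of lookahead, while `expr` uses a PEG-style prioritized choice that restores the position and tries the next alternative on failure. `HybridParser`, `ParseError` and `KEYWORDS` are illustrative names, not part of any existing tool.

```python
from typing import Optional

KEYWORDS = {'print', 'let'}


class ParseError(Exception):
    pass


class HybridParser:
    """LL(1) at the statement level, PEG prioritized choice inside `expr`.

    Hypothetical grammar, for illustration only:
        stmt : 'print' expr | 'let' NAME '=' expr ;   # predictive, LL(1)
        expr <- call / NAME                           # PEG, backtracking
        call <- NAME '(' ')'
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def lookahead(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected: str) -> str:
        if self.lookahead() != expected:
            raise ParseError(f"expected {expected!r} at {self.pos}")
        self.pos += 1
        return expected

    def name(self) -> str:
        tok = self.lookahead()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise ParseError(f"expected a name at {self.pos}")
        self.pos += 1
        return tok

    def stmt(self) -> list:
        # Predictive part: one token of lookahead picks the production.
        if self.lookahead() == 'print':
            self.match('print')
            return ['print', self.expr()]
        if self.lookahead() == 'let':
            self.match('let')
            target = self.name()
            self.match('=')
            return ['let', target, self.expr()]
        raise ParseError(f"unexpected token {self.lookahead()!r}")

    def expr(self) -> object:
        # Backtracking part: try alternatives in order, restoring pos on failure.
        start = self.pos
        for alternative in (self.call, self.name):
            try:
                return alternative()
            except ParseError:
                self.pos = start
        raise ParseError(f"no alternative of expr matched at {start}")

    def call(self) -> list:
        callee = self.name()
        self.match('(')
        self.match(')')
        return ['call', callee]

    def parse(self) -> list:
        tree = self.stmt()
        if self.lookahead() is not None:
            raise ParseError(f"trailing input at {self.pos}")
        return tree
```

The order inside `expr` matters. If `name` came first, the prioritized choice would succeed on `f` in `print f ( )` and never try `call`, and `parse` would then reject the leftover `( )`.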

I can also have this parser use recursive calls for one half and be table-driven for the other half.

In short, real-world situations can get complicated (all sorts of patches), so you need to stay flexible.
