Top-Down Parsers: Recursive Descent, Predictive, and More
1. “Recursive Descent” and “Predictive” are just 2 implementation mechanisims of top-down parsersPermalink
Let:
be the set of all predictive parsers be the set of all recursive descent parsers be the set of all top-down parsers
Informally we have the following relationship:
Informally we can define like:
Examples:
: e.g. a vanilla parser : e.g. an parser using a stack : e.g. an PEG parser which handles prioritized choice calls , if failed, backtracks and calls .- Repeat until some
succeeds
: e.g. a PEG parser like above, but using a stack
2. Permalink
理想情况下你可以认为
现实世界中,你可能会有些
3. : “Table-Driven” vs “Hardcoded”Permalink
对
S : 'a' X;
X : 'x';
def S():
match('a')
X()
def X():
match('x')
对于
S : 'a' X;
| 'b' Y;
X : 'x' ;
Y : 'y' ;
假设 PPT 为
def S():
if lookahead() == 'a':
match('a')
X()
elif lookahead() == 'b':
match('b')
Y()
else:
raise error()
即使上升到
注意
4. PEG: “Parser Combinator”Permalink
PEG parser 经常会讲到 “parser combinator” 的概念。这里 combinator 又是 Lambda Calculus 的概念,但我们只用把它简单理解成:a combinator is a higher-order function that
- takes one or more functions as input,
- combines them using only function application (no free variables or external state),
- returns a new function as a result
在 PEG parser 中主要体现在某些辅助函数,比如这个 parse_one_or_more()
:
def parse_one_or_more(symbol) -> list[dict]:
first_result = symbol()
rest_results = parse_zero_or_more(symbol)
return first_result + rest_results
那么 grammar 中的 <program> ::= <statement>+
的 parsing 就可以写成:
def statement():
pass
def program():
return parse_one_or_more(statement)
Unix pipes 也可以理解成为一种 combinator: new_cmd ::= cmd_1 | cmd_2
5. Hybirds of + PEGPermalink
我完全可以做出一个 “四不像” 的 parser,比如:
- 它可以用
或者 做底 - 但某些 production 我们用 prioritized choice
这么一来,这个 parser 既有 predictive 成分 (来自
我也可以让这个 parser 一半用 recursive calls,一半用 table-driven.
总之就是说,现实情况可能会很复杂 (各种补丁),要灵活应对。
留下评论