3 分钟阅读

Output Type 1: Boolean (i.e. acceptance only)

Summary:

  • Simplest case: parser just checks whether the input string is valid.
  • Scenario:
    • ✅ regex engine
    • ✅ syntax checker

For $LL$ parsers, each parsing function returns True if parsing matched, False otherwise. E.g.:

def program() -> bool:
    return statements()

For $LR$ parsers, acceptance is implicit: if the parser (i.e. the entry function) consumes all tokens and ends in the accept state, return True; otherwise False.

Output Type 2: Parse Tree

Summary:

  • Case: shows every rule applied in the grammar.
  • Scenario:
    • ✅ debugging grammars
    • ✅ teaching purposes
    • ✅ when you need exact syntax recovery (e.g. pretty-printers, formatters)
    • ❌ too verbose for compilers

For $LL$ parsers:

  • Each parsing function builds a node for its non-terminal, recursively attaches child results, and returns the tree.
  • Very natural because recursion mirrors the grammar.

For $LR$ parsers:

  • Each grammar symbol on the stack carries a pointer to a node.
  • On reduction, create a parent node that hubs those popped children.
  • Push the parent node back into stack.

Output Type 3: AST

Summary:

  • Scenario: ✅ most common output in compilers

For $LL$ parsers: similar to returning parse trees, but now returns a simplified AST node (e.g. BinOp). E.g. when parsing 3+4:

def addition() -> ParseTree:
    left = term()  # A sub-parse tree like `term -> number -> 3`
    expect("+")
    right = term()
    
    return ParseTree("+", left, right)


def addition() -> AST:
    left = Constant()  # An abstract grammar token `Constant(value=3)`
    expect("+")
    right = Constant()
    
    return BinOp("+", left, right) 

For $LR$ parsers:

  • Have to specify semantic actions to rules/productions in the grammar
  • When performing a reduction, its associated semantic action is performed to created an AST node

E.g.:

Expr : Expr '+' Term { $$ = make_binop('+', $1, $3); }

Output Type 4: Direct Evaluation (i.e. interactive interpreter style)

Summary:

  • Case: instead of building a tree, the parser directly executes code as it parses.
    • Downside: ❌ you lose structure, so you can’t do later passes (optimizations, codegen).
  • Scenario: ✅ calculator

For $LL$ parsers: similar to returning parse trees or ASTs, but now returns the result of evaluation. E.g.:

def addition() -> int:
    left = term()  # int of 3
    expect("+")
    right = term()  # int of 4
    
    return left + right

For $LR$ parsers: similar to returning ASTs, have to specify semantic actions for evaluation. E.g.:

Expr : Expr '+' Term { $$ = $1 + $3; }

Output Type 5: Syntax-Directed Translation (i.e. to emit code/text directly; I/O instead of returning)

Summary:

  • Case: parser writes directly to another representation during parsing.
  • Scenario: ✅ source-to-source translator
    • E.g.: a Python -> C translator, which emits C code as it recognizes Python constructs.
  • Downside: ❌ messy and hard to maintain if you need semantic analysis later.

Note that there are other types of translation:

  • Semantic-Directed Translation: Translation based on the semantic structure rather than just syntactic structure
  • Multi-Pass Translation: Instead of translating during parsing, this approach first builds an IR and then performs translation in separate passes.
    • This allows for more complex analysis and optimization.
  • Attribute-Directed Translation: Similar to syntax-directed but focuses on propagating and computing attributes through the parse tree, which then drive the translation decisions.

For $LL$ parsers: every parser function performs output as they recurse. E.g. in if_statement(), output the equivalent C-style if-statement.

For $LR$ parsers: have to specify semantic actions to emit code. E.g.:

if_stmt: IF condition COLON statements
         {
             printf("if (%s) {\n%s\n}\n", $2, $4);
         }

which means whenever we reduce IF condition COLON statements to a if_stmt in Python, we output in C like:

if (<condition>) {
    <statements>
}

Note that there should be similar semantic actions for condition and statements accordingly.

Output Type 6: IR

Summary:

  • Case: some parsers skip ASTs and generate an IR directly
  • Scenario:
    • ❌ rare in modern compilers, usually AST -> IR is cleaner
    • ✅ yacc/bison actions that emit three-address code during parsing

Can be seen as a variant of Output Type 3: AST.

分类:

更新时间:

留下评论