
compiler-construction - How does ANTLR decide which lexer rule to apply? Does the longest-matching lexer rule win?

Reposted. Author: 行者123. Updated: 2023-12-02 08:12:20

Input:

(Screenshot not reproduced. Judging from the token dump below, the input is the single line "#   abc".)

Grammar:

grammar test;

p : EOF;

Char : [a-z];

fragment Tab : '\t';
fragment Space : ' ';
T1 : (Tab|Space)+ ->skip;

T2 : '#' T1+ Char+;

The resulting tokens are:
[@0,0:6='#   abc',<T2>,1:0]    <<<<<<<< PLACE 1
[@1,7:6='<EOF>',<EOF>,1:7]
line 1:0 extraneous input '# abc' expecting <EOF>

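Please ignore the error on the last line. What I want to know is why the token at PLACE 1 is matched as T2.

In the grammar file, the T2 lexer rule comes after the T1 lexer rule, so I expected the T1 rule to be applied first. Why, then, are the spaces in "# abc" not skipped?

Does ANTLR use some kind of greedy strategy that matches the current character stream against whichever lexer rule gives the longest match?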

Best answer

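Three rules apply, in this order:

  • First, the longest match wins.
  • Next, a rule that matches an implicit token (a literal such as '#' used directly in the grammar) wins.
  • Finally, in case of a tie (by match length), the earliest listed of the matching rules wins.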
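The first and third rules can be sketched in Python. This is a hypothetical model, not ANTLR's implementation: each lexer rule from the question is hand-translated to a regular expression, and the names tokenize, longest_prefix, QUESTION_RULES and KEYWORD_RULES (plus the IF/ID/WS rules) are made up for illustration. At each offset the sketch picks the rule with the longest match, and a strict ">" comparison lets the earliest-listed rule keep a tie. The implicit-token rule is not modelled; putting such literals at the front of the list would roughly approximate it.

```python
import re


def longest_prefix(pattern, text, pos):
    """Length of the longest slice text[pos:end] that pattern matches completely (0 if none)."""
    for end in range(len(text), pos, -1):
        if pattern.fullmatch(text, pos, end):
            return end - pos
    return 0


def tokenize(text, rules):
    """Maximal-munch lexer; rules is a list of (name, regex, skip) in grammar order."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, name, skip) of the winning rule so far
        for name, pattern, skip in rules:
            n = longest_prefix(pattern, text, pos)
            # Strict '>' means a later rule never displaces an earlier one of equal length.
            if n > 0 and (best is None or n > best[0]):
                best = (n, name, skip)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        n, name, skip = best
        if not skip:  # models '-> skip'
            tokens.append((name, text[pos:pos + n]))
        pos += n
    return tokens


# The question's grammar, hand-translated, in grammar order.
QUESTION_RULES = [
    ("Char", re.compile(r"[a-z]"), False),
    ("T1", re.compile(r"[\t ]+"), True),          # (Tab|Space)+ -> skip
    ("T2", re.compile(r"#[\t ]+[a-z]+"), False),  # '#' T1+ Char+
]

# Hypothetical keyword/identifier rules, only to show the tie-break.
KEYWORD_RULES = [
    ("IF", re.compile(r"if"), False),
    ("ID", re.compile(r"[a-z]+"), False),
    ("WS", re.compile(r" +"), True),
]

print(tokenize("#   abc", QUESTION_RULES))  # [('T2', '#   abc')]
print(tokenize("if iffy", KEYWORD_RULES))   # [('IF', 'if'), ('ID', 'iffy')]
```

On the question's input, only T2 can match the '#' at offset 0, so T1 is never tried on that whitespace: it sits inside the T2 match, and T1's skip action never applies to it. In the second call, "if" ties between IF and ID and the earlier IF wins, while "iffy" goes to ID because it is longer.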

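After a lot of late-night searching, I again found most of this material in a lengthy quote from Sam Harwell, in which he also explains the effect of the greedy operators. I remember seeing it the first time and sketching notes in my copy of TDAR (The Definitive ANTLR 4 Reference), but without the reference.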

    ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.

    The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule will be treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it use ESC if possible.
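The ordered-alternatives behaviour in the second quoted paragraph has a close analogy in Python's backtracking re module. That module also tries alternatives left to right, and a lazy *? tries to leave the loop before taking another iteration. The sketch below is only that analogy, not ANTLR's ATN simulation. The STRING/ESC rule shape in the comment is an assumed example in the spirit of the quote's (ESC|.) block, and the names ESC_FIRST, DOT_FIRST and source are made up for illustration.

```python
import re

# Assumed ANTLR-style rules being imitated:
#   STRING : '"' (ESC | .)*? '"' ;
#   ESC    : '\\' . ;
# With a lazy loop and ordered alternatives, the first listed alternative is preferred.
ESC_FIRST = re.compile(r'"(?:\\.|.)*?"')  # (ESC | .)*? : an escape is consumed as a pair
DOT_FIRST = re.compile(r'"(?:.|\\.)*?"')  # (. | ESC)*? : '.' eats the backslash alone

source = r'"a\"b" rest'

print(ESC_FIRST.match(source).group())  # "a\"b"
print(DOT_FIRST.match(source).group())  # "a\"
```

With ESC listed first, the escaped quote is consumed as one unit and the string runs to the real closing quote. With '.' listed first, the backslash is taken on its own and the lazy loop exits at the escaped quote. This is why the order inside (ESC|.) matters once a non-greedy loop is involved.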


The "implicit token" rule comes from here.

A similar question on Stack Overflow, "How does ANTLR decide which lexer rule to apply? Does the longest-matching lexer rule win?": https://stackoverflow.com/questions/45450156/
