
Antlr3 matching tokens without whitespace

Reposted. Author: 行者123. Updated: 2023-12-01 12:49:00

Given the input "term >1", the number (1) and the comparison operator (>) should produce separate nodes in the AST. How can this be achieved?

In my tests, a match only occurs when ">" and "1" are separated by a space, e.g. "term > 1".

Current grammar:

startExpression  : orEx;

expressionLevel4
: LPARENTHESIS! orEx RPARENTHESIS! | atomicExpression;
expressionLevel3
: (fieldExpression) | expressionLevel4 ;
expressionLevel2
: (nearExpression) | expressionLevel3 ;
expressionLevel1
: (countExpression) | expressionLevel2 ;
notEx : (NOT^)? expressionLevel1;
andEx : (notEx -> notEx)
(AND? a=notEx -> ^(ANDNODE $andEx $a))*;
orEx : andEx (OR^ andEx)*;

countExpression : COUNT LPARENTHESIS WORD RPARENTHESIS RELATION NUMBERS -> ^(COUNT WORD RELATION NUMBERS);

nearExpression : NEAR LPARENTHESIS (WORD|PHRASE) MULTIPLESEPERATOR (WORD|PHRASE) MULTIPLESEPERATOR NUMBERS RPARENTHESIS -> ^(NEAR WORD* PHRASE* ^(NEARDISTANCE NUMBERS));

fieldExpression : WORD PROPERTYSEPERATOR WORD -> ^(FIELDSEARCH ^(TARGETFIELD WORD) WORD );

atomicExpression
: WORD
| PHRASE
;

fragment NUMBER : ('0'..'9');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'?');
fragment QUOTE : ('"');
fragment LESSTHEN : '<';
fragment MORETHEN: '>';
fragment EQUAL: '=';
fragment SPACE : ('\u0009'|'\u0020'|'\u000C'|'\u00A0');
fragment UNICODENOSPACES: ('\u0021'..'\u0027'|'\u0030'..'\u0039'|'\u003B'..'\u007E'|'\u00A1'..'\uFFFF');
//fragment UNICODENOSPACES : ('\u0021'..'\u0039'|'\u003B'..'\u007E'|'\u00A1'..'\uFFFF');

LPARENTHESIS : '(';
RPARENTHESIS : ')';

AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
COUNT:('C'|'c')('O'|'o')('U'|'u')('N'|'n')('T'|'t');
NEAR:('N'|'n')('E'|'e')('A'|'a')('R'|'r');
PROPERTYSEPERATOR : ':';
MULTIPLESEPERATOR : ',';

WS : (SPACE) { $channel=HIDDEN; };
RELATION : LESSTHEN? MORETHEN? EQUAL?;
NUMBERS : (NUMBER)+;
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (UNICODENOSPACES)+;

Best answer

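That's because your WORD rule matches too much. The UNICODENOSPACES range '\u003B'..'\u007E' also covers '<', '=' and '>', so when ">1" is written together, those 2 characters are tokenized as a single WORD token.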
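The following Java sketch makes the overlap concrete. It retypes the UNICODENOSPACES ranges by hand and tests a few characters against them. The ranges are written as plain code points rather than \u escapes, because Java processes those escapes before compiling, and '\u0027' would turn into a broken character literal. This is only an illustrative transcription of the fragment, not anything ANTLR generates. The class and method names are made up.

```java
// Hand transcription of the UNICODENOSPACES fragment's ranges, as code points
// (illustration only; not generated by ANTLR).
class WordCharset {

    static boolean inUnicodeNoSpaces(char c) {
        return (c >= 0x21 && c <= 0x27)   // ! " # $ % & apostrophe
            || (c >= 0x30 && c <= 0x39)   // digits 0-9
            || (c >= 0x3B && c <= 0x7E)   // ; < = > ? @ A-Z ... ~  (includes < = >)
            || c >= 0xA1;                 // U+00A1 up to U+FFFF
    }

    public static void main(String[] args) {
        for (char c : new char[] {'<', '=', '>', '1', 't', '(', ' '}) {
            System.out.printf("'%c' (U+%04X) in UNICODENOSPACES: %b%n",
                c, (int) c, inUnicodeNoSpaces(c));
        }
    }
}
```

Because both '>' and '1' fall inside these ranges, (UNICODENOSPACES)+ happily consumes ">1" as one run.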

Whenever I'm not sure what my lexer is doing, I simply let the parser match zero or more tokens of any type and print the type and text of every token:

parse
: (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
;

When you let the rule above match your input "term > 1", the following is printed:

WORD            'term'
RELATION        '>'
WORD            '1'

and for the input "term >1":

WORD            'term'
WORD            '>1'

There's no way around this: when the lexer can match 2 (or more) characters (the WORD rule), it will choose that path over a rule defined before it which will only match a single char (the RELATION rule).
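Below is a toy Java model of that behaviour. It rests on two assumptions. First, at each position every modelled rule reports how many characters it can match, and the longest match wins. Second, on a tie the rule listed earlier in the grammar wins. Only WS, RELATION, NUMBERS and WORD are modelled, and keywords are left out.

The "fixed" WORD character set drops '<', '=' and '>'. That is a hypothetical change that follows from the diagnosis above; it is not part of the original answer. In the grammar it would roughly mean replacing '\u003B'..'\u007E' with '\u003B' | '\u003F'..'\u007E' inside UNICODENOSPACES. That grammar change has not been tested here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of an ANTLR-3-style lexer (not generated code). Assumptions:
// the longest match wins; on equal length the rule listed first in the grammar wins.
// Only WS, RELATION, NUMBERS and WORD are modelled (keywords are left out).
class MaxMunchDemo {

    // Hand transcription of the UNICODENOSPACES ranges, as code points.
    static boolean originalWordChar(int c) {
        return (c >= 0x21 && c <= 0x27) || (c >= 0x30 && c <= 0x39)
            || (c >= 0x3B && c <= 0x7E) || c >= 0xA1;
    }

    // Hypothetical fix: the same set without '<', '=' and '>'.
    static boolean fixedWordChar(int c) {
        return originalWordChar(c) && c != '<' && c != '=' && c != '>';
    }

    // WS: exactly one SPACE character (tab, space, form feed, no-break space).
    static int ws(String s, int pos) {
        char c = s.charAt(pos);
        return (c == '\t' || c == ' ' || c == '\f' || c == 0xA0) ? 1 : 0;
    }

    // RELATION as originally written: LESSTHEN? MORETHEN? EQUAL?
    static int relation(String s, int pos) {
        int i = pos;
        if (i < s.length() && s.charAt(i) == '<') i++;
        if (i < s.length() && s.charAt(i) == '>') i++;
        if (i < s.length() && s.charAt(i) == '=') i++;
        return i - pos;
    }

    // Length of the longest run of characters satisfying p, starting at pos.
    static int run(String s, int pos, IntPredicate p) {
        int i = pos;
        while (i < s.length() && p.test(s.charAt(i))) i++;
        return i - pos;
    }

    static List<String> tokenize(String input, IntPredicate wordChar) {
        String[] names = {"WS", "RELATION", "NUMBERS", "WORD"}; // grammar order
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int[] lengths = {
                ws(input, pos),
                relation(input, pos),
                run(input, pos, c -> c >= '0' && c <= '9'),
                run(input, pos, wordChar)
            };
            int best = -1, bestLen = 0;
            for (int r = 0; r < lengths.length; r++) {
                // Strict '>' keeps the earlier rule on a tie; zero-length matches never win.
                if (lengths[r] > bestLen) { best = r; bestLen = lengths[r]; }
            }
            if (best < 0) throw new IllegalStateException("no rule matches at offset " + pos);
            if (best != 0) { // WS goes to the hidden channel, so it is not listed
                tokens.add(names[best] + " '" + input.substring(pos, pos + bestLen) + "'");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("original WORD: " + tokenize("term >1", MaxMunchDemo::originalWordChar));
        System.out.println("fixed WORD:    " + tokenize("term >1", MaxMunchDemo::fixedWordChar));
    }
}
```

With the original set, the model reproduces the WORD '>1' token shown above. With '<', '=' and '>' removed from WORD, the '>' falls through to RELATION and the '1' goes to NUMBERS. In this model NUMBERS wins the tie against WORD because it is defined first.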

Also note that your RELATION rule:

RELATION : LESSTHEN? MORETHEN? EQUAL?;

can match the empty string. Make sure every lexer rule matches at least 1 character, otherwise your lexer may get stuck in an infinite loop.
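A toy scanning loop in Java shows why an empty match is dangerous. This is not ANTLR's generated code. The loop transcribes the original RELATION rule by hand and advances the input position by the length of each match. On input that does not start with '<', '>' or '=', the rule still "succeeds" with length 0, so the position never moves. The maxSteps cap and the method names are illustrative assumptions that exist only so the demo terminates.

```java
// Toy scanner loop (not ANTLR-generated code) using a hand transcription of the
// original RELATION rule, LESSTHEN? MORETHEN? EQUAL?, in which every part is optional.
class EmptyMatchDemo {

    // Number of characters the original RELATION rule consumes at pos (may be 0).
    static int relationLength(String s, int pos) {
        int i = pos;
        if (i < s.length() && s.charAt(i) == '<') i++;
        if (i < s.length() && s.charAt(i) == '>') i++;
        if (i < s.length() && s.charAt(i) == '=') i++;
        return i - pos;
    }

    // Repeatedly "matches" RELATION and advances by the match length, counting the
    // steps on which the match was empty, i.e. the position did not move.
    // maxSteps only exists so that this demo terminates.
    static int stepsWithoutProgress(String input, int maxSteps) {
        int pos = 0, stuck = 0;
        for (int step = 0; step < maxSteps && pos < input.length(); step++) {
            int len = relationLength(input, pos);
            if (len == 0) stuck++;
            pos += len;
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("\"term\": " + stepsWithoutProgress("term", 1000)
            + " of 1000 steps made no progress");
        System.out.println("\">=<\": " + stepsWithoutProgress(">=<", 1000)
            + " steps made no progress");
    }
}
```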

It's better to do something like this:

RELATION
: (LESSTHEN | MORETHEN)? EQUAL // '<=', '>=', or '='
| (LESSTHEN | MORETHEN) // '<' or '>'
;
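One quick way to compare the two RELATION definitions is to transcribe them as Java regular expressions and test them against a few strings. The regexes are hand translations of the grammar rules, which is an assumption for illustration, not ANTLR output.

```java
import java.util.regex.Pattern;

// Hand transcriptions of the two RELATION rules as Java regular expressions.
class RelationCheck {
    // Original: LESSTHEN? MORETHEN? EQUAL?  (every part optional)
    static final Pattern ORIGINAL = Pattern.compile("<?>?=?");
    // Suggested: (LESSTHEN | MORETHEN)? EQUAL | (LESSTHEN | MORETHEN)
    static final Pattern SUGGESTED = Pattern.compile("[<>]?=|[<>]");

    public static void main(String[] args) {
        for (String s : new String[] {"", "<", ">", "=", "<=", ">=", "<>"}) {
            System.out.printf("%-5s original=%-5b suggested=%b%n",
                "'" + s + "'",
                ORIGINAL.matcher(s).matches(),
                SUGGESTED.matcher(s).matches());
        }
    }
}
```

Besides no longer matching the empty string, the suggested rule also rejects "<>", which the original rule accepted. Whether that matters depends on the query language being parsed.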

For the question "Antlr3 matching tokens without whitespace", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13683803/
