
c++ - Retrieving whitespace skipped by the ANTLR4 parser from within a listener

Reposted. Author: 行者123. Updated: 2023-11-30 04:45:54

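I am trying to construct an object from a parsed message, using Antlr4 and C++. My problem is that I need to skip whitespace during lexing/parsing, but I have to get it back when I construct the message object in the Listener. Here is my grammar: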

grammar MessageTest;
WS: ('\t' | ' ' | '\r' | '\n' )+ -> skip;

message:
messageInfo
startOfMessage
messageText+
| EOF;

messageInfo:
senderName
filingTime
receiverName
;

senderName: WORD;

filingTime: DIGITS;

receiverName: WORD;

messageText: ( WORD | DIGITS | ALLOWED_SYMBOLS)+;

startOfMessage: START_OF_MESSAGE_SYMBOL ;

START_OF_MESSAGE_SYMBOL:':';

WORD: LETTER+;

DIGITS: DIGIT+;

LPAREN: '(';
RPAREN: ')';

ALLOWED_SYMBOLS: '-'| '.' | ',' | '/' | '+' | '?';

fragment LETTER: [A-Z];

fragment DIGIT: [0-9];

This grammar works well, and my parse tree is correct for the following sample message: JOHN0120JANE:HI HOW ARE YOU? I get this parse tree:

message (
messageInfo (
senderName (
"JOHN"
)
filingTime (
"0120"
)
receiverName (
"JANE"
)
)
startOfMessage (
":"
)
messageText (
"HI"
"HOW"
"ARE"
"YOU"
"?"
)
)

The problem is that when I try to retrieve the whole messageText, HI HOW ARE YOU?, I instead get HIHOWAREYOU? from MessageTextContext.

What am I doing wrong?

Best answer

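The getText() retrieval functions never consider skipped or hidden tokens. However, the generated tokens store their character indexes, so it is easy to get the original text of your input, even for just the range that corresponds to a specific parse rule. A parse rule context holds a start token and a stop token, so you can go from the context back to the original input like this: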


std::string MySQLRecognizerCommon::sourceTextForContext(ParserRuleContext *ctx, bool keepQuotes) {
return sourceTextForRange(ctx->start, ctx->stop, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
: dynamic_cast<ParserRuleContext *>(start)->start;
Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
: dynamic_cast<ParserRuleContext *>(stop)->stop;
return sourceTextForRange(startToken, stopToken, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
CharStream *cs = start->getTokenSource()->getInputStream();
size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
if (keepQuotes || result.size() < 2)
return result;

char quoteChar = result[0];
if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
if (quoteChar == '"' || quoteChar == '\'') {
// Replace any double occurrence of the quote char by a single one.
replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
}

return result.substr(1, result.size() - 2);
}

return result;
}
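The sourceTextForRange code calls replaceStringInplace, which is not shown in this answer. A hypothetical stand-in, assuming it simply replaces every occurrence of one substring with another in place (the actual MySQL Workbench helper may differ), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the replaceStringInplace helper used above;
// its real implementation is not part of the quoted answer.
// Replaces every occurrence of `search` in `subject` with `replacement`.
void replaceStringInplace(std::string &subject, const std::string &search,
                          const std::string &replacement) {
    if (search.empty())
        return; // An empty search string would otherwise loop forever.

    std::size_t pos = 0;
    while ((pos = subject.find(search, pos)) != std::string::npos) {
        subject.replace(pos, search.size(), replacement);
        pos += replacement.size(); // Continue after the text just inserted.
    }
}
```

With this version, collapsing doubled single quotes turns the text 'it''s' into 'it's' before the outer quotes are stripped.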

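The sourceTextForRange code is tailored for use with MySQL (e.g. with regard to quote characters), but it is easy to adapt to any other use case. The key part is to take the tokens (e.g. from a parse rule context) and fetch the original input from the character input stream.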

The sourceTextForContext and sourceTextForRange functions are taken from the MySQL Workbench code base.
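To see why slicing the input by token indexes restores the whitespace, here is a minimal sketch that does not use the ANTLR runtime at all. SimpleToken and tokenizeSkippingWhitespace are stand-ins invented for this illustration; they only imitate the inclusive start/stop character indexes a real token reports through getStartIndex() and getStopIndex().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the part of an ANTLR token that matters here: inclusive
// character offsets into the original input. Not the real antlr4::Token.
struct SimpleToken {
    std::size_t startIndex;
    std::size_t stopIndex; // inclusive
};

// Toy lexer, loosely following the grammar in the question: runs of letters,
// runs of digits, or single symbols become tokens; whitespace is skipped.
std::vector<SimpleToken> tokenizeSkippingWhitespace(const std::string &input) {
    std::vector<SimpleToken> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i; // Skipped: no token is produced for whitespace.
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c)) {
            while (i < input.size() && std::isalpha(static_cast<unsigned char>(input[i])))
                ++i;
        } else if (std::isdigit(c)) {
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
        } else {
            ++i; // Any other character is a one-character token.
        }
        tokens.push_back({start, i - 1});
    }
    return tokens;
}

// Imitates getText(): concatenates token texts, so skipped characters are lost.
std::string joinedTokenText(const std::string &input, const std::vector<SimpleToken> &tokens) {
    std::string result;
    for (const SimpleToken &t : tokens)
        result += input.substr(t.startIndex, t.stopIndex - t.startIndex + 1);
    return result;
}

// Imitates the answer's approach: slice the original input from the first
// token's start index to the last token's stop index (inclusive).
std::string sourceTextForRange(const std::string &input, const SimpleToken &start,
                               const SimpleToken &stop) {
    return input.substr(start.startIndex, stop.stopIndex - start.startIndex + 1);
}
```

For the input HI HOW ARE YOU? the toy lexer produces five tokens. joinedTokenText returns HIHOWAREYOU?, while sourceTextForRange called with the first and last token returns HI HOW ARE YOU?. In a real listener the same idea would presumably be applied to the start and stop tokens of the MessageTextContext, reading the text from the token's character stream as the answer's code does.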

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57044724/
