c++ - Parser problem with a custom Lexer

Reposted by: 行者123 Updated: 2023-11-28 07:54:12

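I'm looking for help with a custom-built Lexer class and with using it to parse input. Our professor gave us some skeleton code for our project, and we have to use it. My problem is that a single command needs to be able to invoke several functions, to sort a table and to merge tables or sort the columns of separate tables. For example, our input looks something like: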

display <'file_name> sortedby <'column2>

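where 'display' and 'sortedby' are keywords of a sort, and column2 is to be sorted numerically or alphabetically, depending on its contents.

We were given the sorting algorithms, so my current problem is not implementing them but getting our Lexer/Parser to read more than one input. At the moment I can only get the "display" part to work; anything more just spits out an error message.

I have gone over the code and tried changing some of the logic: flipping statements from true to false, swapping &&'s and ||'s, and even trying some if-else statements, with no luck.

I could really use some advice! Here is some of the code we were given, in its original format: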

Lexer.h:

#ifndef _LEXER_H
#define _LEXER_H
#include <string>

enum token_types_t { 
IDENT,  // a sequence of alphanumeric characters and _, starting with alpha
TAG, // sequence of characters between < >, no escape
ENDTOK, // end of string/file, no more token
ERRTOK  // unrecognized token
};

struct Token {
token_types_t type;
std::string value;
// constructor for Token
Token(token_types_t tt=ENDTOK, std::string val="") : type(tt), value(val) {}
};

class Lexer {
public:
// constructor
Lexer(std::string str="") : input_str(str), cur_pos(0), in_err(false), 
    separators(" \t\n\r") { }

//modifiers 
void set_input(std::string); // set a new input, 
void restart();              // move cursor to the beginning, restart

Token next_token();    // returns the next token
bool has_more_token(); // are there more token(s)?

private:
std::string input_str;  // the input string to be scanned
size_t      cur_pos;    // current position in the input string
bool        in_err;     // are we in the error state?
std::string separators; // set of separators; *not* the best option!
};
#endif

Lexer.cpp:

#include "Lexer.h"
#include <iostream>
using namespace std;

Token Lexer::next_token() {
Token ret;
size_t last;

if (in_err) {
    ret.type = ERRTOK;
    ret.value = "";
    return ret;
}

// if not in error state, the default token is the ENDTOK
ret.type = ENDTOK;
ret.value = "";

if (has_more_token()) {
    last = cur_pos; // input_str[last] is a non-space char
    if (input_str[cur_pos] == '<') {
        cur_pos++;
        while (cur_pos < input_str.length() && input_str[cur_pos] != '>')
            cur_pos++;
        if (cur_pos < input_str.length()) {
            ret.type = TAG;
            ret.value = input_str.substr(last+1, cur_pos-last-1);
            cur_pos++; // move past the closing '>'
        } else {
            in_err = true;
            ret.type = ERRTOK;
            ret.value = "";
        }
    } else {
        while (cur_pos < input_str.length() &&
               separators.find(input_str[cur_pos]) == string::npos &&
               input_str[cur_pos] != '<') {
            cur_pos++;
        }
        ret.type  = IDENT;
        ret.value = input_str.substr(last, cur_pos-last);
    }
}
return ret;
}

void Lexer::set_input(string str) {
input_str = str;
restart();
}

bool Lexer::has_more_token() {
while (cur_pos < input_str.length() && 
       separators.find(input_str[cur_pos]) != string::npos) {
    cur_pos++;
}
return (cur_pos < input_str.length());
}

void Lexer::restart() {
cur_pos = 0;
in_err = false;
}

Our parser (part of a larger .cpp file):

bool parse_input(Lexer lexer, string& file_name) {    
Token file_name_tok;

if (!lexer.has_more_token() || 
    (file_name_tok = lexer.next_token()).type != TAG)
    return false;

if  (lexer.has_more_token())
    return false;

file_name = file_name_tok.value;
return true;
}

The display function (part of the same .cpp file as the parser):

void display(Lexer cmd_lexer) {
string file_name, line;

if (!parse_input(cmd_lexer, file_name)) {
    error_return("Syntax error: display <filename>");
    return;
}

ifstream ifs(file_name.c_str());
string error_msg;
if (ifs) {
    if (!is_well_formed(ifs, error_msg)) {
        error_return(error_msg);
    } else {
        ifs.clear();
        ifs.seekg(0, ios::beg);
        print_well_formed_file(ifs);
    }
    while (ifs.good()) {
        getline(ifs, line);
        cout << line << endl;
    }
} else {
    error_return("Can't open " + file_name + " for reading");
}
ifs.close();
}

Best answer

Based on the answers to my comment, these are my approaches to solving the problem:

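If the display command is supposed to read a source file and parse it, you can implement this with a stack. Whenever a display directive is found and parsed, push a new lexer instance onto the stack, and use the top of the stack as the "current" lexer.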
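A minimal sketch of the stack idea, under some loud assumptions: WordLexer is a hypothetical stand-in for the course's Lexer that only splits on whitespace, directives are written as "display name" without the <...> tags, and the sources map is an in-memory substitute for files on disk. None of these names come from the original skeleton code.

```cpp
#include <map>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// Hypothetical stand-in for the course's Lexer: yields whitespace-separated words.
class WordLexer {
public:
    explicit WordLexer(const std::string& text) : in_(text) {}
    // Reads the next word into `word`; returns false once the input is exhausted.
    bool next(std::string& word) { return static_cast<bool>(in_ >> word); }
private:
    std::istringstream in_;
};

// Flattens nested "display <name>" directives using a stack of lexers.
// `sources` maps a name to its text (an assumption replacing real files).
std::vector<std::string> expand(const std::string& input,
                                const std::map<std::string, std::string>& sources) {
    std::vector<std::string> words;
    std::stack<WordLexer> lexers;
    lexers.emplace(input);                   // the outermost input
    while (!lexers.empty()) {
        std::string word;
        if (!lexers.top().next(word)) {
            lexers.pop();                    // current input done: resume the outer one
            continue;
        }
        if (word == "display") {
            std::string name;
            if (lexers.top().next(name)) {
                auto it = sources.find(name);
                if (it != sources.end())
                    lexers.emplace(it->second);  // nested input becomes "current"
            }
            continue;                        // unknown names are skipped silently
        }
        words.push_back(word);
    }
    return words;
}
```

Every read goes through lexers.top(), so the nested input is consumed first and the outer input resumes where it left off once the nested lexer is popped. The sketch does not guard against a source that displays itself, which would loop forever; a real implementation would need a depth limit or a set of names currently being expanded.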
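If the display command is supposed to read a file and perform file operations that have nothing to do with the actual parsing, consider storing the instructions in an intermediate form with a fixed layout and "executing" that intermediate form once parsing has finished. Nearly all modern scripting languages work this way.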
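The second approach might look roughly like the sketch below. It assumes the grammar is display <file> optionally followed by sortedby <column>, as in the question's example. Command, Tok, tokenize, parse_command, and describe are hypothetical names for illustration, and tokenize is a simplified imitation of Lexer::next_token rather than the professor's code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Intermediate form of one command (hypothetical, not part of the skeleton code).
struct Command {
    std::string name;         // keyword, e.g. "display"
    std::string file;         // contents of the first <...> tag
    std::string sort_column;  // contents of the tag after "sortedby"; empty if absent
};

enum class TokKind { Ident, Tag };
struct Tok { TokKind kind; std::string text; };

// Simplified tokenizer: <...> is a tag, any other run of non-space characters
// (up to the next '<') is an identifier. Returns false on an unterminated tag.
bool tokenize(const std::string& s, std::vector<Tok>& out) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) { ++i; continue; }
        if (s[i] == '<') {
            std::size_t close = s.find('>', i);
            if (close == std::string::npos) return false;
            out.push_back({TokKind::Tag, s.substr(i + 1, close - i - 1)});
            i = close + 1;
        } else {
            std::size_t start = i;
            while (i < s.size() && s[i] != '<' &&
                   !std::isspace(static_cast<unsigned char>(s[i]))) ++i;
            out.push_back({TokKind::Ident, s.substr(start, i - start)});
        }
    }
    return true;
}

// Parses: display <file> [sortedby <column>]. `cmd` is only written on success.
bool parse_command(const std::string& line, Command& cmd) {
    std::vector<Tok> t;
    if (!tokenize(line, t)) return false;
    if (t.size() != 2 && t.size() != 4) return false;
    if (t[0].kind != TokKind::Ident || t[0].text != "display") return false;
    if (t[1].kind != TokKind::Tag) return false;
    Command parsed{t[0].text, t[1].text, ""};
    if (t.size() == 4) {
        if (t[2].kind != TokKind::Ident || t[2].text != "sortedby") return false;
        if (t[3].kind != TokKind::Tag) return false;
        parsed.sort_column = t[3].text;
    }
    cmd = parsed;
    return true;
}

// Stand-in for the execution phase: runs only after parsing has succeeded.
std::string describe(const Command& cmd) {
    std::string plan = "show " + cmd.file;
    if (!cmd.sort_column.empty()) plan += " sorted by " + cmd.sort_column;
    return plan;
}
```

Calling parse_command("display <'file_name> sortedby <'column2>", cmd) fills cmd.file and cmd.sort_column without opening any file. The point of the design is that the parser no longer rejects input merely because tokens remain after the file name, which is what the has_more_token() check in the question's parse_input does; the sorting and display code can then work from the finished Command.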

Regarding c++ - Parser problem with a custom Lexer, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13106660/