
c++ - Matching and extracting data using regular expressions

Reposted. Author: 搜寻专家. Updated: 2023-10-31 00:11:33

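Problem: find a matching string and extract data from it. There are a number of command strings, each made up of keywords and data.

Command examples:

  1. Ask name to call me
  2. Notify name that do this action
  3. Message name that request

Keywords: Ask, Notify, Message, to, that. Data: everything else, i.e. the name and the text after to/that.

Input strings:

  1. Ask peter to call me
  2. Notify Jenna that I am going to be away
  3. Message home that I am running late

My problem has two parts: 1) finding the matching command, and 2) extracting the data.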

Here's what I'm doing. I created several regular expressions:

  "Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+" or "Ask([^\t\n]+?)to([^\t\n]+?)"
  "Notify[[:s:]][[:w:]]+[[:s:]]that[[:s:]][[:w:]]+" or "Notify([^\t\n]+?)that([^\t\n]+?)"

void searchExpression(const char *regString)
{
    std::string str;
    boost::regex callRegEx(regString, boost::regex_constants::icase);
    boost::cmatch im;

    while(true) {
        std::cout << "Enter String: ";
        getline(std::cin, str);
        fprintf(stderr, "str %s regstring %s\n", str.c_str(), regString);

        if(boost::regex_search(str.c_str(), im, callRegEx)) {
            int num_var = im.size() + 1;
            fprintf(stderr, "Matched num_var %d\n", num_var);
            for(int j = 0; j <= num_var; j++) {
                fprintf(stderr, "%d) Found %s\n",j, std::string(im[j]).c_str());
            }
        }
        else {
            fprintf(stderr, "Not Matched\n");
        }
    }
}

I'm able to find the matching string, but I can't extract the data. Here's the output:

input_string: Ask peter to call Regex Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+
Matched num_var 2
0) Found Ask peter to call
1) Found
2) Found

I want to extract peter and call from Ask Peter to call.

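Best answer

Since you really want to parse a grammar, you should consider using Boost's parser generator (Boost Spirit).

You simply write the whole thing top-down: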

auto sentence = [](auto&& v, auto&& p) {
    auto verb     = lexeme [ no_case [ as_parser(v) ] ];
    auto name     = lexeme [ +graph ];
    auto particle = lexeme [ no_case [ as_parser(p) ] ];
    return confix(verb, particle) [ name ];
};

auto ask     = sentence("ask", "to") >> lexeme[+char_];
auto notify  = sentence("notify", "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];

auto command = ask | notify | message;

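That's a Spirit X3 grammar for it. Read lexeme as "keep the whole word together" (don't skip whitespace inside it).

Here, a "name" is taken to be anything up to the expected particle¹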

If you just want to return the raw string that matched, this is enough:

Live On Coliru

#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>

namespace x3 = boost::spirit::x3;

namespace commands {
    namespace grammar {
        using namespace x3;

        auto sentence = [](auto&& v, auto&& p) {
            auto verb     = lexeme [ no_case [ as_parser(v) ] ];
            auto name     = lexeme [ +graph ];
            auto particle = lexeme [ no_case [ as_parser(p) ] ];
            return confix(verb, particle) [ name ];
        };

        auto ask     = sentence("ask", "to") >> lexeme[+char_];
        auto notify  = sentence("notify", "that") >> lexeme[+char_];
        auto message = sentence("message", "that") >> lexeme[+char_];

        auto command = ask | notify | message;

        auto parser  = raw [ skip(space) [ command ] ];
    }
}

int main() {
    for (std::string const input : {
            "Ask peter to call me",
            "Notify Jenna that I am going to be away",
            "Message home that I am running late",
            })
    {
        std::string matched;

        if (parse(input.begin(), input.end(), commands::grammar::parser, matched))
            std::cout << "Matched: '" << matched << "'\n";
        else
            std::cout << "No match in '" << input << "'\n";
    }

}

Prints:

Matched: 'Ask peter to call me'
Matched: 'Notify Jenna that I am going to be away'
Matched: 'Message home that I am running late'

Bonus

Of course, you actually want to extract the relevant bits of information.

Here's how I'd do it. Let's parse into a struct:

struct Command {
    enum class Type { ask, message, notify } type;
    std::string name;
    std::string message;
};

And let's write the body of main() as:

commands::Command cmd;

if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
    std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
    std::cout << "No match in '" << input << "'\n";

Live On Coliru

#include <iostream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>

namespace x3 = boost::spirit::x3;

namespace commands {

    struct Command {
        enum class Type { ask, message, notify } type;
        std::string name;
        std::string message;

        friend std::ostream& operator<<(std::ostream& os, Type t) { return os << static_cast<int>(t); } // TODO
    };

}

BOOST_FUSION_ADAPT_STRUCT(commands::Command, type, name, message)

namespace commands {

    namespace grammar {
        using namespace x3;

        auto sentence = [](auto type, auto&& v, auto&& p) {
            auto verb     = lexeme [ no_case [ as_parser(v) ] ];
            auto name     = lexeme [ +graph ];
            auto particle = lexeme [ no_case [ as_parser(p) ] ];
            return attr(type) >> confix(verb, particle) [ name ];
        };

        using Type = Command::Type;
        auto ask     = sentence(Type::ask,     "ask",     "to")   >> lexeme[+char_];
        auto notify  = sentence(Type::notify,  "notify",  "that") >> lexeme[+char_];
        auto message = sentence(Type::message, "message", "that") >> lexeme[+char_];

        auto command // = rule<struct command, Command> { }
            = ask | notify | message;

        auto parser  = skip(space) [ command ];
    }
}

int main() {
    for (std::string const input : {
            "Ask peter to call me",
            "Notify Jenna that I am going to be away",
            "Message home that I am running late",
            })
    {
        commands::Command cmd;

        if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
            std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
        else
            std::cout << "No match in '" << input << "'\n";
    }

}

Prints:

Matched: 0|peter|call me
Matched: 2|Jenna|I am going to be away
Matched: 1|home|I am running late

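¹ I'm not an English linguist, so I don't know whether this is the correct grammatical term :)

Regarding c++ - Matching and extracting data using regular expressions, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33711296/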
