
c++ - Parsing CSV with variable column order using Boost Spirit

Reposted. Author: 太空狗. Last updated: 2023-10-29 23:02:34

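I'm trying to parse a CSV file (with a header line) using Boost Spirit. The CSV does not have a fixed format: sometimes there are extra columns, or the columns appear in a different order. I'm only interested in a few columns whose header names are well known.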

For example, my CSV might look like this:

Name,Surname,Age
John,Doe,32

Or:

Age,Name
32,John

I want to parse only the content of Name and Age (N.B. Age is an integer type). At the moment I have come up with a very ugly solution where Spirit parses the first line and creates a vector that contains an enum value at each position I'm interested in. Then I have to parse the terminal symbols by hand...

enum LineItems {
    NAME, AGE, UNUSED
};

struct CsvLine {
    string name;
    int age;
};

using Column = std::string;
using CsvFile = std::vector<CsvLine>;

template<typename It>
struct CsvGrammar: qi::grammar<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> {
    CsvGrammar() :
            CsvGrammar::base_type(start) {
        using namespace qi;

        static const char colsep = ',';

        start = qi::omit[header[qi::_a = qi::_1]] >> eol >> line(_a) % eol;
        header = (lit("Name")[phx::push_back(phx::ref(qi::_val), LineItems::NAME)]
                | lit("Age")[phx::push_back(phx::ref(qi::_val), LineItems::AGE)]
                | column[phx::push_back(phx::ref(qi::_val), LineItems::UNUSED)]) % colsep;
        line = (column % colsep)[phx::bind(&CsvGrammar<It>::convertFunc, this, qi::_1, qi::_r1,
                qi::_val)];
        column = quoted | *~char_(",\n");
        quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';
    }

    void convertFunc(std::vector<string>& columns, std::vector<LineItems>& positions, CsvLine &csvLine) {
        //terminal symbol parsing here, and assign to csvLine struct.
        ...
    }
private:
    qi::rule<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> start;
    qi::rule<It, std::vector<LineItems>(), qi::blank_type> header;
    qi::rule<It, CsvLine(std::vector<LineItems>), qi::blank_type> line;
    qi::rule<It, Column(), qi::blank_type> column;
    qi::rule<It, std::string()> quoted;
    qi::rule<It, qi::blank_type> empty;

};

Here is the full source.

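What if the header parser could prepare a vector<rule<...>*>, and the "line parser" simply used this vector to parse itself? A kind of advanced Nabialek trick (I've been trying, but I couldn't get it to work).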

Or is there a better way to parse this kind of CSV with Spirit? (Any help is appreciated, thanks in advance.)

Best answer

I'd go with the concept you already have. I think it's quite elegant (the qi locals even allow it to be used reentrantly).

To reduce the cruft in the rules (see Boost Spirit: "Semantic actions are evil"?), you could move the "conversion function" into an attribute transformation customization point.

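Oops. As was pointed out in the comments, that was too simplistic. However, you can still reduce the cruft quite a bit. With two simple tweaks, the grammar reads: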

item.add("Name", NAME)("Age", AGE);
start = omit[ header[_a=_1] ] >> eol >> line(_a) % eol;

header = (item | omit[column] >> attr(UNUSED)) % colsep;
line = (column % colsep) [convert];

column = quoted | *~char_(",\n");
quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';

The tweaks:

  • Use qi::symbols to map from header names to LineItems
  • Use a raw semantic action ( [convert] ) that accesses the context directly (see boost spirit semantic action parameters ):

    struct final {
        using Ctx = typename decltype(line)::context_type;

        void operator()(Columns const& columns, Ctx &ctx, bool &pass) const {
            auto& csvLine   = boost::fusion::at_c<0>(ctx.attributes); // the rule's _val
            auto& positions = boost::fusion::at_c<1>(ctx.attributes); // the inherited _r1
            std::size_t i = 0;

            for (LineItems position : positions) {
                if (i >= columns.size()) break; // row is shorter than the header
                switch (position) {
                    case NAME: csvLine.name = columns[i]; break;
                    case AGE:  csvLine.age = atoi(columns[i].c_str()); break;
                    default:   break;
                }
                ++i;
            }

            pass = true; // setting this to false would fail the `line` rule
        }
    } convert;
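
One detail of the functor above: atoi yields 0 for non-numeric input, so a malformed Age cell goes unnoticed. A minimal sketch of a stricter conversion follows, assuming only the standard library. The helper name to_int_strict is hypothetical and not part of the original answer.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// converts `text` to int and reports failure instead of silently
// producing 0 the way atoi does.
bool to_int_strict(std::string const& text, int& out) {
    if (text.empty()) return false;

    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text.c_str(), &end, 10);

    // Reject input with no digits or with trailing characters.
    // Leading whitespace is accepted because strtol skips it.
    if (end == text.c_str() || end != text.c_str() + text.size()) return false;

    // Reject values that do not fit into an int.
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;

    out = static_cast<int>(value);
    return true;
}
```

Inside the functor, the AGE case could then set pass = false and return early when to_int_strict fails, which would make the `line` rule fail on a bad row instead of storing 0.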

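Arguably, the convert functor is similar to writing auto convert = phx::bind(&CsvGrammar<It>::convertFunc, this, qi::_1, qi::_r1, qi::_val). However, using auto with Proto/Phoenix/Spirit expressions is notoriously error-prone (undefined behavior due to dangling references to temporaries inside the expression template), so I definitely prefer the functor approach.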

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <cstddef>
#include <cstdlib> // atoi
#include <iostream>
#include <boost/fusion/include/at_c.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;

using std::string;

enum LineItems { NAME, AGE, UNUSED };

struct CsvLine {
    string name;
    int age;
};

using Column = std::string;
using Columns = std::vector<Column>;
using CsvFile = std::vector<CsvLine>;

template<typename It>
struct CsvGrammar: qi::grammar<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> {
    CsvGrammar() : CsvGrammar::base_type(start) {
        using namespace qi;
        static const char colsep = ',';

        // Map the well-known header names to their LineItems value
        item.add("Name", NAME)("Age", AGE);
        start = qi::omit[ header[_a=_1] ] >> eol >> line(_a) % eol;

        // Any other header cell is recorded as UNUSED
        header = (item | omit[column] >> attr(UNUSED)) % colsep;
        line = (column % colsep) [convert];

        column = quoted | *~char_(",\n");
        quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';

        BOOST_SPIRIT_DEBUG_NODES((header)(column)(quoted));
    }

private:
    qi::rule<It, std::vector<LineItems>(), qi::blank_type> header;
    qi::rule<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> start;
    qi::rule<It, CsvLine(std::vector<LineItems> const&), qi::blank_type> line;

    qi::rule<It, Column(), qi::blank_type> column;
    qi::rule<It, std::string()> quoted;
    qi::rule<It, qi::blank_type> empty;

    qi::symbols<char, LineItems> item;

    struct final {
        using Ctx = typename decltype(line)::context_type;

        void operator()(Columns const& columns, Ctx &ctx, bool &pass) const {
            auto& csvLine   = boost::fusion::at_c<0>(ctx.attributes); // the rule's _val
            auto& positions = boost::fusion::at_c<1>(ctx.attributes); // the inherited _r1
            std::size_t i = 0;

            for (LineItems position : positions) {
                if (i >= columns.size()) break; // row is shorter than the header
                switch (position) {
                    case NAME: csvLine.name = columns[i]; break;
                    case AGE:  csvLine.age = atoi(columns[i].c_str()); break;
                    default:   break;
                }
                ++i;
            }

            pass = true; // setting this to false would fail the `line` rule
        }
    } convert;
};

int main() {
    const std::string s = "Surname,Name,Age,\nJohn,Doe,32\nMark,Smith,43";

    auto f(begin(s)), l(end(s));
    CsvGrammar<std::string::const_iterator> p;

    CsvFile parsed;
    bool ok = qi::phrase_parse(f, l, p, qi::blank, parsed);

    if (ok) {
        for (CsvLine const& line : parsed) {
            std::cout << '[' << line.name << ']' << '[' << line.age << ']';
            std::cout << std::endl;
        }
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}

Prints

[Doe][32]
[Smith][43]
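
To sanity-check this output independently of Spirit, the same header-driven mapping can be sketched with the standard library only. This is a simplified illustration, not the answer's method: the helper names split_row, map_header, convert_row and parse_csv are hypothetical, and the naive split does not handle quoted fields the way the grammar does.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

enum LineItems { NAME, AGE, UNUSED };
struct CsvLine { std::string name; int age = 0; };

// Naive split on ','; no support for quoted fields (assumption for brevity).
std::vector<std::string> split_row(std::string const& row) {
    std::vector<std::string> cells;
    std::size_t start = 0;
    for (;;) {
        std::size_t comma = row.find(',', start);
        if (comma == std::string::npos) {
            cells.push_back(row.substr(start));
            break;
        }
        cells.push_back(row.substr(start, comma - start));
        start = comma + 1;
    }
    return cells;
}

// Header line -> one LineItems value per column position.
std::vector<LineItems> map_header(std::string const& header) {
    std::vector<LineItems> positions;
    for (std::string const& cell : split_row(header)) {
        if (cell == "Name")     positions.push_back(NAME);
        else if (cell == "Age") positions.push_back(AGE);
        else                    positions.push_back(UNUSED);
    }
    return positions;
}

// Data line -> CsvLine, picking cells according to the header positions.
CsvLine convert_row(std::vector<LineItems> const& positions, std::string const& row) {
    CsvLine line;
    std::vector<std::string> cells = split_row(row);
    for (std::size_t i = 0; i < positions.size() && i < cells.size(); ++i) {
        switch (positions[i]) {
            case NAME: line.name = cells[i]; break;
            case AGE:  line.age = std::atoi(cells[i].c_str()); break;
            default:   break;
        }
    }
    return line;
}

// First line is the header; every following line is a data row.
std::vector<CsvLine> parse_csv(std::string const& text) {
    std::istringstream in(text);
    std::string row;
    std::vector<CsvLine> result;
    if (!std::getline(in, row)) return result;
    std::vector<LineItems> positions = map_header(row);
    while (std::getline(in, row)) result.push_back(convert_row(positions, row));
    return result;
}
```

Calling parse_csv on the same input string as in main above yields two rows, (Doe, 32) and (Smith, 43), which matches the printed output.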

Regarding "c++ - Parsing CSV with variable column order using Boost Spirit", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27967195/
