
c++ - How to parse CSV using boost::spirit

Reposted. Author: 可可西里. Updated: 2023-11-01 18:17:17

I have this CSV line:

std::string s = R"(1997,Ford,E350,"ac, abs, moon","some "rusty" parts",3000.00)";

I can parse it using boost::tokenizer:

typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
boost::escaped_list_separator<char> seps('\\', ',', '\"');
Tokenizer tok(s, seps);
for (auto i : tok)
{
    std::cout << i << std::endl;
}

It gets it right, except that the token "rusty" should keep its double quotes, and they are being stripped.

Here is my attempt to use boost::spirit:

boost::spirit::classic::rule<> list_csv_item = !(boost::spirit::classic::confix_p('\"', *boost::spirit::classic::c_escape_ch_p, '\"') | boost::spirit::classic::longest_d[boost::spirit::classic::real_p | boost::spirit::classic::int_p]);
std::vector<std::string> vec_item;
std::vector<std::string> vec_list;
boost::spirit::classic::rule<> list_csv = boost::spirit::classic::list_p(list_csv_item[boost::spirit::classic::push_back_a(vec_item)],',')[boost::spirit::classic::push_back_a(vec_list)];
boost::spirit::classic::parse_info<> result = parse(s.c_str(), list_csv);
if (result.hit)
{
    for (auto i : vec_item)
    {
        cout << i << endl;
    }
}

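Problems:

  1. It doesn't work; it only prints the first token

  2. Why boost::spirit::classic? I can't find examples using Spirit V2

  3. The setup is brutal.. but I can live with that

** I really want to use boost::spirit because it tends to be pretty fast

Expected output: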

1997
Ford
E350
ac, abs, moon
some "rusty" parts
3000.00

Best answer

For a background on parsing (optionally) quoted delimited fields, including different quoting characters (', "), see here:

For a very, very, very complete example complete with support for partially quoted values and a

splitInto(input, output, ' ');

method that takes 'arbitrary' output containers and delimiter expressions, see here:

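Addressing your exact question, assuming fields are either fully quoted or fully unquoted (no partial quotes inside field values), using Spirit V2:

Let's take the simplest 'abstract datatype' that could possibly work: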

using Column  = std::string;
using Columns = std::vector<Column>;
using CsvLine = Columns;
using CsvFile = std::vector<CsvLine>;

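With the semantics that a repeated double quote escapes a double quote (as I pointed out in the comment), you should be able to use something like: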

static const char colsep = ',';

start = -line % eol;
line = column % colsep;
column = quoted | *~char_(colsep);
quoted = '"' >> *("\"\"" | ~char_('"')) >> '"';

The full test program below prints

[1997][Ford][E350][ac, abs, moon][rusty][3001.00]

(Note the BOOST_SPIRIT_DEBUG define, which makes debugging easier.) See it Live on Coliru

Full demo
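To make the rules above concrete, here is a rough hand-written equivalent of `line`, `column` and `quoted` that uses only the standard library. It is a sketch, not the answer's code: `split_csv_line` and its `keep_quote` flag are hypothetical names introduced for illustration, and it ignores the blank skipper that the Spirit grammar uses. With `keep_quote` left at `false` the doubled quote is consumed but not emitted, which is what the printed `[rusty]` suggests the grammar does; with `true` it is kept as a single `"`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: splits one CSV line by hand.
// A field is either fully quoted or fully unquoted. Inside a quoted field,
// "" stands for one escaped quote, which is kept only if keep_quote is true.
std::vector<std::string> split_csv_line(const std::string& s, bool keep_quote = false)
{
    std::vector<std::string> fields;
    std::size_t i = 0;
    for (;;) {
        std::string field;
        if (i < s.size() && s[i] == '"') {
            ++i; // opening quote
            while (i < s.size()) {
                if (s[i] != '"') { field += s[i++]; continue; }
                if (i + 1 < s.size() && s[i + 1] == '"') { // doubled quote
                    if (keep_quote) field += '"';
                    i += 2;
                    continue;
                }
                ++i; // closing quote
                break;
            }
        } else {
            // unquoted field: everything up to the next separator
            while (i < s.size() && s[i] != ',') field += s[i++];
        }
        fields.push_back(field);
        // stop at end of input, or on unexpected text after a closing quote
        if (i >= s.size() || s[i] != ',') break;
        ++i; // skip the separator
    }
    assert(!fields.empty()); // at least one (possibly empty) field is always produced
    return fields;
}
```

Calling `split_csv_line` on the test line from the demo yields six fields; passing `true` as the second argument turns the fifth one into `"rusty"` with its quotes.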

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

using Column  = std::string;
using Columns = std::vector<Column>;
using CsvLine = Columns;
using CsvFile = std::vector<CsvLine>;

template <typename It>
struct CsvGrammar : qi::grammar<It, CsvFile(), qi::blank_type>
{
    CsvGrammar() : CsvGrammar::base_type(start)
    {
        using namespace qi;

        static const char colsep = ',';

        start  = -line % eol;              // optional lines, separated by end-of-line
        line   = column % colsep;          // columns, separated by the delimiter
        column = quoted | *~char_(colsep); // fully quoted or fully unquoted
        quoted = '"' >> *("\"\"" | ~char_('"')) >> '"'; // a doubled quote is matched as an escape

        BOOST_SPIRIT_DEBUG_NODES((start)(line)(column)(quoted));
    }
  private:
    qi::rule<It, CsvFile(), qi::blank_type> start;
    qi::rule<It, CsvLine(), qi::blank_type> line;
    qi::rule<It, Column(),  qi::blank_type> column;
    qi::rule<It, std::string()>             quoted; // no skipper: blanks inside quotes are kept
};

int main()
{
    const std::string s = R"(1997,Ford,E350,"ac, abs, moon","""rusty""",3001.00)";

    auto f(begin(s)), l(end(s));
    CsvGrammar<std::string::const_iterator> p;

    CsvFile parsed;
    bool ok = qi::phrase_parse(f, l, p, qi::blank, parsed);

    if (ok)
    {
        for (auto& line : parsed) {
            for (auto& col : line)
                std::cout << '[' << col << ']';
            std::cout << std::endl;
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    // Report any input the grammar did not consume
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}

Regarding "c++ - How to parse CSV using boost::spirit", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18365463/
