
c++ - Boost Spirit: slow parsing optimization

Reposted. Author: 搜寻专家. Updated: 2023-10-31 02:20:14

I am new to Spirit and Boost. I am trying to parse a section of a VRML file that looks like this:

point
[
#coordinates written in meters.
-3.425386e-001 -1.681608e-001 0.000000e+000,
-3.425386e-001 -1.642545e-001 0.000000e+000,
-3.425386e-001 -1.603483e-001 0.000000e+000,

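The comment starting with # is optional.

I wrote a grammar that works correctly, but parsing takes a very long time. I would like to optimize it to run faster. My code looks like this: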

struct Point
{
    double a;
    double b;
    double c;

    Point() : a(0.0), b(0.0), c(0.0){}
};

BOOST_FUSION_ADAPT_STRUCT
(
    Point,
    (double, a)
    (double, b)
    (double, c)
)

namespace qi = boost::spirit::qi;
namespace repo = boost::spirit::repository;

template <typename Iterator>
struct PointParser :
    public qi::grammar<Iterator, std::vector<Point>(), qi::space_type>
{
    PointParser() : PointParser::base_type(start, "PointGrammar")
    {
        singlePoint = qi::double_>>qi::double_>>qi::double_>>*qi::lit(",");
        comment = qi::lit("#")>>*(qi::char_("a-zA-Z.") - qi::eol);
        prefix = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]];
        start %= prefix>>qi::repeat[singlePoint];

        //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start));
    }

    qi::rule<Iterator, Point(), qi::space_type> singlePoint;
    qi::rule<Iterator, qi::space_type> comment;
    qi::rule<Iterator, qi::space_type> prefix;
    qi::rule<Iterator, std::vector<Point>(), qi::space_type> start;
};

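The section I want to parse sits in the middle of the input text, so I need to skip over the preceding text to reach it. I implemented this with repo::seek. Is this the best approach?

I run the parser as follows: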

std::vector<Point> points;
typedef PointParser<std::string::const_iterator> pointParser;
pointParser g2;

auto start = ch::high_resolution_clock::now();
bool r = phrase_parse(Data.begin(), Data.end(), g2, qi::space, points);
auto end = ch::high_resolution_clock::now();

auto duration = ch::duration_cast<boost::chrono::milliseconds>(end - start).count();

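Parsing about 80k entries from the input text takes roughly 2.5 seconds, which is quite slow for my needs. Is there a way to write the parsing rules in a more optimized way to make this faster? How can I improve this implementation overall?

I am new to Spirit, so some explanation would be much appreciated.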

Best answer

I hooked your grammar into a Nonius benchmark and generated uniformly random input data of about 85k lines (download: http://stackoverflow-sehe.s3.amazonaws.com/input.txt, 7.4 MB).

  • Are you measuring the time in a release build?
  • Are you using slow file input?

When the file is read up front, I consistently get ~36ms to parse the whole file.

clock resolution: mean is 17.616 ns (40960002 iterations)

benchmarking sample
collecting 100 samples, 1 iterations each, in estimated 3.82932 s
mean: 36.0971 ms, lb 35.9127 ms, ub 36.4456 ms, ci 0.95
std dev: 1252.71 μs, lb 762.716 μs, ub 2.003 ms, ci 0.95
found 6 outliers among 100 samples (6%)
variance is moderately inflated by outliers
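
On the question of slow file input: the benchmark in this answer uses a Boost memory-mapped file, but the same idea (load everything first, then time only the parse) can be sketched with the standard library alone. The helper name read_file is hypothetical and not part of the original answer; it is a minimal sketch, assuming the input fits comfortably in memory.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not from the original answer): reads the whole file
// at `path` into a string so that parsing works on an in-memory buffer.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned string's begin() and end() can then be passed to phrase_parse, so that the timed region covers only parsing and not disk access.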

Code: see below.


Notes:

  • You seem to be conflicted about using skippers and seek. I suggest simplifying prefix:

    comment     = '#' >> *(qi::char_ - qi::eol);

    prefix = repo::seek[
    qi::lit("point") >> '[' >> *comment
    ];

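    prefix will use the space skipper and ignore any matched attributes (because of the rule's declared type). Make comment implicitly a lexeme by dropping the skipper from the rule declaration: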

        // implicit lexeme:
    qi::rule<Iterator> comment;

    Note: see "Boost spirit skipper issues" for more background information.

Live On Coliru

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>

namespace qi = boost::spirit::qi;
namespace repo = boost::spirit::repository;

struct Point { double a = 0, b = 0, c = 0; };

BOOST_FUSION_ADAPT_STRUCT(Point, a, b, c)

template <typename Iterator>
struct PointParser : public qi::grammar<Iterator, std::vector<Point>(), qi::space_type>
{
    PointParser() : PointParser::base_type(start, "PointGrammar")
    {
        singlePoint = qi::double_ >> qi::double_ >> qi::double_ >> *qi::lit(',');

        comment = '#' >> *(qi::char_ - qi::eol);

        prefix = repo::seek[
            qi::lit("point") >> '[' >> *comment
        ];

        //prefix = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]];

        start %= prefix >> *singlePoint;

        //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start));
    }

  private:
    qi::rule<Iterator, Point(), qi::space_type> singlePoint;
    qi::rule<Iterator, std::vector<Point>(), qi::space_type> start;
    qi::rule<Iterator, qi::space_type> prefix;
    // implicit lexeme:
    qi::rule<Iterator> comment;
};

#include <nonius/benchmark.h++>
#include <nonius/main.h++>
#include <boost/iostreams/device/mapped_file.hpp>

static boost::iostreams::mapped_file_source src("input.txt");

NONIUS_BENCHMARK("sample", [](nonius::chronometer cm) {
    std::vector<Point> points;

    using It = char const*;
    PointParser<It> g2;

    cm.measure([&](int) {
        It f = src.begin(), l = src.end();
        return phrase_parse(f, l, g2, qi::space, points);
#if 0
        // Diagnostic variant, disabled: it is unreachable after the return above.
        // To use it, drop the return and add <iostream>, <string>, <algorithm>, <cassert>.
        bool ok = phrase_parse(f, l, g2, qi::space, points);
        if (ok)
            std::cout << "Parsed " << points.size() << " points\n";
        else
            std::cout << "Parse failed\n";

        if (f!=l)
            std::cout << "Remaining unparsed input: '" << std::string(f,std::min(f+30, l)) << "'\n";

        assert(ok);
#endif
    });
})

Chart: [benchmark chart image]

Output from another run, live:

Regarding c++ - Boost Spirit: slow parsing optimization, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32968409/
