
c++ - How to parse mustache with Boost.Xpressive correctly?

Reposted. Author: 搜寻专家. Updated: 2023-10-31 01:44:03

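I have tried to write a mustache parser with the excellent Boost.Xpressive from the brilliant Eric Niebler. But since this is my first parser, I am not familiar with the "normal" approach and lingo of compiler writers, and after a few days of trial and error I feel a bit lost. So I have come here hoping someone can tell me the foolishness of my n00bish ways ;)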

This is the HTML code with the mustache templates that I want to extract (http://mustache.github.io/): Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}

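I have the following problems that I could not solve on my own:

  • The parser I wrote does not print anything, and there are no warnings at compile time. Earlier I managed to get it to print parts of the mustache code, but never all of it correctly.
  • I don't know how to loop over the whole input to find all occurrences and still access each of them the way I can with the smatch what; variable. The documentation only shows how to find the first occurrence with "what", or how to output all occurrences with an "iterator".
    • What I actually need is a combination of both. Once something is found, I need to inspect the tag name and the content between the tags (which "what" would offer but the "iterator" does not allow) and act accordingly. I suppose I could use "actions", but how?
    • I think it should be possible to find the tags and the "content between tags" in one pass, right? Or do I need to parse twice for that, and if so, how?
  • Is it OK to parse the opening and closing brackets the way I did, since there are always 2 brackets? Or should I match them in sequence, or use repeat<2,2>('{')?
  • I am still a bit unsure in which cases keep() and by_ref() are necessary and when it is better not to use them.
  • I could not find the other options for the 4th parameter of the iterator in sregex_token_iterator cur( str.begin(), str.end(), html, -1 ); here -1 outputs everything except the matching tags.
  • Does my parser expression find nested mustache tags correctly?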
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/match_results.hpp>
typedef std::string::const_iterator It;
using namespace boost::xpressive;

std::string str = "Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}";
// Parser setup --------------------------------------------------------
mark_tag mtag (1), cond_mtag (2), user_str (3);
sregex brackets = "{{"
>> keep ( mtag = repeat<1, 20> (_w) )
>> "}}"
;

sregex cond_brackets = "{{#"
>> keep (cond_mtag = repeat<1, 20> (_w) )
>> "}}"
>> * (
keep (user_str = + (*_s >> +alnum >> *_s) ) |
by_ref (brackets) |
by_ref (cond_brackets)
)
>> "{{/"
>> cond_mtag
>> "}}"
;
sregex mexpression = *( by_ref (cond_brackets) | by_ref (brackets) );

// Looping + catching the results --------------------------------------
smatch what2;
std::cout << "\nregex_search:\n" << str << '\n';
It strBegin = str.begin(), strEnd = str.end();
int ic = 0;

do
{
if ( !regex_search ( strBegin, strEnd, what2, mexpression ) )
{
std::cout << "\t>> Breakout of this life...! Exit after " << ic << " loop(s)." << std::endl;
break;
}
else
{
std::cout << "**Loop Nr: " << ic << '\n';
std::cout << "\twhat2[0] " << what2[0] << '\n'; // whole match
std::cout << "\twhat2[mtag] " << what2[mtag] << '\n';
std::cout << "\twhat2[cond_mtag] " << what2[cond_mtag] << '\n';
std::cout << "\twhat2[user_str] " << what2[user_str] << '\n';
// display the nested results
std::for_each (
what2.nested_results().begin(),
what2.nested_results().end(),
output_nested_results() // <--identical function from E.Nieblers documentation
);

strBegin = what2[0].second;
}
++ic;
}
while (ic < 6 || strBegin != str.end() );

Best answer

Boost Spirit is built on Proto (by the same hero, Eric Niebler!), so I hope you don't mind if I stick to my personal tradition and present an implementation in Boost Spirit.

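I found it hard to see what you wanted to achieve from the code shown alone. So I went straight to the mustache docs and implemented a parser for the following AST: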

namespace mustache {

// any atom refers directly to source iterators for efficiency
using boost::string_ref;
template <typename Kind> struct atom {
string_ref value;

atom() { }
atom(string_ref const& value) : value(value) { }
};

// the atoms
using verbatim = atom<struct verbatim_tag>;
using variable = atom<struct variable_tag>;
using partial = atom<struct partial_tag>;

// the template elements (any atom or a section)
struct section;

using melement = boost::variant<
verbatim,
variable,
partial,
boost::recursive_wrapper<section>
// TODO comments and set-separators
>;

// the template: sequences of elements
using sequence = std::vector<melement>;

// section: recursively define to contain a template sequence
struct section {
bool sense; // positive or negative
string_ref control;
sequence content;
};
}

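As you can see, I have added support for negated sections as well as partial templates (i.e. variables that expand to a template, which is then expanded dynamically).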
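For readers without Boost at hand, the shape of this AST can be approximated with C++17's std::variant. This is only a sketch under assumptions: the namespace sketch, the wrapper struct element and the helper collect_names are hypothetical names that do not appear in the answer's code, and std::string_view stands in for boost::string_ref. It shows how a section recursively contains a sequence and how a visitor walks the tree.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

namespace sketch {

// Atoms: distinct thin wrappers around a view into the template text.
struct verbatim { std::string_view value; };
struct variable { std::string_view value; };
struct partial  { std::string_view value; };

struct element;                         // defined below
using sequence = std::vector<element>;  // a template is a sequence of elements

// A section owns a nested sequence, which makes the AST recursive.
struct section {
    bool sense;                // true for {{#name}}, false for {{^name}}
    std::string_view control;  // the section name
    sequence content;
};

// Wrapping the variant in a struct lets `section` refer to `element`
// before the variant itself has been declared.
struct element {
    std::variant<verbatim, variable, partial, section> node;
};

void collect_names(sequence const& seq, std::vector<std::string>& out);

// Visitor: one overload per alternative, recursing into sections.
struct name_collector {
    std::vector<std::string>& out;
    void operator()(verbatim const&) const {}
    void operator()(variable const& v) const { out.emplace_back(v.value); }
    void operator()(partial const& p) const { out.emplace_back(p.value); }
    void operator()(section const& s) const {
        assert(!s.control.empty());  // a section always carries a name
        out.emplace_back(s.control);
        collect_names(s.content, out);
    }
};

// Append every name the template refers to, in document order.
inline void collect_names(sequence const& seq, std::vector<std::string>& out) {
    for (element const& e : seq) std::visit(name_collector{out}, e.node);
}

}  // namespace sketch
```

For an AST built by hand for `Now {{#time}}{{zeit}}{{/time}} {{country}}`, collect_names appends time, zeit and country in that order.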

Here are the productions:

sequence     = *element;
element =
!(lit("{{") >> '/') >> // section-end ends the current sequence
(partial | section | variable | verbatim);

reference = +(graph - "}}");

partial = qi::lit("{{") >> "> " >> reference >> "}}";

sense = ('#' > attr(true))
| ('^' > attr(false));

section %= "{{" >> sense >> reference [ section_id = phx::bind(&boost::string_ref::to_string, _1) ] >> "}}"
>> sequence // contents
> ("{{" >> ('/' >> lit(section_id)) >> "}}");

variable = "{{" >> reference >> "}}";

verbatim = +(char_ - "{{");

The only fancy thing is the use of a qi::locals<> named section_id to check that the closing tag of a section matches the opening tag of the current section.

qi::rule<Iterator, mustache::sequence()> sequence;
qi::rule<Iterator, mustache::melement()> element;
qi::rule<Iterator, mustache::partial()> partial;
qi::rule<Iterator, mustache::section(), qi::locals<std::string> > section;
qi::rule<Iterator, bool()> sense; // positive or negative
qi::rule<Iterator, mustache::variable()> variable;
qi::rule<Iterator, mustache::verbatim()> verbatim;
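To see what the section_id local buys you without any Spirit machinery, here is a hand-written recursive-descent sketch that only validates nesting. It is an illustration under assumptions, not the grammar above: the names sketch, parse_sequence, parse_section and well_nested are hypothetical, and tag parsing is simplified (anything between "{{" and "}}" counts as a tag). The point is the per-invocation variable name, which plays the role of the rule-local: each nesting level remembers its own opening name and insists on the same name in the closing tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

inline bool parse_sequence(std::string_view in, std::size_t& pos);

// Parse "{{#name}} ... {{/name}}" (or the "{{^name}}" form).
// Precondition: `pos` is at the opening "{{".
inline bool parse_section(std::string_view in, std::size_t& pos) {
    assert(in.substr(pos, 2) == "{{");
    std::size_t const close = in.find("}}", pos);
    if (close == std::string_view::npos) return false;
    // Local to this invocation, like a qi::locals<> slot of the rule.
    std::string_view const name = in.substr(pos + 3, close - (pos + 3));
    if (name.empty()) return false;
    pos = close + 2;
    if (!parse_sequence(in, pos)) return false;  // section contents
    // Require exactly "{{/" + name + "}}".
    if (in.substr(pos, 3) != "{{/") return false;
    pos += 3;
    if (in.substr(pos, name.size()) != name) return false;
    pos += name.size();
    if (in.substr(pos, 2) != "}}") return false;
    pos += 2;
    return true;
}

// Parse elements until end of input or until a "{{/" close tag, which is
// left unconsumed so that the enclosing section can check it.
inline bool parse_sequence(std::string_view in, std::size_t& pos) {
    while (pos < in.size()) {
        std::size_t const open = in.find("{{", pos);
        if (open == std::string_view::npos) { pos = in.size(); return true; }
        pos = open;  // skip verbatim text
        if (in.substr(pos, 3) == "{{/") return true;
        char const kind = pos + 2 < in.size() ? in[pos + 2] : '\0';
        if (kind == '#' || kind == '^') {
            if (!parse_section(in, pos)) return false;
        } else {  // variable or partial
            std::size_t const close = in.find("}}", pos);
            if (close == std::string_view::npos) return false;
            pos = close + 2;
        }
    }
    return true;
}

// True if every section is closed by a tag with the same name.
inline bool well_nested(std::string_view tpl) {
    std::size_t pos = 0;
    return parse_sequence(tpl, pos) && pos == tpl.size();
}

}  // namespace sketch
```

With this sketch, well_nested accepts "{{#a}}x{{#b}}y{{/b}}{{/a}}" and rejects "{{#a}}x{{/b}}", an unclosed "{{#a}}x", and a stray "x{{/a}}".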

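I optimized on the assumption that the input data stays around, so we don't need to copy the actual data. This should avoid 99% of the allocations. I used boost::string_ref to achieve this, and I think it is fair to say that it introduces the only bit of complexity (see the full code below).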

qi::rule<Iterator, boost::string_ref()>   reference;
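The same non-owning idea can be sketched with C++17's std::string_view standing in for boost::string_ref (an assumption; the answer's code uses boost::string_ref). The helper tag_names and the namespace sketch are hypothetical names for illustration. Every returned view points straight into the input buffer, so no characters are copied, and the views are only valid while the input string is alive and unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Return a view of the text between each "{{" and the next "}}".
// The views alias `input`; they do not own or copy any characters.
inline std::vector<std::string_view> tag_names(std::string const& input) {
    std::vector<std::string_view> names;
    std::string_view const in = input;
    std::size_t pos = 0;
    while ((pos = in.find("{{", pos)) != std::string_view::npos) {
        std::size_t const close = in.find("}}", pos + 2);
        if (close == std::string_view::npos) break;  // unterminated tag
        assert(close >= pos + 2);                    // the search started at pos + 2
        names.push_back(in.substr(pos + 2, close - (pos + 2)));
        pos = close + 2;
    }
    return names;
}

}  // namespace sketch
```

The trade-off is the usual one for views: the parsed result must not outlive the input, which is why the answer's main keeps the input string around for as long as parsed_template is used.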

Now we are ready to take our parser for a spin. See it Live On Coliru:

int main()
{
std::cout << std::unitbuf;
std::string input = "<ul>{{#time}}\n\t<li>{{> partial}}</li>{{/time}}</ul>\n "
"<i>for all good men</i> to come to the {007} aid of "
"their</bold> {{country}}. Result: {{^Res2}}(absent){{/Res2}}{{#Res2}}{{Res2}}{{/Res2}}"
;
// Parser setup --------------------------------------------------------
typedef std::string::const_iterator It;
static const mustache_grammar<It> p;

It first = input.begin(), last = input.end();

try {
mustache::sequence parsed_template;
if (qi::parse(first, last, p, parsed_template))
std::cout << "Parse success\n";
else
std::cout << "Parse failed\n";

if (first != last)
std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";

std::cout << "Input: " << input << "\n";
std::cout << "Dump: ";
Dumping::dumper()(std::cout, parsed_template) << "\n";
} catch(qi::expectation_failure<It> const& e)
{
std::cout << "Unexpected: '" << std::string(e.first, e.last) << "'\n";
}
}

Dumping::dumper simply prints the mustache template back out from the parsed AST. You might wonder how dumper is implemented:

struct dumper : boost::static_visitor<std::ostream&>
{
std::ostream& operator()(std::ostream& os, mustache::sequence const& v) const {
for(auto& element : v)
boost::apply_visitor(std::bind(dumper(), std::ref(os), std::placeholders::_1), element);
return os;
}
std::ostream& operator()(std::ostream& os, mustache::verbatim const& v) const {
return os << v.value;
}
std::ostream& operator()(std::ostream& os, mustache::variable const& v) const {
return os << "{{" << v.value << "}}";
}
std::ostream& operator()(std::ostream& os, mustache::partial const& v) const {
return os << "{{> " << v.value << "}}";
}
std::ostream& operator()(std::ostream& os, mustache::section const& v) const {
os << "{{" << (v.sense?'#':'^') << v.control << "}}";
(*this)(os, v.content);
return os << "{{/" << v.control << "}}";
}
};

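Nothing too complicated. Boost Variant really lends itself to a declarative style of programming. To illustrate this more thoroughly, let's add expansion based on a context object!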

I was not going to implement JSON just for this, so let's assume a context value model like:

struct Nil { };

using Value = boost::make_recursive_variant<
Nil,
double,
std::string,
std::map<std::string, boost::recursive_variant_>,
std::vector<boost::recursive_variant_>
>::type;

using Dict = std::map<std::string, Value>;
using Array = std::vector<Value>;
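Whether a section is rendered depends on whether its context value counts as "truthy". A minimal sketch of one possible rule, using std::variant and a simplified, non-recursive Value (both are assumptions for illustration; the names sketch, truthiness and is_truthy are hypothetical): Nil is falsy, a number is truthy when it is nonzero, and strings and lists are truthy when they are non-empty.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

namespace sketch {

struct Nil {};

// Simplified stand-in for the recursive Value above: no nested dicts.
using Value = std::variant<Nil, double, std::string, std::vector<std::string>>;

struct truthiness {
    bool operator()(Nil) const { return false; }
    bool operator()(double d) const { return d != 0.0; }
    // Strings and lists: truthy when non-empty.
    template <typename Container>
    bool operator()(Container const& c) const { return !c.empty(); }
};

inline bool is_truthy(Value const& v) {
    assert(!v.valueless_by_exception());  // std::visit would throw otherwise
    return std::visit(truthiness{}, v);
}

}  // namespace sketch
```

The non-template overloads win over the template for Nil and double, so the generic overload only ever sees the container-like alternatives.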

Now we use binary visitation on mustache::melement and this context Value variant. This is quite a bit more code than dumping, but let's look at the use site first:

using namespace ContextExpander;
expander engine;

Value const ctx = Dict {
{ "time", Array {
Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"}, { "title", "noon" }, { "zeit", "12:00" } },
Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"}, { "title", "evening" }, { "zeit", "19:30" } },
Dict { { "partial", "gugus <u>{{title}}</u> (expected at around {{zeit}})"}, { "title", "dawn" }, { "zeit", "06:00" } },
} },
{ "country", "ESP" },
{ "Res3", "unused" }
};

engine(std::cout, ctx, parsed_template);

This prints (again, see it Live On Coliru):

Evaluation: <ul>
<li>gugus 12:00 (a.k.a. <u>noon</u>)</li>
<li>gugus 19:30 (a.k.a. <u>evening</u>)</li>
<li>gugus <u>dawn</u> (expected at around 06:00)</li></ul>
<i>for all good men</i> to come to the {007} aid of their</bold> ESP. Result: (absent)
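Binary visitation means the visitor is dispatched on the runtime alternatives of two variants at once, here the context and the template element. A reduced sketch with std::visit, under stated assumptions: the names sketch, render and expand are hypothetical, Value and Element are cut down to a few alternatives, and strings are copied rather than viewed to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

namespace sketch {

struct Nil {};
using Dict  = std::map<std::string, std::string>;
using Value = std::variant<Nil, std::string, Dict>;  // simplified context

struct verbatim { std::string text; };
struct variable { std::string name; };
using Element = std::variant<verbatim, variable>;    // simplified element

struct render {
    // Verbatim text is emitted regardless of the context type.
    template <typename Ctx>
    std::string operator()(Ctx const&, verbatim const& v) const { return v.text; }

    // A variable can only be looked up in a dictionary context.
    std::string operator()(Dict const& ctx, variable const& v) const {
        auto const it = ctx.find(v.name);
        return it == ctx.end() ? std::string{} : it->second;
    }

    // Any other context: the variable expands to nothing.
    template <typename Ctx>
    std::string operator()(Ctx const&, variable const&) const { return {}; }
};

// The overload is chosen from the (context, element) pair at run time.
inline std::string expand(Value const& ctx, Element const& e) {
    assert(!ctx.valueless_by_exception() && !e.valueless_by_exception());
    return std::visit(render{}, ctx, e);
}

}  // namespace sketch
```

As in the expander, the non-template Dict overload is preferred over the catch-all template when both match, so the specific case needs no extra dispatch code.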

Full code listing

For reference:

//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/utility/string_ref.hpp>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>

namespace mustache {

// any atom refers directly to source iterators for efficiency
using boost::string_ref;
template <typename Kind> struct atom {
string_ref value;

atom() { }
atom(string_ref const& value) : value(value) { }

friend std::ostream& operator<<(std::ostream& os, atom const& v) { return os << typeid(v).name() << "[" << v.value << "]"; }
};

// the atoms
using verbatim = atom<struct verbatim_tag>;
using variable = atom<struct variable_tag>;
using partial = atom<struct partial_tag>;

// the template elements (any atom or a section)
struct section;

using melement = boost::variant<
verbatim,
variable,
partial, // TODO comments and set-separators
boost::recursive_wrapper<section>
>;

// the template: sequences of elements
using sequence = std::vector<melement>;

// section: recursively define to contain a template sequence
struct section {
bool sense; // positive or negative
string_ref control;
sequence content;
};
}

BOOST_FUSION_ADAPT_STRUCT(mustache::section, (bool, sense)(boost::string_ref, control)(mustache::sequence, content))

namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;

template <typename Iterator>
struct mustache_grammar : qi::grammar<Iterator, mustache::sequence()>
{
mustache_grammar() : mustache_grammar::base_type(sequence)
{
using namespace qi;
static const _a_type section_id; // local
using boost::phoenix::construct;
using boost::phoenix::begin;
using boost::phoenix::size;

sequence = *element;
element =
!(lit("{{") >> '/') >> // section-end ends the current sequence
(partial | section | variable | verbatim);

reference = raw [ lexeme [ +(graph - "}}") ] ]
[ _val = construct<boost::string_ref>(&*begin(_1), size(_1)) ];

partial = qi::lit("{{") >> "> " >> reference >> "}}";

sense = ('#' > attr(true))
| ('^' > attr(false));

section %= "{{" >> sense >> reference [ section_id = phx::bind(&boost::string_ref::to_string, _1) ] >> "}}"
>> sequence // contents
> ("{{" >> ('/' >> lexeme [ lit(section_id) ]) >> "}}");

variable = "{{" >> reference >> "}}";

verbatim = raw [ lexeme [ +(char_ - "{{") ] ]
[ _val = construct<boost::string_ref>(&*begin(_1), size(_1)) ];

BOOST_SPIRIT_DEBUG_NODES(
(sequence)(element)(partial)(variable)(section)(verbatim)
(reference)(sense)
)
}
private:
qi::rule<Iterator, mustache::sequence()> sequence;
qi::rule<Iterator, mustache::melement()> element;
qi::rule<Iterator, mustache::partial()> partial;
qi::rule<Iterator, mustache::section(), qi::locals<std::string> > section;
qi::rule<Iterator, bool()> sense; // positive or negative
qi::rule<Iterator, mustache::variable()> variable;
qi::rule<Iterator, mustache::verbatim()> verbatim;
qi::rule<Iterator, boost::string_ref()> reference;
};

namespace Dumping {
struct dumper : boost::static_visitor<std::ostream&>
{
std::ostream& operator()(std::ostream& os, mustache::sequence const& v) const {
for(auto& element : v)
boost::apply_visitor(std::bind(dumper(), std::ref(os), std::placeholders::_1), element);
return os;
}
std::ostream& operator()(std::ostream& os, mustache::verbatim const& v) const {
return os << v.value;
}
std::ostream& operator()(std::ostream& os, mustache::variable const& v) const {
return os << "{{" << v.value << "}}";
}
std::ostream& operator()(std::ostream& os, mustache::partial const& v) const {
return os << "{{> " << v.value << "}}";
}
std::ostream& operator()(std::ostream& os, mustache::section const& v) const {
os << "{{" << (v.sense?'#':'^') << v.control << "}}";
(*this)(os, v.content);
return os << "{{/" << v.control << "}}";
}
};
}

namespace ContextExpander {

struct Nil { };

using Value = boost::make_recursive_variant<
Nil,
double,
std::string,
std::map<std::string, boost::recursive_variant_>,
std::vector<boost::recursive_variant_>
>::type;

using Dict = std::map<std::string, Value>;
using Array = std::vector<Value>;

static inline std::ostream& operator<<(std::ostream& os, Nil const&) { return os << "#NIL#"; }
static inline std::ostream& operator<<(std::ostream& os, Dict const& v) { return os << "#DICT(" << v.size() << ")#"; }
static inline std::ostream& operator<<(std::ostream& os, Array const& v) { return os << "#ARRAY(" << v.size() << ")#"; }

struct expander : boost::static_visitor<std::ostream&>
{
std::ostream& operator()(std::ostream& os, Value const& ctx, mustache::sequence const& v) const {
for(auto& element : v)
boost::apply_visitor(std::bind(expander(), std::ref(os), std::placeholders::_1, std::placeholders::_2), ctx, element);
return os;
}

template <typename Ctx>
std::ostream& operator()(std::ostream& os, Ctx const&/*ignored*/, mustache::verbatim const& v) const {
return os << v.value;
}

std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::variable const& v) const {
auto it = ctx.find(v.value.to_string());
if (it != ctx.end())
os << it->second;
return os;
}

template <typename Ctx>
std::ostream& operator()(std::ostream& os, Ctx const&, mustache::variable const&) const {
return os;
}

std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::partial const& v) const {
auto it = ctx.find(v.value.to_string());
if (it != ctx.end())
{
static const mustache_grammar<std::string::const_iterator> p;

auto const& subtemplate = boost::get<std::string>(it->second);
std::string::const_iterator first = subtemplate.begin(), last = subtemplate.end();

mustache::sequence dynamic_template;
if (qi::parse(first, last, p, dynamic_template))
return (*this)(os, Value{ctx}, dynamic_template);
}
return os << "#ERROR#";
}

std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::section const& v) const {
auto it = ctx.find(v.control.to_string());
if (it != ctx.end())
boost::apply_visitor(std::bind(do_section(), std::ref(os), std::placeholders::_1, std::cref(v)), it->second);
else if (!v.sense)
(*this)(os, Value{/*Nil*/}, v.content);

return os;
}

template <typename Ctx, typename T>
std::ostream& operator()(std::ostream& os, Ctx const&/* ctx*/, T const&/* element*/) const {
return os << "[TBI:" << __PRETTY_FUNCTION__ << "]";
}

private:
struct do_section : boost::static_visitor<> {
void operator()(std::ostream& os, Array const& ctx, mustache::section const& v) const {
for(auto& item : ctx)
expander()(os, item, v.content);
}
template <typename Ctx>
void operator()(std::ostream& os, Ctx const& ctx, mustache::section const& v) const {
if (v.sense == truthiness(ctx))
expander()(os, Value(ctx), v.content);
}
private:
static bool truthiness(Nil) { return false; }
static bool truthiness(double d) { return 0. != d; } // a nonzero number is truthy
template <typename T> static bool truthiness(T const& v) { return !v.empty(); }
};
};

}

int main()
{
std::cout << std::unitbuf;
std::string input = "<ul>{{#time}}\n\t<li>{{> partial}}</li>{{/time}}</ul>\n "
"<i>for all good men</i> to come to the {007} aid of "
"their</bold> {{country}}. Result: {{^Res2}}(absent){{/Res2}}{{#Res2}}{{Res2}}{{/Res2}}"
;
// Parser setup --------------------------------------------------------
typedef std::string::const_iterator It;
static const mustache_grammar<It> p;

It first = input.begin(), last = input.end();

try {
mustache::sequence parsed_template;
if (qi::parse(first, last, p, parsed_template))
{
std::cout << "Parse success\n";
} else
{
std::cout << "Parse failed\n";
}

if (first != last)
{
std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";
}

std::cout << "Input: " << input << "\n";
std::cout << "Dump: ";
Dumping::dumper()(std::cout, parsed_template) << "\n";

std::cout << "Evaluation: ";

{
using namespace ContextExpander;
expander engine;

Value const ctx = Dict {
{ "time", Array {
Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"}, { "title", "noon" }, { "zeit", "12:00" } },
Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"}, { "title", "evening" }, { "zeit", "19:30" } },
Dict { { "partial", "gugus <u>{{title}}</u> (expected at around {{zeit}})"}, { "title", "dawn" }, { "zeit", "06:00" } },
} },
{ "country", "ESP" },
{ "Res3", "unused" }
};

engine(std::cout, ctx, parsed_template);
}
} catch(qi::expectation_failure<It> const& e)
{
std::cout << "Unexpected: '" << std::string(e.first, e.last) << "'\n";
}
}

Regarding "c++ - How to parse mustache with Boost.Xpressive correctly?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24122557/
