c++ - Parsing the Newick grammar with boost::spirit

Reposted by: 太空狗, updated: 2023-10-29 21:22:30

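I am trying to parse the Newick grammar (defined here) using the boost::spirit library.

I have written my own parser, and it recognizes the grammar correctly. Here is the code: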

#define BOOST_SPIRIT_DEBUG

#include <boost/spirit/include/qi.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <vector>

namespace parser
{
    struct ptree;

    typedef boost::variant<boost::recursive_wrapper<ptree>> ptree_recursive;
    struct ptree
    {
        std::vector<ptree_recursive> children;
        std::string name;
        double length;
    };

    /* Used to cast ptree_recursive into ptree. */
    class ptree_visitor : public boost::static_visitor<ptree>
    {
    public:
        ptree operator() (ptree tree) const
        {
            return tree;
        }
    };
}

BOOST_FUSION_ADAPT_STRUCT(
    parser::ptree,
    (std::vector<parser::ptree_recursive>, children)
    (std::string, name)
    (double, length)
)

namespace parser
{
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    template<typename Iterator>
    struct newick_grammar : qi::grammar<Iterator, ptree(), ascii::space_type>
    {
        public:
            newick_grammar() : newick_grammar::base_type(tree)
            {
                using qi::lexeme;
                using qi::double_;
                using ascii::char_;

                /* This is the only grammar that works fine:
                 * http://evolution.genetics.washington.edu/phylip/newick_doc.html */
                label = lexeme[+(char_ - ':' - ')' - ',')];
                branch_length = ':' >> double_;

                subtree = 
                       -descendant_list 
                    >> -label 
                    >> -branch_length;

                descendant_list = 
                       '(' 
                    >> subtree
                    >> *(',' >> subtree )   
                    >> ')';

                tree = subtree >> ';';

                BOOST_SPIRIT_DEBUG_NODE(label);
                BOOST_SPIRIT_DEBUG_NODE(branch_length);
                BOOST_SPIRIT_DEBUG_NODE(subtree);
                BOOST_SPIRIT_DEBUG_NODE(descendant_list);
                BOOST_SPIRIT_DEBUG_NODE(tree);
            }

        private:

            /* grammar rules */
            qi::rule<Iterator, ptree(), ascii::space_type> tree, subtree;
            qi::rule<Iterator, ptree_recursive(), ascii::space_type> descendant_list;
            qi::rule<Iterator, double(), ascii::space_type> branch_length;
            qi::rule<Iterator, std::string(), ascii::space_type> label;
    };
}

The ptree instance passed to the parser stores the Newick tree. The test string used with this code is:

(((One:0.1,Two:0.2)Sub1:0.3,(Three:0.4,Four:0.5)Sub2:0.6)Sub3:0.7,Five:0.8)Root:0.9;

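The parser recognizes the grammar correctly but produces only a partial tree. Specifically, the returned ptree instance contains the "Root" node and its first child, "Sub3". I also tried the push_at and at_c methods (explained here) and got the same result.

Why does the grammar seem unable to create and add all the nodes, even though it recognizes the input and walks the whole tree?

Thank you for any advice.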
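As a cross-check on what a complete parse of the test string should contain, the grammar's rules can be mirrored one-to-one in a small hand-written recursive descent parser that needs no Boost. This is only a sketch under stated assumptions: `Node` and `NewickReader` are hypothetical names that do not appear in the question, no whitespace is skipped, and the label rule here also stops at '(' and ';', which the question's `label` rule does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not the asker's ptree).
struct Node {
    std::vector<Node> children;
    std::string name;
    double length = 0.0;  // stays 0.0 when the input has no branch length
};

// Hypothetical hand-written reader; each member mirrors one grammar rule.
class NewickReader {
public:
    explicit NewickReader(const std::string& text) : s_(text), pos_(0) {}

    // tree = subtree ';'   (and nothing may follow the semicolon)
    bool tree(Node& out) {
        return subtree(out) && eat(';') && pos_ == s_.size();
    }

private:
    // Consume one expected character, if present.
    bool eat(char c) {
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // subtree = -descendant_list >> -label >> -branch_length
    bool subtree(Node& out) {
        if (pos_ < s_.size() && s_[pos_] == '(' && !descendant_list(out.children))
            return false;
        label(out.name);
        return branch_length(out.length);
    }

    // descendant_list = '(' subtree *(',' subtree) ')'
    bool descendant_list(std::vector<Node>& out) {
        if (!eat('(')) return false;
        do {
            Node child;
            if (!subtree(child)) return false;
            out.push_back(child);  // every sibling is appended, not only the first
        } while (eat(','));
        return eat(')');
    }

    // label: optional run of characters up to a structural character
    void label(std::string& out) {
        const std::string stops = ":),(;";
        while (pos_ < s_.size() && stops.find(s_[pos_]) == std::string::npos)
            out += s_[pos_++];
    }

    // branch_length = ':' double   (optional, but ':' must be followed by a number)
    bool branch_length(double& out) {
        if (!eat(':')) return true;
        const char* begin = s_.c_str() + pos_;
        char* end = nullptr;
        out = std::strtod(begin, &end);
        if (end == begin) return false;
        pos_ += static_cast<std::size_t>(end - begin);
        return true;
    }

    std::string s_;    // owned copy of the input
    std::size_t pos_;  // current read position
};
```

Constructing `NewickReader` with the test string above and calling `tree(root)` should leave `root` named "Root" with two children, "Sub3" and "Five", and "Sub3" in turn with two children. That is the shape the Spirit grammar is expected to produce.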

Solution

template<typename Iterator>
    struct newick_grammar : qi::grammar<Iterator, base::ptree()>
    {
        public:
            newick_grammar() : newick_grammar::base_type(tree)
            {
                /* This is the only grammar that works fine:
                 * http://evolution.genetics.washington.edu/phylip/newick_doc.html */
                label %= qi::lexeme[+(qi::char_ - ':' - ')' - ',')];
                branch_length %= ':' >> qi::double_;

                subtree = 
                       -descendant_list 
                    >> -label 
                    >> -branch_length;

                descendant_list = 
                       '(' 
                    >> subtree
                    >> *(',' >> subtree )   
                    >> ')';

                tree %= subtree >> ';';

                BOOST_SPIRIT_DEBUG_NODE(label);
                BOOST_SPIRIT_DEBUG_NODE(branch_length);
                BOOST_SPIRIT_DEBUG_NODE(subtree);
                BOOST_SPIRIT_DEBUG_NODE(descendant_list);
                BOOST_SPIRIT_DEBUG_NODE(tree);
            }

        private:

            /* grammar rules */
            qi::rule<Iterator, base::ptree()> tree, subtree;
            qi::rule<Iterator, base::children_ptree()> descendant_list;
            qi::rule<Iterator, double()> branch_length;
            qi::rule<Iterator, std::string()> label;
    };

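Best answer

I think there is a lot of cargo-cult code in your program. The variant, for example, is completely useless. So I rewrote it a little and added comments to help you understand (I hope; if anything is unclear, don't hesitate to ask in the comments). I left the space (skipper) specification out of the rules, because I think it is useless in your case.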

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <vector>
#include <string>
#include <iostream>

namespace parser
{
    // Forward declaration for the vector
    struct ptree;

    // typedef to ease the writing
    typedef std::vector<ptree> children_vector;

    // The tree structure itself
    struct ptree
    {
        children_vector children;
        std::string name;
        double length = 0.0; // default when the input gives no branch length
    };

    // Streaming operator for printing the result
    std::ostream& operator<<(std::ostream& stream, const ptree& tree)
    {
        bool first = true;
        stream << "(" << tree.name << ": " << tree.length << " { ";
        for (const auto& child : tree.children) // iterate by reference, no copies
        {
            stream << (first ? "" : "," ) << child;
            first = false;
        }

        stream << " }";
        return stream;
    }
}

// Adapt the structure as a Fusion sequence so that phoenix::at_c can reach its fields
BOOST_FUSION_ADAPT_STRUCT(
    parser::ptree,
    (parser::children_vector, children)
    (std::string, name)
    (double, length)
)

namespace parser
{
    // namespace aliasing to shorten the names
    namespace qi = boost::spirit::qi;    
    namespace phoenix = boost::phoenix;

    // This grammar parses a string into a ptree
    struct newick_grammar : qi::grammar<std::string::const_iterator, ptree()>
    {
    public:
        newick_grammar() 
            : newick_grammar::base_type(tree) // We try to parse the tree rule
        {                
            using phoenix::at_c; // Access nth field of structure
            using phoenix::push_back; // Push into vector

            // For label use %= to assign the result of the parse to the string
            label %= qi::lexeme[+(qi::char_ - ':' - ')' - ',')]; 

            // For branch length use %= to assign the result of the parse to the
            // double
            branch_length %= ':' >> qi::double_;

            // When parsing the subtree just assign the elements that have been
            // built in the subrules
            subtree = 
                // Assign vector of children to the first element of the struct
                -descendant_list [at_c<0>(qi::_val) = qi::_1 ] 
                // Assign the label to the second element
                >> -label [ at_c<1>(qi::_val) = qi::_1 ]
                // Assign the branch length to the third element 
                >> -branch_length [ at_c<2>(qi::_val) = qi::_1 ];

            // Descendant list is a vector of ptree, we just push back the
            // created ptrees into the vector
            descendant_list = 
                '(' >> subtree [ push_back(qi::_val, qi::_1) ]
                >> *(',' >> subtree [ push_back(qi::_val, qi::_1) ])
                >> ')';

            // The tree receives the whole subtree using %=
            tree %= subtree  >> ';' ;
        }

    private:

        // Here are the various grammar rules, each typed by the attribute it
        // produces
        qi::rule<std::string::const_iterator, ptree()> tree, subtree;
        qi::rule<std::string::const_iterator, children_vector()> descendant_list;
        qi::rule<std::string::const_iterator, double()> branch_length;
        qi::rule<std::string::const_iterator, std::string()> label;
    };
}

int main(int argc, char const *argv[])
{
    namespace qi = boost::spirit::qi;
    std::string str;

    while (getline(std::cin, str))
    {
        // Instantiate grammar and tree
        parser::newick_grammar grammar;
        parser::ptree tree;

        // Parse
        bool result = qi::phrase_parse(str.cbegin(), str.cend(), grammar, qi::space,  tree);

        // Print the result
        std::cout << "Parsing result: " << std::boolalpha << result << std::endl;
        std::cout << tree << std::endl;
    }
    return 0;
}

Here is the output for your sample:

$ ./a.exe
(((One:0.1,Two:0.2)Sub1:0.3,(Three:0.4,Four:0.5)Sub2:0.6)Sub3:0.7,Five:0.8)Root:0.9;
Parsing result: true
(Root: 0.9 { (Sub3: 0.7 { (Sub1: 0.3 { (One: 0.1 {  },(Two: 0.2 {  } },(Sub2: 0.6 { (Three: 0.4 {  },(Four: 0.5 {  } } },(Five: 0.8 {  } }
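The printed form above is a debugging view of the parsed tree. Going the other way, writing a tree back out as Newick text, is a short recursive walk. The sketch below is not part of the answer: `Node`, `write_subtree` and `to_newick` are hypothetical names, `Node` simply has the same three fields as `ptree`, and for brevity a branch length is always emitted, so a missing length cannot be told apart from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in with the same three fields as the answer's ptree.
struct Node {
    std::vector<Node> children;
    std::string name;
    double length = 0.0;
};

// Writes "(child,child,...)name:length" for one node, recursing into children.
void write_subtree(std::ostream& os, const Node& node) {
    if (!node.children.empty()) {
        os << '(';
        for (std::size_t i = 0; i < node.children.size(); ++i) {
            if (i != 0) os << ',';  // separator between siblings
            write_subtree(os, node.children[i]);
        }
        os << ')';
    }
    // Assumption of this sketch: the branch length is always written.
    os << node.name << ':' << node.length;
}

// Serializes a whole tree, terminated by the ';' that the grammar requires.
std::string to_newick(const Node& root) {
    std::ostringstream os;
    write_subtree(os, root);
    os << ';';
    return os.str();
}
```

Feeding the result of `to_newick` back into the parser gives a simple round-trip check that no child nodes were dropped.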

Regarding "c++ - Parsing the Newick grammar with boost::spirit", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20470250/
