c - Parsing a multi-document RELAX NG schema with libxml2

Reposted by: 行者123. Updated: 2023-11-30 17:01:44

I want to convert a RELAX NG schema into a schemaInfo object so that CodeMirror can use it for XML completion.

https://codemirror.net/demo/xmlcomplete.html
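
To make the target of the conversion concrete, here is a minimal sketch in C that serialises a hand-written element table into JSON of the general shape used by the demo above: a "!top" list plus one entry per element with "attrs" and "children". The ElementInfo struct, the Out buffer type and the function names are hypothetical, and the sketch assumes names that need no JSON escaping and attributes with free-form values (written as null). Extracting the table from a real schema is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one element as it might be extracted from a
 * schema: its name plus NULL-terminated lists of attribute names and of
 * allowed child element names. */
typedef struct {
    const char *name;
    const char *const *attrs;
    const char *const *children;
} ElementInfo;

/* Bounded output buffer; `overflow` is set once something did not fit. */
typedef struct {
    char *buf;
    size_t size;
    size_t len;
    int overflow;
} Out;

/* Append a string, keeping the buffer null-terminated. */
static void put(Out *o, const char *s)
{
    size_t n = strlen(s);
    if (o->overflow || o->len + n >= o->size) {
        o->overflow = 1;
        return;
    }
    memcpy(o->buf + o->len, s, n + 1);
    o->len += n;
}

/* Write {"!top":["<top>"],"<name>":{"attrs":{...},"children":[...]},...}.
 * Returns 0 on success, -1 if the buffer is too small. */
static int schema_info_json(const ElementInfo *elems, size_t count,
                            const char *top, char *buf, size_t size)
{
    assert(elems && top && buf && size > 0);
    Out o = { buf, size, 0, 0 };
    buf[0] = '\0';

    put(&o, "{\"!top\":[\"");
    put(&o, top);
    put(&o, "\"]");
    for (size_t i = 0; i < count; i++) {
        put(&o, ",\"");
        put(&o, elems[i].name);
        put(&o, "\":{\"attrs\":{");
        for (const char *const *a = elems[i].attrs; *a; a++) {
            if (a != elems[i].attrs)
                put(&o, ",");
            put(&o, "\"");
            put(&o, *a);
            put(&o, "\":null");   /* null: any attribute value allowed */
        }
        put(&o, "},\"children\":[");
        for (const char *const *c = elems[i].children; *c; c++) {
            if (c != elems[i].children)
                put(&o, ",");
            put(&o, "\"");
            put(&o, *c);
            put(&o, "\"");
        }
        put(&o, "]}");
    }
    put(&o, "}");
    return o.overflow ? -1 : 0;
}
```

Calling schema_info_json() with a caller-owned buffer yields a string that could be handed to the editor as its schemaInfo; a real converter would also have to escape names and record enumerated attribute values.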

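xmllint usage

libxml2 already supports multi-document RELAX NG schemas when it is used to validate a document, like this:

xmllint --relaxng myschema.rng mydoc.xml

Question

Can libxml2 also be used to parse a multi-document schema file?

Here is an example of a multi-document schema: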

https://docs.oasis-open.org/office/v1.1/errata01/os/OpenDocument-strict-schema-v1.1-errata01-complete.rng

Here is a libxml2 function that I don't understand but that might help:

http://xmlsoft.org/html/libxml-relaxng.html#xmlRelaxNGDump

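Assumption

I think I have to convert the multi-document schema into a single-document schema with a tool such as: https://github.com/h4l/rnginline/tree/master/rnginline

Using libxml2 directly would be great, because then I could support such schemas without a preprocessing step.

Update, 2016-05-03

As you can see, parsing the RELAX NG schema as a plain XML document only shows the top-level file. It does not pull in any of the files referenced by include elements in the main RELAX NG file (note: a RELAX NG schema can be split across several files).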

<!-- XHTML Basic -->

<grammar ns="http://www.w3.org/1999/xhtml"
         xmlns="http://relaxng.org/ns/structure/1.0">

<include href="modules/datatypes.rng"/>
<include href="modules/attribs.rng"/>
<include href="modules/struct.rng"/>
<include href="modules/text.rng"/>
<include href="modules/hypertext.rng"/>
<include href="modules/list.rng"/>
<include href="modules/basic-form.rng"/>
<include href="modules/basic-table.rng"/>
<include href="modules/image.rng"/>
<include href="modules/param.rng"/>
<include href="modules/object.rng"/>
<include href="modules/meta.rng"/>
<include href="modules/link.rng"/>
<include href="modules/base.rng"/>

</grammar>
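
Each href above is relative to the file that contains the include, so a tool that follows includes by hand has to resolve it against that file's location first. A minimal sketch in C, under loud assumptions: resolve_href is a hypothetical helper (not a libxml2 function), it handles only plain filesystem paths with '/' separators, and it does not normalise "../" segments or honour URI schemes or xml:base. For real URIs, libxml2's xmlBuildURI() is the more robust choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolve `href` against the path of the file that
 * contains the include. Returns 0 on success, -1 if the result does not
 * fit into `out`. */
static int resolve_href(const char *including_file, const char *href,
                        char *out, size_t out_size)
{
    assert(including_file && href && out);

    /* An absolute href is used unchanged; otherwise find the directory
     * part of the including file. */
    const char *slash = (href[0] == '/') ? NULL
                                         : strrchr(including_file, '/');

    /* Length of the directory prefix, including the trailing '/'. */
    int dir_len = slash ? (int)(slash - including_file + 1) : 0;

    int n = snprintf(out, out_size, "%.*s%s", dir_len, including_file, href);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the schema from the question, resolving "modules/datatypes.rng" against "html5-rng/xhtml-basic.rng" gives "html5-rng/modules/datatypes.rng", which is the file a manual include walker would have to open next.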

Source code

/**
 * section: Tree
 * synopsis: Navigates a tree to print element names
 * purpose: Parse a file to a tree, use xmlDocGetRootElement() to
 *          get the root element, then walk the document and print
 *          all the element name in document order.
 * usage: tree1 filename_or_URL
 * test: tree1 test2.xml > tree1.tmp && diff tree1.tmp $(srcdir)/tree1.res
 * author: Dodji Seketeli
 * copy: see Copyright for the status of this software.
 */
#include <stdio.h>
#include <stdlib.h>   /* exit() in the fallback main() */
#include <libxml/parser.h>
#include <libxml/tree.h>

#ifdef LIBXML_TREE_ENABLED


#define ANSI_COLOR_RED     "\x1b[31m"
#define ANSI_COLOR_GREEN   "\x1b[32m"
#define ANSI_COLOR_YELLOW  "\x1b[33m"
#define ANSI_COLOR_BLUE    "\x1b[34m"
#define ANSI_COLOR_MAGENTA "\x1b[35m"
#define ANSI_COLOR_CYAN    "\x1b[36m"
#define ANSI_COLOR_RESET   "\x1b[0m"


/*
 *To compile this file using gcc you can type
 *gcc -o xmlexample libxml2-example.c `xml2-config --cflags --libs`
 *(libraries go after the source file so the linker can resolve them)
 */

/**
 * print_element_names:
 * @a_node: the initial xml node to consider.
 * @depth: nesting depth, used for indentation.
 *
 * Prints the names and attributes of all the xml elements
 * that are siblings or children of a given xml node.
 */
static void
print_element_names(xmlNode * a_node, int depth)
{
    xmlNode *cur_node = NULL;

    for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
        if (cur_node->type == XML_ELEMENT_NODE) {
            /* "%*s" with an empty string prints depth + 1 spaces */
            printf("%*s %s\n", depth + 1, "", (const char *) cur_node->name);
            for (xmlAttrPtr attr = cur_node->properties; attr != NULL; attr = attr->next)
            {
                /* xmlNodeListGetString() returns a copy owned by the caller */
                xmlChar *value = xmlNodeListGetString(cur_node->doc, attr->children, 1);
                printf("%s%*s %s: %s \n%s", ANSI_COLOR_MAGENTA, depth + 1, "",
                       (const char *) attr->name,
                       value ? (const char *) value : "",
                       ANSI_COLOR_RESET);
                xmlFree(value);
            }
        }

        print_element_names(cur_node->children, depth + 1);
    }
}


/**
 * Simple example to parse a file called "file.xml",
 * walk down the DOM, and print the name of the
 * xml elements nodes.
 */
int
main(int argc, char **argv)
{
    xmlDoc *doc = NULL;
    xmlNode *root_element = NULL;

    if (argc != 2)
        return(1);

    /*
     * this initialize the library and check potential ABI mismatches
     * between the version it was compiled for and the actual shared
     * library used.
     */
    LIBXML_TEST_VERSION

    /*parse the file and get the DOM */
    doc = xmlReadFile(argv[1], NULL, 0);

    if (doc == NULL) {
        fprintf(stderr, "error: could not parse file %s\n", argv[1]);
        xmlCleanupParser();
        return 1;
    }

    /*Get the root element node */
    root_element = xmlDocGetRootElement(doc);

    print_element_names(root_element, 0);

    /*free the document */
    xmlFreeDoc(doc);

    /*
     *Free the global variables that may
     *have been allocated by the parser.
     */
    xmlCleanupParser();

    return 0;
}
#else
int main(void) {
    fprintf(stderr, "Tree support not compiled in\n");
    exit(1);
}
#endif

Example usage

[nix-shell:~/Desktop/projects/nlnet/nlnet]$ ./tree1 html5-rng/xhtml-basic.rng
 grammar
  ns: http://www.w3.org/1999/xhtml 
   include
   href: modules/datatypes.rng 
   include
   href: modules/attribs.rng 
   include
   href: modules/struct.rng 
   include
   href: modules/text.rng 
   include
   href: modules/hypertext.rng 
   include
   href: modules/list.rng 
   include
   href: modules/basic-form.rng 
   include
   href: modules/basic-table.rng 
   include
   href: modules/image.rng 
   include
   href: modules/param.rng 
   include
   href: modules/object.rng 
   include
   href: modules/meta.rng 
   include
   href: modules/link.rng 
   include
   href: modules/base.rng

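Best answer

The question is longer than it needs to be, but what it asks is clear. As of version 2.9.14, libxml2 does not appear to offer a way to resolve included content yourself: it can only resolve a URL or look on the filesystem, probably searching the current directory for the file named in the href attribute. That may already answer the question, but it is not enough if the schema has to be loaded from an in-memory buffer. A clean approach would be a callback for resolving rng:include directives, but libxml2 does not seem to provide such an API. Another approach, which can actually make later operations more efficient, is to recursively merge the external schemas into a single schema that no longer uses include directives. The following code worked for me when merging a schema of moderate complexity (8 files). Just change the paths and file names accordingly.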

#include <memory>
#include <string>
#include <string_view>
#include <vector>
#include <stdexcept>
#include <unordered_set>
#include <filesystem>

#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xmlsave.h>

using namespace std;
namespace fs = std::filesystem;

using DocPtr = std::unique_ptr<xmlDoc, decltype(&xmlFreeDoc)>;

constexpr const char* SchemaBasePath = R"(D:\Schemas)";
constexpr const char* RngSchemaFilename = "Schema.rng";
constexpr const char* MergedSchemaSavePath = R"(D:\Schemas\Schema_Merged.rng)";
constexpr const char* RngNS = "rng";
constexpr const char* RngNSHref = "http://relaxng.org/ns/structure/1.0";

struct Qualifier
{
    bool IsNamespace;
    string Name;
    string Value;
};

static DocPtr readDoc(const string_view& filepath);
static void followDoc(xmlDocPtr doc, vector<xmlNodePtr>& nodes, vector<Qualifier>& qualifiers);
static void followDoc(xmlNodePtr root, vector<xmlNodePtr>& nodes, vector<Qualifier>& qualifiers);
static void removeNode(xmlNodePtr element);
static string findHRef(const xmlNodePtr element);
static string getAttributeContent(const xmlAttrPtr attr);
static void saveDocToFile(xmlDocPtr doc, const string_view& filepath);
static void addNamespaceTo(vector<Qualifier>& qualifiers, xmlNsPtr ns);
static void addAttributeTo(vector<Qualifier>& qualifiers, xmlAttrPtr attr);

unordered_set<string> s_schemas;

int main()
{
    LIBXML_TEST_VERSION;
    auto packetRngPath = fs::u8path(SchemaBasePath) / RngSchemaFilename;
    auto packetRngDoc = readDoc(packetRngPath.u8string());

    vector<xmlNodePtr> nodes;
    vector<Qualifier> qualifiers;
    followDoc(packetRngDoc.get(), nodes, qualifiers);

    auto newDoc = DocPtr(xmlNewDoc(nullptr), &xmlFreeDoc);
    auto grammarNode = xmlNewChild((xmlNodePtr)newDoc.get(), nullptr, (const xmlChar*) "grammar", nullptr);
    if (grammarNode == nullptr)
        throw runtime_error("Can't create rng:grammar node");

    auto rngNs = xmlNewNs(grammarNode, (const xmlChar*)RngNSHref, (const xmlChar*)RngNS);
    if (rngNs == nullptr)
        throw runtime_error("Can't find or create rng namespace");
    xmlSetNs(grammarNode, rngNs);

    for (auto qualifier : qualifiers)
    {
        // Recreate the gathered namespaces and attributes
        if (qualifier.IsNamespace)
        {
            xmlNewNs(grammarNode, (const xmlChar*)qualifier.Value.data(),
                (const xmlChar*)qualifier.Name.data());
        }
        else
        {
            xmlNewProp(grammarNode, (const xmlChar*)qualifier.Name.data(),
                (const xmlChar*)qualifier.Value.data());
        }
    }

    for (auto node : nodes)
    {
        if (xmlAddChild(grammarNode, node) == nullptr)
            throw runtime_error("Can't add child node to grammar");
    }

    // This actually fixes the copied namespaces
    // to share just one instance
    if (xmlReconciliateNs(newDoc.get(), grammarNode) == -1)
        throw runtime_error("Can't reconciliate namespaces");

    saveDocToFile(newDoc.get(), MergedSchemaSavePath);

    return 0;
}

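DocPtr readDoc(const string_view& filepath)
{
    // A string_view is not guaranteed to be null-terminated, so copy it first
    const string path(filepath);
    auto doc = DocPtr(xmlReadFile(path.c_str(), nullptr,
        XML_PARSE_NOBLANKS), &xmlFreeDoc);
    if (doc == nullptr)
        throw runtime_error("Can't parse schema file " + path);
    return doc;
}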

void followDoc(xmlDocPtr doc, vector<xmlNodePtr>& nodes, vector<Qualifier>& qualifiers)
{
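    auto root = xmlDocGetRootElement(doc);
    if (root == nullptr)
        throw runtime_error("Schema document has no root element");

    // Fetch namespaces; xmlGetNsList returns nullptr when none are in scope
    auto namespaces = xmlGetNsList(doc, root);
    if (namespaces != nullptr)
    {
        for (unsigned i = 0; namespaces[i] != nullptr; i++)
            addNamespaceTo(qualifiers, namespaces[i]);
        xmlFree(namespaces);
    }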

    // Fetch attributes
    for (xmlAttrPtr attribute = root->properties; attribute; attribute = attribute->next)
        addAttributeTo(qualifiers, attribute);

    followDoc(root, nodes, qualifiers);
}

void followDoc(xmlNodePtr root, vector<xmlNodePtr>& nodes, vector<Qualifier>& qualifiers)
{
    for (auto child = xmlFirstElementChild(root); child; child = xmlNextElementSibling(child))
    {
        string href;
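        // Match on the namespace URI rather than on the "rng" prefix: the
        // prefix is null when RELAX NG is the default namespace
        if (child->ns != nullptr && child->ns->href != nullptr
            && string_view((const char*)child->ns->href) == RngNSHref
            && string_view((const char*)child->name) == "include"
            && (href = findHRef(child)).length() != 0)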
        {
            if (s_schemas.find(href) == s_schemas.end())
            {
                auto schemaPath = fs::u8path(SchemaBasePath) / href;
                auto doc = readDoc(schemaPath.u8string());
                s_schemas.insert(href);
                followDoc(doc.get(), nodes, qualifiers);
            }

            continue;
        }

        auto copied = xmlCopyNode(child, 1);
        if (copied == nullptr)
            throw runtime_error("Can't copy child node");

        nodes.push_back(copied);
    }
}

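void addNamespaceTo(vector<Qualifier>& qualifiers, xmlNsPtr xmlNs)
{
    // A default namespace has a null prefix and cannot be recreated under a
    // prefix; constructing a std::string from a null pointer is also
    // undefined behaviour, so skip such entries
    if (xmlNs->prefix == nullptr || xmlNs->href == nullptr)
        return;

    for (const auto& ns : qualifiers)
    {
        // Ensure the namespace has not yet been added first
        if (ns.IsNamespace && ns.Name == (const char*)xmlNs->prefix)
            return;
    }
    qualifiers.push_back({ true, (const char*)xmlNs->prefix, (const char*)xmlNs->href });
}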

void addAttributeTo(vector<Qualifier>& qualifiers, xmlAttrPtr xmlAttr)
{
    for (const auto& attr : qualifiers)
    {
        // Ensure the attribute has not yet been added first
        if (!attr.IsNamespace && attr.Name == (const char*)xmlAttr->name)
            return;
    }
    qualifiers.push_back({ false, (const char*)xmlAttr->name, getAttributeContent(xmlAttr) });
}

void removeNode(xmlNodePtr element)
{
    // Unlink the node from its tree and free it (helper is not used by main)
    xmlUnlinkNode(element);
    xmlFreeNode(element);
}

string findHRef(const xmlNodePtr element)
{
    for (xmlAttrPtr attr = element->properties; attr; attr = attr->next)
    {
        if (string_view((const char*)attr->name) == "href")
            return getAttributeContent(attr);
    }

    return { };
}

string getAttributeContent(const xmlAttrPtr attr)
{
    xmlChar* content = xmlNodeGetContent((const xmlNode*)attr);
    if (content == nullptr)
        return { };

    unique_ptr<xmlChar, decltype(xmlFree)> contentFree(content, xmlFree);
    return string((const char*)content);
}

void saveDocToFile(xmlDocPtr doc, const string_view& filepath)
{
    auto ctx = xmlSaveToFilename(filepath.data(), "utf-8", XML_SAVE_FORMAT);
    if (ctx == nullptr || xmlSaveDoc(ctx, doc) == -1 || xmlSaveClose(ctx) == -1)
        throw runtime_error("Can't save XML document");
}
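
The part of the answer's approach that is easy to get wrong is the traversal itself: follow includes depth-first and never process the same file twice, which is what the s_schemas set does above. A minimal sketch of just that walk in C, with an in-memory table standing in for the files on disk. SchemaFile, VisitOrder, MAX_FILES and collect_includes are hypothetical names introduced for illustration, and no XML is parsed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 32

/* Stand-in for a schema file on disk: its name and the hrefs of its
 * include elements (NULL-terminated). */
typedef struct {
    const char *name;
    const char *const *includes;
} SchemaFile;

/* Files in the order they were first reached. */
typedef struct {
    const char *visited[MAX_FILES];
    size_t count;
} VisitOrder;

static const SchemaFile *find_file(const SchemaFile *files, size_t n,
                                   const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(files[i].name, name) == 0)
            return &files[i];
    return NULL;
}

static int already_visited(const VisitOrder *order, const char *name)
{
    for (size_t i = 0; i < order->count; i++)
        if (strcmp(order->visited[i], name) == 0)
            return 1;
    return 0;
}

/* Depth-first walk over the include graph. Returns 0 on success, -1 if a
 * referenced file is missing or more than MAX_FILES files are reached. */
static int collect_includes(const SchemaFile *files, size_t n,
                            const char *name, VisitOrder *order)
{
    assert(files && name && order);

    if (already_visited(order, name))
        return 0;                       /* included earlier: skip it */

    const SchemaFile *file = find_file(files, n, name);
    if (file == NULL || order->count == MAX_FILES)
        return -1;

    /* Mark the file before recursing so that include cycles terminate. */
    order->visited[order->count++] = file->name;

    for (const char *const *inc = file->includes; *inc; inc++)
        if (collect_includes(files, n, *inc, order) != 0)
            return -1;
    return 0;
}
```

In a real merger the body of the loop would parse each file and copy its non-include children, as the C++ code does. Marking a file as visited before descending into it is the design choice that keeps a cyclic include from recursing forever.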

This Q&A, "c - Parsing a multi-document RELAX NG schema with libxml2", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/36850163/
