xslt - Excluding certain child nodes when the data structure is unknown

Reposted by: 行者123 · Updated: 2023-12-03 15:32:07

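Edit: I found a solution to this problem and posted it as a Q&A here.

I want to process XML that conforms to the Library of Congress EAD standard (found here). Unfortunately, the standard is very loose about how the XML is structured.

For example, the <bioghist> tag can appear inside the <archdesc> tag, inside a <descgrp> tag, nested inside another <bioghist> tag, in any combination of these, or it can be omitted entirely. I have found it very difficult to select only the bioghist tag I am looking for and none of the others.

Below are a few different EAD XML documents that my XSLT might need to handle:

First example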

<ead>
<eadheader>
    <archdesc>
        <bioghist>one</bioghist>
        <dsc>
            <c01>
                <descgrp>
                    <bioghist>two</bioghist>
                </descgrp>
                <c02>
                    <descgrp>
                        <bioghist>
                            <bioghist>three</bioghist>
                        </bioghist>
                    </descgrp>
                </c02>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

Second example

<ead>
<eadheader>
    <archdesc>
        <descgrp>
            <bioghist>
                <bioghist>one</bioghist>
            </bioghist>
        </descgrp>
        <dsc>
            <c01>
                <c02>
                    <descgrp>
                        <bioghist>three</bioghist>
                    </descgrp>
                </c02>
                <bioghist>two</bioghist>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

Third example

<ead>
<eadheader>
    <archdesc>
        <descgrp>
            <bioghist>one</bioghist>
        </descgrp>
        <dsc>
            <c01>
                <c02>
                    <bioghist>three</bioghist>
                </c02>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

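As you can see, an EAD XML file can have <bioghist> tags almost anywhere. The actual output I want to produce is too complicated to post here. A simplified example of the output for the three EAD examples above might look like this:

Output for the first example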

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history>second</biography_history>
</child_record>
<grandchild_record>
    <biography_history>third</biography_history>
</grandchild_record>
</records>

Output for the second example

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history>second</biography_history>
</child_record>
<grandchild_record>
    <biography_history>third</biography_history>
</grandchild_record>
</records>

Output for the third example

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history></biography_history>
</child_record>
<grandchild_record>
    <biography_history>third</biography_history>
</grandchild_record>
</records>

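If I want to extract the "first" bioghist value and put it in <primary_record>, I cannot simply use <xsl:apply-templates select="/ead/eadheader/archdesc/bioghist"/>, because that tag may not be a direct child of the <archdesc> tag. It may be wrapped in a <descgrp>, in a <bioghist>, or in a combination of the two. I cannot use select="//bioghist", because that would pull in every <bioghist> tag. I cannot even use select="//bioghist[1]", because there may be no <bioghist> tag at that level at all, and then I would pull in the value from under <c01>, which is the "second" one and should be processed later.

This is already a long post, but another problem is that there can be an unlimited number of <cxx> nodes, nested up to 12 levels deep. I currently process them recursively. I tried saving the node I am currently processing (for example <c01>) in a variable named "RN" and then running <xsl:apply-templates select=".//bioghist [name(..)=name($RN) or name(../..)=name($RN)]"/>. This works for some forms of EAD, where the <bioghist> tag is not nested too deeply, but it fails on EAD files created by people who like to wrap tags in other tags (which is perfectly fine under the EAD standard).

What I would like is some way to say:

Get any <bioghist> tag anywhere below the current node, but

do not dig any deeper if you hit a <c??> tag.

I hope I have made the situation clear. If anything is unclear, please let me know. Any help you can offer would be greatly appreciated. Thank you.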

Best answer

Since the requirements are rather vague, any answer reflects only the guesses its author has made.

Here is mine:

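<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my" exclude-result-prefixes="my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Lookup table of output element names, one per nesting level -->
 <my:names>
  <n>primary_record</n>
  <n>child_record</n>
  <n>grandchild_record</n>
 </my:names>

 <!-- document('') reads the stylesheet itself, so $vNames holds the <n> elements above -->
 <xsl:variable name="vNames" select="document('')/*/my:names/*"/>

 <!-- Select only those bioghist elements whose next sibling node is a descgrp -->
 <xsl:template match="/">
  <xsl:apply-templates select=
   "//bioghist[following-sibling::node()[1]
                                [self::descgrp]
              ]"/>
 </xsl:template>

 <!-- Name each output element after the bioghist's position in the selected node list -->
 <xsl:template match="bioghist">
  <xsl:variable name="vPos" select="position()"/>

  <xsl:element name="{$vNames[position() = $vPos]}">
   <xsl:value-of select="."/>
  </xsl:element>
 </xsl:template>

 <!-- Suppress the built-in template that would copy text nodes to the output -->
 <xsl:template match="text()"/>
</xsl:stylesheet>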

When this transformation is applied to the provided XML document:

<ead>
    <eadheader>
        <archdesc>
            <bioghist>first</bioghist>
            <descgrp>
                <bioghist>first</bioghist>
                <bioghist>
                    <bioghist>first</bioghist></bioghist>
            </descgrp>
            <dsc>
                <c01>
                    <bioghist>second</bioghist>
                    <descgrp>
                        <bioghist>second</bioghist>
                        <bioghist>
                            <bioghist>second</bioghist></bioghist>
                    </descgrp>
                    <c02>
                        <bioghist>third</bioghist>
                        <descgrp>
                            <bioghist>third</bioghist>
                            <bioghist>
                                <bioghist>third</bioghist></bioghist>
                        </descgrp>
                    </c02>
                </c01>
            </dsc>
        </archdesc>
    </eadheader>
</ead>

the wanted result is produced:

<primary_record>first</primary_record>
<child_record>second</child_record>
<grandchild_record>third</grandchild_record>
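
The transformation above relies on each wanted <bioghist> being followed directly by a <descgrp>. The wish stated at the end of the question (take any <bioghist> below the current node, but stop at a <c??> tag) could also be sketched as follows. This is only a sketch under stated assumptions, not a tested solution: it treats <archdesc> and every element named c followed by digits (c, c01 ... c12) as a "level", and the <record> element and its level attribute are hypothetical names chosen for illustration, not the output format the question asks for.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Start at archdesc, wherever it sits in the document -->
 <xsl:template match="/">
  <records>
   <xsl:apply-templates select="//archdesc"/>
  </records>
 </xsl:template>

 <!-- A "level" is archdesc or a component element (c, c01 ... c12).
      Removing the digits from a component name leaves just 'c'. -->
 <xsl:template match="archdesc | *[translate(name(), '0123456789', '') = 'c']">
  <!-- <record> and @level are hypothetical output names -->
  <record level="{name()}">
   <biography_history>
    <!-- Outermost bioghist elements whose nearest enclosing level is
         the current node, so no c?? element lies in between -->
    <xsl:for-each select=".//bioghist[not(ancestor::bioghist)]
        [generate-id(ancestor::*[self::archdesc or
             translate(name(), '0123456789', '') = 'c'][1])
         = generate-id(current())]">
     <!-- Separate several bioghist values at one level with a space -->
     <xsl:if test="position() &gt; 1"><xsl:text> </xsl:text></xsl:if>
     <xsl:value-of select="normalize-space(.)"/>
    </xsl:for-each>
   </biography_history>
   <!-- Recurse into the components that sit directly below this level -->
   <xsl:apply-templates select=".//*[translate(name(), '0123456789', '') = 'c']
        [generate-id(ancestor::*[self::archdesc or
             translate(name(), '0123456789', '') = 'c'][1])
         = generate-id(current())]"/>
  </record>
 </xsl:template>
</xsl:stylesheet>
```

The key step is the ancestor test: because ancestor:: is a reverse axis, the [1] predicate picks the nearest enclosing level, and comparing its generate-id() with that of current() keeps only nodes that belong to the level being processed. Under these assumptions a level with no <bioghist> of its own, such as <c01> in the question's third example, should yield an empty <biography_history> instead of borrowing a value from a deeper component.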

Regarding "xslt - Excluding certain child nodes when the data structure is unknown", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11233708/
