XML filtering: quickly find specific nodes by name and remove the parent if a value does not match

Reposted by: bug小助手 · Updated: 2023-10-24 23:15:35

I need a hint regarding a quick finding of specific nodes within the XML and removing the entire parent node (with children) if some of the values don't match the input parameters.

Example, having the XML as shown below:

<someparent attr="123" filters="+F1">
    <filter id="F1">
        <width>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </width>
        <height>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </height>
    </filter>
</someparent>

I should apply some rules:

If filters has a value starting with + (e.g. +F1) and the input parameters match one of the listed size/value pairs (a4/10 or a3/12), the someparent node should not be removed; any other size should cause the node to be removed.

If filters has a value starting with - (e.g. -F1) and the input parameters match one of the listed size/value pairs (a4/10 or a3/12), the someparent node should be removed; any other size should leave the node intact.
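
The two rules above reduce to a small decision function. The following is only a sketch under assumptions: the class and method names (FilterRule, shouldRemove) are made up for illustration, the allowed pairs are passed in as a size-to-value map, and the width/height distinction from the example XML is ignored.

```java
import java.util.Map;

// Hypothetical helper; names and signature are illustrative, not from the original post.
final class FilterRule {

    /**
     * Decides whether the enclosing someparent node should be removed.
     * filtersAttr is the value of the filters attribute, e.g. "+F1" or "-F1".
     * allowed maps each paper size to its expected val, e.g. a4 -> 10.
     * size and val are the input parameters being tested.
     */
    static boolean shouldRemove(String filtersAttr, Map<String, String> allowed,
                                String size, String val) {
        boolean matches = val.equals(allowed.get(size));
        if (filtersAttr.startsWith("+")) {
            return !matches; // "+" filter: keep the node only on a match
        }
        if (filtersAttr.startsWith("-")) {
            return matches;  // "-" filter: remove the node on a match
        }
        return false;        // assumption: no prefix means the node is left alone
    }

    public static void main(String[] args) {
        Map<String, String> allowed = Map.of("a4", "10", "a3", "12");
        System.out.println(shouldRemove("+F1", allowed, "a4", "10"));
        System.out.println(shouldRemove("+F1", allowed, "a5", "10"));
        System.out.println(shouldRemove("-F1", allowed, "a3", "12"));
    }
}
```

Keeping the decision separate from the parsing code means the same rule can be reused whichever parser ends up being the fastest.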

However, I think that may be irrelevant at this point. The most important thing is quickly finding the filter nodes and removing the parent nodes if needed.

Extra notes:

XPath is far too slow for this, to the point of being unacceptable. Iterating over every single node is relatively quick, and that is how it currently works, but I'm pretty sure it can be improved.

The filter node(s) may not exist in the file at all.

My plan is to create some prototypes; however, I'd appreciate any hints that may help me.

More replies

XSLT would be the first choice

XML parsing is done in document order, so the node to keep could be the last one, and parsing efficiency could be similar in any case. Or, as you said, the node may not exist at all, yet the whole document is parsed anyway. Fast and slow are relative terms with XML. Moreover, finding a node might not be significant compared to writing the document out after removing a node. All in all, showing just a fragment of the XML without the code used to parse it is not enough to give advice.

Recommended answers

In general the different built-in parsers are SAX, StAX and DOM (https://rdayala.wordpress.com/dom-vs-sax-parsers/).

DOM is the slow one (it loads everything into memory) and is the one used with XPath.

SAX is a pain to use.

StAX actually has 2 APIs:
- the iterator API, e.g. XMLEventReader (easier)
- the cursor API, e.g. XMLStreamReader (more efficient)
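
As a rough illustration of the cursor API, the sketch below walks a document once with XMLStreamReader and collects the size/val pair of every paper element found inside a filter element. The class and method names (FilterScan, paperPairs) are invented for this example, and reading from a String is only for brevity; a real file would be opened as a stream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; names are illustrative only.
final class FilterScan {

    /** Returns "size/val" for every paper element nested inside a filter element. */
    static List<String> paperPairs(String xml) throws XMLStreamException {
        List<String> pairs = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int filterDepth = 0; // greater than 0 while the cursor is inside a filter element
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("filter")) {
                        filterDepth++;
                    } else if (filterDepth > 0 && name.equals("paper")) {
                        pairs.add(reader.getAttributeValue(null, "size") + "/"
                                + reader.getAttributeValue(null, "val"));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("filter")) {
                    filterDepth--;
                }
            }
        } finally {
            reader.close();
        }
        return pairs;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<someparent attr=\"123\" filters=\"+F1\"><filter id=\"F1\"><width>"
                + "<paper size=\"a4\" val=\"10\"/><paper size=\"a3\" val=\"12\"/>"
                + "</width></filter></someparent>";
        System.out.println(paperPairs(xml));
    }
}
```

If the file contains no filter element at all, the list simply comes back empty after a single pass.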

You could also try XSLT, but the built-in processor isn't necessarily the best performing one, and you may need to pay for a premium one to use all of its features (such as streamed processing):

https://docs.oracle.com/javase/tutorial/jaxp/xslt/transformingXML.html

This XSLT 1.0 stylesheet will do the job:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  
  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
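  <!-- filters="+F1": remove someparent unless filter/width contains both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='+F1'][not(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>
  
  <!-- filters="-F1": remove someparent when filter/width contains both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='-F1'][(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>
  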
</xsl:stylesheet>
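
To run a stylesheet like the one above from Java, the JDK's built-in javax.xml.transform API should be enough. This is a minimal sketch, assuming the stylesheet and the document are both available as strings; the class and method names (XsltRunner, transform) and the two command-line file arguments are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; names are illustrative only.
final class XsltRunner {

    /** Applies the given XSLT source to the given XML source and returns the result. */
    static String transform(String xslt, String xml) throws TransformerException {
        // Compile the stylesheet with whichever XSLT processor the JDK provides.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: XsltRunner <stylesheet.xsl> <input.xml>");
            return;
        }
        // Assumption: both files are small enough to be read into memory.
        String xslt = Files.readString(Path.of(args[0]));
        String xml = Files.readString(Path.of(args[1]));
        System.out.println(transform(xslt, xml));
    }
}
```

If the same stylesheet is applied to many documents, compiling it once with TransformerFactory.newTemplates and reusing the resulting Templates object avoids repeating the compilation cost.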

Obviously, you need to parse the whole document, and the fastest way to arrive at a solution is to not include the "filtered out" elements in the document building process. Both DOM4J and JDOM are good alternatives for this, since they allow custom document builders that can defer or allow tree construction based on previously obtained conditions. SAX/StAX is of course also an alternative, but it works at a lower level and requires more infrastructure code to get a result.
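
The same idea of leaving filtered-out elements out of the result can be sketched with the StAX event API from the JDK instead of a DOM4J or JDOM builder (a swapped-in technique, not the builders this answer refers to). The sketch buffers the events of each someparent subtree, asks a predicate whether to drop it, and only then writes it out. All names here (SubtreeFilter, filter, hasPaper) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper; names are illustrative only.
final class SubtreeFilter {

    /** Copies xml, dropping each someparent subtree for which remove returns true. */
    static String filter(String xml, Predicate<List<XMLEvent>> remove)
            throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter sink = new StringWriter();
        XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(sink);
        List<XMLEvent> buffer = null; // non-null only while inside a someparent element
        int depth = 0;                // element nesting depth within the buffered subtree
        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (buffer == null) {
                if (e.isStartElement() && "someparent".equals(
                        e.asStartElement().getName().getLocalPart())) {
                    buffer = new ArrayList<>();
                } else {
                    out.add(e); // outside any candidate subtree: pass straight through
                    continue;
                }
            }
            buffer.add(e);
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement() && --depth == 0) {
                if (!remove.test(buffer)) {
                    for (XMLEvent kept : buffer) {
                        out.add(kept);
                    }
                }
                buffer = null; // subtree finished: flushed or discarded
            }
        }
        in.close();
        out.close();
        return sink.toString();
    }

    /** True if the buffered events contain a paper element with this size and val. */
    static boolean hasPaper(List<XMLEvent> events, String size, String val) {
        for (XMLEvent e : events) {
            if (!e.isStartElement()) {
                continue;
            }
            StartElement s = e.asStartElement();
            Attribute a = s.getAttributeByName(new QName("size"));
            Attribute b = s.getAttributeByName(new QName("val"));
            if (s.getName().getLocalPart().equals("paper") && a != null && b != null
                    && a.getValue().equals(size) && b.getValue().equals(val)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><someparent filters=\"+F1\"><filter id=\"F1\"><width>"
                + "<paper size=\"a4\" val=\"10\"/></width></filter></someparent>"
                + "<other/></root>";
        // Parameters a3/12 do not match, so this "+" parent is dropped from the output.
        System.out.println(filter(xml, sub -> !hasPaper(sub, "a3", "12")));
    }
}
```

Only one someparent subtree is held in memory at a time, so the cost stays close to a plain streaming copy as long as individual subtrees are small.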

Search this site for DOM4J/JDOM and builder, I may already have given the answer ;)

More replies

OK, thanks. I'm going to try XMLStreamReader first, as performance is the highest priority for me.

Apparently the Woodstox implementation of StAX is pretty fast, but you could also try Aalto XML.

I've never thought it could be done that way. Interesting. I'll put that on the list of things to test. Thank you!

I'm going to try that. I'll edit my post above when I get some concrete results. Thank you!
