XML filtering - quick find specific nodes by name + remove parent if value not match()

Reposted by: bug小助手. Updated: 2023-10-22 15:58:58


I need a hint on how to quickly find specific nodes in an XML document and remove the entire parent node (with its children) when some of the values don't match the input parameters.

For example, given the XML below:

<someparent attr="123" filters="+F1">
    <filter id="F1">
        <width>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </width>
        <height>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </height>
    </filter>
</someparent>

I need to apply these rules:

- If filters has a value starting with + (e.g. +F1) and the parameters match one of the listed sizes and values (a4/10 or a3/12), the someparent node must not be removed; any other size should cause the node to be removed.
- If filters has a value starting with - (e.g. -F1) and the parameters match one of the listed sizes and values (a4/10 or a3/12), the someparent node must be removed; any other size should leave the node intact.
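
The two rules above could be read as the following decision, sketched in Java. This is only one interpretation: FilterRule and keepParent are hypothetical names, the input is assumed to be a single size/value pair, and a filters value without a + or - prefix is assumed to mean "no filtering".

```java
import java.util.Set;

/** Hypothetical helper: decides whether a someparent node survives filtering. */
final class FilterRule {

    /**
     * @param filters value of the filters attribute, e.g. "+F1" or "-F1"
     * @param pairs   "size/val" pairs listed under the filter, e.g. "a4/10"
     * @param size    input paper size (assumed input parameter)
     * @param val     input value (assumed input parameter)
     * @return true if the parent node should be kept
     */
    static boolean keepParent(String filters, Set<String> pairs, String size, String val) {
        boolean matches = pairs.contains(size + "/" + val);
        if (filters.startsWith("+")) {
            return matches;   // "+": keep only when the input matches a listed pair
        }
        if (filters.startsWith("-")) {
            return !matches;  // "-": remove when the input matches a listed pair
        }
        return true;          // assumption: no prefix means the node is left alone
    }
}
```

With the pairs a4/10 and a3/12, an input of a4/10 keeps a +F1 parent and removes a -F1 parent; an input of a5/10 does the opposite.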

However, I think that may be irrelevant at this point. What matters most is quickly finding the filter nodes and removing their parent nodes when needed.

Extra notes:

- XPath is far too slow, to the point of being unacceptable. Iterating over every single node is relatively quick, and that is how it currently works, but I'd like to improve on it. I'm pretty sure it can be improved.
- It may happen that the filter node(s) do not exist in the file at all.

My plan is to build some prototypes; however, I'd appreciate any hints that may help me.

More replies

XSLT would be the first choice.

XML parsing is done in document order, so the node to keep could be the last one, and the parsing cost could be similar in any case. Or, as you said, the node may not exist at all, yet the whole document gets parsed anyway. Fast or slow is relative with XML. Moreover, the time spent finding nodes might not be significant compared to writing the document back out after removing a node. All in all, showing just a fragment of the XML without the code used to parse it is not enough to give advice.

Recommended answers

In general, the built-in parsers are SAX, StAX and DOM (https://rdayala.wordpress.com/dom-vs-sax-parsers/).

DOM is the slow one (it loads everything into memory) and is what XPath is used with.

SAX is a pain to use.

StAX actually has 2 APIs:
- the iterator API, e.g. XMLEventReader (easier)
- the cursor API, e.g. XMLStreamReader (more efficient)
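
As a rough illustration of the cursor API, here is a minimal sketch that makes one forward pass with XMLStreamReader and collects the size/val pairs under each filter element, without building a tree. FilterScanner and collectPairs are hypothetical names, the element and attribute names come from the sample XML in the question, and the input is taken as a String only to keep the example short.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical scanner: one forward pass over the document, no tree is built. */
final class FilterScanner {

    /** Maps each filter id to the "size/val" pairs of the paper elements inside it. */
    static Map<String, Set<String>> collectPairs(String xml) {
        Map<String, Set<String>> byFilter = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            Set<String> current = null; // pairs of the filter being read, if any
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("filter".equals(name)) {
                        current = new LinkedHashSet<>();
                        byFilter.put(reader.getAttributeValue(null, "id"), current);
                    } else if ("paper".equals(name) && current != null) {
                        current.add(reader.getAttributeValue(null, "size")
                                + "/" + reader.getAttributeValue(null, "val"));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "filter".equals(reader.getLocalName())) {
                    current = null; // left the filter element
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return byFilter;
    }
}
```

If the file contains no filter element at all, the returned map is simply empty, which covers the case mentioned in the question.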

You could also try XSLT, but the built-in processor isn't necessarily the best performing, and you may need to pay for a premium one to use all its features (such as streamed processing):

https://docs.oracle.com/javase/tutorial/jaxp/xslt/transformingXML.html

This XSLT 1.0 stylesheet will do the job:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  
  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
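  <!-- drop a "+F1" someparent unless its filter/width holds both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='+F1'][not(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>

  <!-- drop a "-F1" someparent when its filter/width holds both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='-F1'][(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>
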
</xsl:stylesheet>
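
To run a stylesheet like this from Java with the built-in JAXP transformer, something along these lines should work. It is a minimal sketch: XsltRunner and transform are hypothetical names, and both inputs are passed as Strings purely for brevity (a real setup would more likely stream from files).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical wrapper around the JAXP transformer shipped with the JDK. */
final class XsltRunner {

    /** Applies the stylesheet text to the XML text and returns the result as a String. */
    static String transform(String xslt, String xml) {
        try {
            // Compile the stylesheet, then run it over the input document.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

If the same stylesheet is applied to many documents, compiling it once into a javax.xml.transform.Templates object and reusing that is usually worth trying before judging XSLT performance.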

Obviously, you need to parse the whole document, and the fastest way to arrive at a solution is to not include the "filtered out" elements in the document building process. Both DOM4J and JDOM are good alternatives for this, since they allow custom document builders that can defer or allow the tree construction based on previously obtained conditions. SAX/StAX is of course also an alternative, but it works at a lower level and requires more infrastructure code to get a result.
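
The same idea of never emitting rejected subtrees can be sketched with the plain StAX event API instead of a DOM4J/JDOM builder (so this is a substitute technique, not the builder approach itself). Each candidate element is buffered, and the buffer is written out only if it passes a test. SubtreeFilter, filter and hasPaper are hypothetical names, and the keep test is left to the caller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical streaming filter: rejected subtrees never reach the output. */
final class SubtreeFilter {

    /** Copies xml, dropping every parentName element whose buffered events fail keep. */
    static String filter(String xml, String parentName, Predicate<List<XMLEvent>> keep) {
        try {
            XMLEventReader in = XMLInputFactory.newFactory()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter text = new StringWriter();
            XMLEventWriter out = XMLOutputFactory.newFactory().createXMLEventWriter(text);
            List<XMLEvent> buffer = null; // non-null while inside a candidate subtree
            int depth = 0;                // element nesting depth inside the candidate
            while (in.hasNext()) {
                XMLEvent event = in.nextEvent();
                if (buffer == null) {
                    if (event.isStartElement() && parentName.equals(
                            event.asStartElement().getName().getLocalPart())) {
                        buffer = new ArrayList<>();
                        buffer.add(event);
                        depth = 1;
                    } else {
                        out.add(event); // outside any candidate: pass straight through
                    }
                    continue;
                }
                buffer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    if (keep.test(buffer)) {
                        for (XMLEvent kept : buffer) {
                            out.add(kept);
                        }
                    }
                    buffer = null; // subtree finished, either written or dropped
                }
            }
            out.flush();
            out.close();
            in.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }

    /** True if the buffered events contain a paper element with this size and val. */
    static boolean hasPaper(List<XMLEvent> events, String size, String val) {
        for (XMLEvent event : events) {
            if (!event.isStartElement()) {
                continue;
            }
            StartElement start = event.asStartElement();
            Attribute s = start.getAttributeByName(new QName("size"));
            Attribute v = start.getAttributeByName(new QName("val"));
            if ("paper".equals(start.getName().getLocalPart()) && s != null && v != null
                    && size.equals(s.getValue()) && val.equals(v.getValue())) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as SubtreeFilter.filter(xml, "someparent", events -> SubtreeFilter.hasPaper(events, "a4", "10")) keeps only the someparent elements that contain an a4/10 paper. Memory use is bounded by the largest single someparent subtree rather than by the whole document.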

Search this site for DOM4J/JDOM and builder; I may already have given the answer ;)

More replies

OK, thanks. I'm going to try XMLStreamReader first, since performance is my highest priority.

Apparently the Woodstox implementation of StAX is pretty fast, but you could also try Aalto XML.

I've never thought it could be done that way. Interesting. I'll put that on the list of things to test. Thank you!


I'm going to try that. I'll edit my post above when I get some concrete results. Thank you!
