XML filtering - quickly find specific nodes by name and remove the parent if a value does not match

Reposted. Author: bug小助手. Updated: 2023-10-22 15:58:58


I need a hint on how to quickly find specific nodes within an XML document and remove their entire parent node (with its children) if some of the values don't match the input parameters.

For example, given the XML shown below:

<someparent attr="123" filters="+F1">
  <filter id="F1">
    <width>
      <paper size="a4" val="10" />
      <paper size="a3" val="12" />
    </width>
    <height>
      <paper size="a4" val="10" />
      <paper size="a3" val="12" />
    </height>
  </filter>
</someparent>

I should apply some rules:


  • if filters has a value starting with + (e.g. +F1) and the input parameters match one of the listed size/value pairs (e.g. a4/10 or a3/12), the someparent node must be kept; any other size should cause the node to be removed

  • if filters has a value starting with - (e.g. -F1) and the input parameters match one of the listed size/value pairs (e.g. a4/10 or a3/12), the someparent node must be removed; any other size should leave the node intact
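
The two rules above can be sketched as a single predicate. This is only an illustration in Java: the class name FilterRule, the method shouldRemove, and the idea of collecting the filter's pairs into a flat set of "size/val" strings are assumptions, not part of the question.

```java
import java.util.Set;

class FilterRule {

    /**
     * Decides whether a someparent node should be removed.
     *
     * filters - value of the filters attribute, e.g. "+F1" or "-F1"
     * pairs   - "size/val" pairs found under the filter, e.g. "a4/10" (assumed representation)
     * size    - input parameter, e.g. "a4"
     * val     - input parameter, e.g. "10"
     */
    static boolean shouldRemove(String filters, Set<String> pairs, String size, String val) {
        boolean matches = pairs.contains(size + "/" + val);
        if (filters.startsWith("+")) {
            return !matches;   // "+": keep on match, remove otherwise
        }
        if (filters.startsWith("-")) {
            return matches;    // "-": remove on match, keep otherwise
        }
        return false;          // assumption: an unprefixed value leaves the node intact
    }
}
```

With the pairs a4/10 and a3/12, calling shouldRemove("+F1", pairs, "a4", "10") keeps the node, while the same parameters with "-F1" remove it.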


However, I think the rules may be irrelevant at this point. The most important part is quickly finding the filter nodes and removing their parent nodes if needed.

Extra notes:


  • XPath is far too slow, to the point of being unacceptable. Iterating over every single node is relatively quick, and that is how it currently works, but I'm fairly sure it can be improved.

  • the file may contain no filter nodes at all


My plan is to create some prototypes, however... I'd appreciate any hints that may help me.


Comments

XSLT would be the first choice

XML is parsed in document order, so the node to keep could be the last one, and the parsing cost may be similar in any case. Or, as you said, the node may not exist at all, yet the whole document has been parsed anyway. Fast and slow are relative terms with XML. Moreover, the time spent finding nodes might not be significant compared to writing the document back out after removing a node. All in all, a fragment of the XML without the code used to parse it is not enough to give advice.

Recommended answers

In general the different built-in parsers are SAX, StAX and DOM (https://rdayala.wordpress.com/dom-vs-sax-parsers/).


  • DOM is the slow one (it loads everything into memory) and is the one used with XPath.

  • SAX is a pain to use.

  • StAX actually has 2 APIs:

    • the iterator API, e.g. XMLEventReader (easier)

    • the cursor API, e.g. XMLStreamReader (more efficient)
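
As a rough illustration of the StAX route, here is a minimal sketch using the iterator API (XMLEventReader and XMLEventWriter), chosen because buffering events is simpler than with the cursor API. It assumes the element and attribute names from the question's example, treats any matching paper element under someparent as a match, and does not check that the filter id corresponds to the name after the + or - prefix. The class name StaxSomeparentFilter and its helper methods are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxSomeparentFilter {

    // Returns the document with the someparent subtrees that fail the rules dropped.
    static String filter(String xml, String size, String val) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!isStart(event, "someparent")) {
                writer.add(event);              // everything outside someparent passes through
                continue;
            }
            String filters = attr(event.asStartElement(), "filters");
            List<XMLEvent> buffer = new ArrayList<>();
            boolean sawFilter = false;
            boolean matches = false;
            int depth = 0;
            // Buffer the whole subtree, because the decision depends on its descendants.
            while (true) {
                buffer.add(event);
                if (event.isStartElement()) {
                    depth++;
                    StartElement start = event.asStartElement();
                    sawFilter |= isStart(event, "filter");
                    matches |= isStart(event, "paper")
                            && size.equals(attr(start, "size"))
                            && val.equals(attr(start, "val"));
                } else if (event.isEndElement() && --depth == 0) {
                    break;                      // closing tag of this someparent
                }
                event = reader.nextEvent();
            }
            if (!shouldRemove(filters, sawFilter, matches)) {
                for (XMLEvent buffered : buffer) {
                    writer.add(buffered);
                }
            }
        }
        writer.close();
        return out.toString();
    }

    static boolean shouldRemove(String filters, boolean sawFilter, boolean matches) {
        if (filters == null || !sawFilter) {
            return false;                       // assumption: nothing to apply, keep the node
        }
        if (filters.startsWith("+")) {
            return !matches;
        }
        return filters.startsWith("-") && matches;
    }

    static boolean isStart(XMLEvent event, String name) {
        return event.isStartElement()
                && name.equals(event.asStartElement().getName().getLocalPart());
    }

    static String attr(StartElement start, String name) {
        Attribute attribute = start.getAttributeByName(new QName(name));
        return attribute == null ? null : attribute.getValue();
    }
}
```

Only one someparent subtree is held in memory at a time, so the cost stays proportional to the largest subtree rather than to the whole file. A cursor-based version with XMLStreamReader would avoid allocating event objects, at the price of more manual bookkeeping.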




You could also try using XSLT, but the built-in processor isn't necessarily the best performing one, and you may need to pay for a premium one to use all of its features (such as streamed processing):

https://docs.oracle.com/javase/tutorial/jaxp/xslt/transformingXML.html


This XSLT 1.0 stylesheet will do the job:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything not matched by the templates below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- +F1: drop someparent unless its width lists both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='+F1'][not(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>

  <!-- -F1: drop someparent when its width lists both a4/10 and a3/12 -->
  <xsl:template match="someparent[@filters='-F1'][(filter/width[paper[concat(@size,'/',@val)='a4/10'] and paper[concat(@size,'/',@val)='a3/12']])]"/>

</xsl:stylesheet>
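
To run a stylesheet like this from Java, the built-in JAXP transformer can be used. The following is a minimal sketch; the class name XsltFilter and the choice to pass both documents as strings are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltFilter {

    // Applies the given stylesheet to the given XML and returns the result as text.
    static String transform(String xslt, String xml) throws TransformerException {
        // Compile the stylesheet with whichever processor JAXP finds on the classpath.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For files, the StreamSource and StreamResult constructors that take a java.io.File can be used instead of the string readers and writers. When the same stylesheet is applied to many documents, compiling it once with TransformerFactory.newTemplates and creating a Transformer per run avoids repeated compilation.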


Obviously, you need to parse the whole document, and the fastest way to arrive at a solution is to not include the "filtered out" elements in the document building process. Both DOM4J and JDOM are good alternatives for this, since they allow custom document builders that can defer or allow tree construction based on previously obtained conditions. SAX/StAX is of course also an alternative, but it works at a lower level and requires more infrastructure code to get a result.


Search this site for DOM4J/JDOM and builder; I may already have given the answer ;)


Comments

OK, thanks. I'm going to try XMLStreamReader first, as performance is the highest priority for me.

Apparently the Woodstox implementation of StAX is pretty fast, but you could also try Aalto XML.

I've never thought it could be done that way. Interesting. I'll put that on the list of things to test. Thank you!

I'm going to try that. I'll edit my post above when I get some concrete results. Thank you!
