
java - Parse XML to get only comments and date values

Reposted. Author: 行者123. Updated: 2023-12-02 01:08:23

Hey, I'd like to know whether it's possible to read an XML file and collect only the tags whose values are dates in the format YYYY-MM-DD.

Here is an online example: https://repl.it/repls/MedicalIgnorantEfficiency

Here is an example of the XML I want to parse:

<?xml version="1.0" encoding="UTF-8"?>
<ncc:Message xmlns:ncc="http://blank/1.0.6"
xmlns:cs="http://blank/1.0.0"
xmlns:jx="http://blank/1.0.0"
xmlns:jm="http://blank/1.0.0"
xmlns:n-p="http://blank/1.0.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://blank/1.0.6/person person.xsd">
<ncc:DataSection>
<ncc:PersonResponse>
<!-- Message -->
<cs:CText cs:type="No">NO WANT</cs:CText>
<jm:CaseID>
<!-- OEA -->
<jm:ID>ABC123</jm:ID>
</jm:CaseID>
<jx:PersonName>
<!-- NAM -->
<jx:GivenName>Arugula</jx:GivenName>
<jx:MiddleName>Pibb</jx:MiddleName>
<jx:SurName>Atari</jx:SurName>
</jx:PersonName>
<!-- DOB -->
<ncc:PersonBirthDateText>1948-05-11</ncc:PersonBirthDateText>
<jx:PersonDetails>
<!-- SXC -->
<jx:PersonSSN>
<jx:ID/>
</jx:PersonSSN>
</jx:PersonDetails>
<n-p:Activity>
<!--DOZ-->
<jx:ActivityDate>1996-04-04</jx:ActivityDate>
<jx:HomeAgency xsi:type="cs:Organization">
<!-- ART -->
<jx:Organization>
<jx:ID>ZR5981034</jx:ID>
</jx:Organization>
</jx:HomeAgency>
</n-p:Activity>
<jx:PersonName>
<!-- DOB Newest -->
<ncc:BirthDateText>1993-05-12</ncc:BirthDateText>
<ncc:BirthDateText>1993-05-13</ncc:BirthDateText>
<ncc:BirthDateText>1993-05-14</ncc:BirthDateText>
<jx:IDDetails xsi:type="cs:IDDetails">
<!-- SMC Checker -->
<jx:SSNID>
<jx:ID/>
</jx:SSNID>
</jx:IDDetails>
</jx:PersonName>
</ncc:PersonResponse>
</ncc:DataSection>
</ncc:Message>

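I want to get the date values along with the comments directly above them. For the example XML above, the output should look something like this: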

Comment: <!-- DOB --> (ncc:DataSection/ncc:PersonResponse)
Date: 1948-05-11 (ncc:DataSection/ncc:PersonResponse/ncc:PersonBirthDateText)

Comment: <!-- DOZ --> (ncc:DataSection/ncc:PersonResponse/n-p:Activity)
Date: 1996-04-04 (ncc:DataSection/ncc:PersonResponse/n-p:Activity/jx:ActivityDate)

Comment: <!-- DOB Newest --> (ncc:DataSection/ncc:PersonResponse/jx:PersonName)
Date:
    1993-05-12 (ncc:DataSection/ncc:PersonResponse/jx:PersonName/ncc:BirthDateText)
    1993-05-13 (ncc:DataSection/ncc:PersonResponse/jx:PersonName/ncc:BirthDateText)
    1993-05-14 (ncc:DataSection/ncc:PersonResponse/jx:PersonName/ncc:BirthDateText)

The code I'm using to try this is:

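import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// base_ is a field of the enclosing class (not shown) holding the XML file path.
public static void xpathNodes() throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
    File file = new File(base_);
    XPath xPath = XPathFactory.newInstance().newXPath();
    //String expression = "//*[not(*)]";
    // A regular expression, not an XPath expression: XPath.compile() rejects it.
    String expression = "([0-9]{4})-([0-9]{2})-([0-9]{2})";
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = builderFactory.newDocumentBuilder();
    Document document = builder.parse(file);
    document.getDocumentElement().normalize();
    NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);

    for (int i = 0; i < nodeList.getLength(); i++) {
        System.out.println(getXPath(nodeList.item(i)));
    }
}

// Builds a slash-separated path of node names from the document root down to node.
private static String getXPath(Node node) {
    Node parent = node.getParentNode();

    if (parent == null) {
        return node.getNodeName();
    }

    return getXPath(parent) + "/" + node.getNodeName();
}

public static void main(String[] args) throws Exception {
    xpathNodes();
}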

I know the regex (([0-9]{4})-([0-9]{2})-([0-9]{2})) works: I used it in Notepad++, and it finds the dates in the open XML file just fine.
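XPath 1.0, which is what javax.xml.xpath implements, has no regular-expression support, so the pattern cannot be handed to XPath.compile(). The same pattern can, however, be applied in Java with java.util.regex to each element's text. The sketch below is one way to do that, not the asker's or answerer's code: the class name DateRegexWalker and the trimmed-down SAMPLE document are hypothetical, only elements without child elements are considered, and the paths leave out the root ncc:Message, as in the desired output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DateRegexWalker {
    // The question's pattern; matches() below requires the whole text to match it.
    private static final Pattern DATE =
            Pattern.compile("([0-9]{4})-([0-9]{2})-([0-9]{2})");

    // Hypothetical, trimmed-down copy of the question's XML.
    static final String SAMPLE =
            "<ncc:Message xmlns:ncc='http://blank/1.0.6' xmlns:jx='http://blank/1.0.0'"
            + " xmlns:n-p='http://blank/1.0.0'>\n"
            + " <ncc:DataSection>\n"
            + "  <ncc:PersonResponse>\n"
            + "   <!-- Message -->\n"
            + "   <jx:ID>ABC123</jx:ID>\n"
            + "   <!-- DOB -->\n"
            + "   <ncc:PersonBirthDateText>1948-05-11</ncc:PersonBirthDateText>\n"
            + "   <n-p:Activity>\n"
            + "    <!--DOZ-->\n"
            + "    <jx:ActivityDate>1996-04-04</jx:ActivityDate>\n"
            + "   </n-p:Activity>\n"
            + "   <jx:PersonName>\n"
            + "    <!-- DOB Newest -->\n"
            + "    <ncc:BirthDateText>1993-05-12</ncc:BirthDateText>\n"
            + "    <ncc:BirthDateText>1993-05-13</ncc:BirthDateText>\n"
            + "   </jx:PersonName>\n"
            + "  </ncc:PersonResponse>\n"
            + " </ncc:DataSection>\n"
            + "</ncc:Message>\n";

    /** Returns "date (path)" for each element without child elements whose trimmed text is a date. */
    public static List<String> findDates(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String text = e.getTextContent().trim();
            if (!hasChildElements(e) && DATE.matcher(text).matches()) {
                result.add(text + " (" + path(e) + ")");
            }
        }
        return result;
    }

    private static boolean hasChildElements(Node n) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Slash-separated names from just below the root element down to n
    // (the root element itself is left out, like in the desired output).
    static String path(Node n) {
        StringBuilder sb = new StringBuilder(n.getNodeName());
        Node root = n.getOwnerDocument().getDocumentElement();
        for (Node p = n.getParentNode(); p != null && p != root
                && p.getNodeType() == Node.ELEMENT_NODE; p = p.getParentNode()) {
            sb.insert(0, p.getNodeName() + "/");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String line : findDates(SAMPLE)) {
            System.out.println("Date: " + line);
        }
    }
}
```

Using matches() rather than find() means an element is reported only when its entire trimmed text is a date, which is stricter than a Notepad++ search that highlights a date anywhere in a line.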

I'm currently getting this error:

Exception in thread "main" javax.xml.transform.TransformerException: A location path was expected, but the following token was encountered: [
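The error is raised when XPath.compile() parses its argument as an XPath 1.0 expression: the [ in [0-9] cannot appear at that position, which is what the message reports. compile() declares XPathExpressionException for such syntax errors. A minimal check, using a hypothetical helper named compiles(), shows the regex being rejected while the commented-out expression //*[not(*)] is accepted:

```java
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class XPathCompileCheck {
    /** True if the JDK's XPath 1.0 engine accepts the string as an expression. */
    public static boolean compiles(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            // Syntax errors in the expression surface here, at compile time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("([0-9]{4})-([0-9]{2})-([0-9]{2})")); // false
        System.out.println(compiles("//*[not(*)]"));                      // true
    }
}
```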

And that doesn't even take the comments into account yet.
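Comments are ordinary DOM nodes (Node.COMMENT_NODE), so once a date element is found, its earlier siblings can be scanned for the closest comment. This sketch assumes the relevant comment is the nearest preceding sibling comment of the date element, which reproduces the grouping in the desired output (every ncc:BirthDateText falls under DOB Newest). The class name DateCommentPairs and the trimmed SAMPLE document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DateCommentPairs {
    private static final Pattern DATE = Pattern.compile("[0-9]{4}-[0-9]{2}-[0-9]{2}");

    // Hypothetical, trimmed-down copy of the question's XML.
    static final String SAMPLE =
            "<ncc:Message xmlns:ncc='http://blank/1.0.6' xmlns:jx='http://blank/1.0.0'"
            + " xmlns:n-p='http://blank/1.0.0'>\n"
            + " <ncc:DataSection>\n"
            + "  <ncc:PersonResponse>\n"
            + "   <!-- Message -->\n"
            + "   <jx:ID>ABC123</jx:ID>\n"
            + "   <!-- DOB -->\n"
            + "   <ncc:PersonBirthDateText>1948-05-11</ncc:PersonBirthDateText>\n"
            + "   <n-p:Activity>\n"
            + "    <!--DOZ-->\n"
            + "    <jx:ActivityDate>1996-04-04</jx:ActivityDate>\n"
            + "   </n-p:Activity>\n"
            + "   <jx:PersonName>\n"
            + "    <!-- DOB Newest -->\n"
            + "    <ncc:BirthDateText>1993-05-12</ncc:BirthDateText>\n"
            + "    <ncc:BirthDateText>1993-05-13</ncc:BirthDateText>\n"
            + "   </jx:PersonName>\n"
            + "  </ncc:PersonResponse>\n"
            + " </ncc:DataSection>\n"
            + "</ncc:Message>\n";

    /** Returns "comment -> date" for each element whose only child is a date text node. */
    public static List<String> pairs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Node e = all.item(i);
            Node only = e.getFirstChild();
            boolean textOnly = only != null && only.getNextSibling() == null
                    && only.getNodeType() == Node.TEXT_NODE;
            if (textOnly && DATE.matcher(only.getNodeValue().trim()).matches()) {
                result.add(nearestComment(e) + " -> " + only.getNodeValue().trim());
            }
        }
        return result;
    }

    // Walks back through earlier siblings, skipping anything that is not a
    // comment (whitespace text, other date elements), and returns the trimmed
    // text of the first comment found, or "(none)" if there is none.
    static String nearestComment(Node n) {
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.COMMENT_NODE) {
                return s.getNodeValue().trim();
            }
        }
        return "(none)";
    }

    public static void main(String[] args) throws Exception {
        for (String line : pairs(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For the sample this prints DOB -> 1948-05-11, DOZ -> 1996-04-04, and DOB Newest for both 1993 dates. The rule is looser than the XPath in the answer, which only accepts a comment that is the immediately preceding non-blank sibling; under that rule only the first date after each comment is paired with it.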

Any help would be great!

Best answer

XPath 1.0 has no regular expressions, but you can express the same check with string functions. For example:

//*[string-length()=10]
[number(substring(.,1,4))=substring(.,1,4)]
[substring(.,5,1)='-']
[number(substring(.,6,2))=substring(.,6,2)]
[substring(.,8,1)='-']
[number(substring(.,9,2))=substring(.,9,2)]
|
//*[string-length()=10]
[number(substring(.,1,4))=substring(.,1,4)]
[substring(.,5,1)='-']
[number(substring(.,6,2))=substring(.,6,2)]
[substring(.,8,1)='-']
[number(substring(.,9,2))=substring(.,9,2)]
/preceding-sibling::node()[normalize-space()][1][self::comment()]
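To run this expression from the question's Java code, it can be passed to XPath.evaluate() with XPathConstants.NODESET. Because it uses only //* and no prefixed names, no NamespaceContext is needed. A sketch, assuming a trimmed-down sample document and a hypothetical class name AnswerXPathDemo, that labels each selected node by its type:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnswerXPathDemo {
    // The answer's predicate chain, shared by both halves of the union.
    static final String IS_DATE = "[string-length()=10]"
            + "[number(substring(.,1,4))=substring(.,1,4)]"
            + "[substring(.,5,1)='-']"
            + "[number(substring(.,6,2))=substring(.,6,2)]"
            + "[substring(.,8,1)='-']"
            + "[number(substring(.,9,2))=substring(.,9,2)]";

    // Date elements, plus the comment immediately before a date (ignoring blank text).
    static final String EXPRESSION = "//*" + IS_DATE + " | //*" + IS_DATE
            + "/preceding-sibling::node()[normalize-space()][1][self::comment()]";

    // Hypothetical, trimmed-down copy of the question's XML.
    static final String SAMPLE =
            "<ncc:Message xmlns:ncc='http://blank/1.0.6' xmlns:jx='http://blank/1.0.0'"
            + " xmlns:n-p='http://blank/1.0.0'>\n"
            + " <ncc:DataSection>\n"
            + "  <ncc:PersonResponse>\n"
            + "   <!-- Message -->\n"
            + "   <jx:ID>ABC123</jx:ID>\n"
            + "   <!-- DOB -->\n"
            + "   <ncc:PersonBirthDateText>1948-05-11</ncc:PersonBirthDateText>\n"
            + "   <n-p:Activity>\n"
            + "    <!--DOZ-->\n"
            + "    <jx:ActivityDate>1996-04-04</jx:ActivityDate>\n"
            + "   </n-p:Activity>\n"
            + "   <jx:PersonName>\n"
            + "    <!-- DOB Newest -->\n"
            + "    <ncc:BirthDateText>1993-05-12</ncc:BirthDateText>\n"
            + "    <ncc:BirthDateText>1993-05-13</ncc:BirthDateText>\n"
            + "   </jx:PersonName>\n"
            + "  </ncc:PersonResponse>\n"
            + " </ncc:DataSection>\n"
            + "</ncc:Message>\n";

    public static List<String> select(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPRESSION, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.COMMENT_NODE) {
                result.add("Comment: " + n.getNodeValue().trim());
            } else {
                result.add("Date: " + n.getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : select(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The second 1993 date is selected, but no comment is selected for it, because its nearest non-blank preceding sibling is the first 1993 date element rather than a comment.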

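Note: part of the expression is repeated because you want to select both the date elements and the comment nodes. The expression uses the well-known number-test idiom: number(x)=x is false only when x is not numeric, because the string is converted to a number and NaN never equals NaN. Finally, because you can't rely on how the parser handles whitespace-only text nodes, the positional predicate [1] is preceded by a [normalize-space()] predicate, so whitespace-only text nodes are skipped and [1] picks the nearest sibling with actual content.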
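Both idioms can be checked in isolation by evaluating small boolean expressions with javax.xml.xpath. A minimal sketch (the class name XPathIdiomDemo is hypothetical) that evaluates string literals against an empty document:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathIdiomDemo {
    /** Evaluates an XPath 1.0 expression as a boolean against an empty document. */
    public static boolean eval(String expression) throws Exception {
        Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, empty, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        // Number test: the string side is converted to a number before comparing,
        // so a non-numeric string gives NaN = NaN, which is false.
        System.out.println(eval("number('1948')='1948'"));          // true
        System.out.println(eval("number('19x8')='19x8'"));          // false
        // normalize-space() of whitespace-only text is '', which converts to false.
        System.out.println(eval("boolean(normalize-space('  '))"));   // false
        System.out.println(eval("boolean(normalize-space(' DOB '))")); // true
    }
}
```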

Tested here.

Edit: enforced the string length with the [string-length()=10] predicate.
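The string-length check matters because the substring tests only look at the first ten characters of an element's string value, and that value includes the text of all its descendants. In this hypothetical two-child document (the class LengthCheckDemo and its input are illustrative only), without the check the timestamp element and even the wrapper element match as well:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LengthCheckDemo {
    // The answer's predicates without the string-length() check.
    static final String DATE_PREFIX = "[number(substring(.,1,4))=substring(.,1,4)]"
            + "[substring(.,5,1)='-']"
            + "[number(substring(.,6,2))=substring(.,6,2)]"
            + "[substring(.,8,1)='-']"
            + "[number(substring(.,9,2))=substring(.,9,2)]";

    // Hypothetical input: one plain date and one timestamp that starts with a date.
    static final String XML = "<r><a>1948-05-11</a><b>1948-05-11T08:30</b></r>";

    /** Counts the elements of XML that satisfy the given predicate chain. */
    public static int count(String predicates) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(//*" + predicates + ")", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        // <r>'s string value is "1948-05-111948-05-11T08:30", which starts with a date.
        System.out.println(count(DATE_PREFIX));                          // 3: <r>, <a>, <b>
        System.out.println(count("[string-length()=10]" + DATE_PREFIX)); // 1: only <a>
    }
}
```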

Regarding "java - Parse XML to get only comments and date values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59737146/
