python - Extracting subsets of a sequence with XPath

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:30:16

I am looking for an XPath expression that extracts "sets" as separate sequences. It must be interpretable by Python's lxml (a wrapper around libxml2).

For example, given the following:

<root>
<sub1>
<sub2>
<Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container>
</sub2>
<sub2>
<Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container>
</sub2>
<sub2>
<Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container>
</sub2>
</sub1>
</root>

I would like to be able to "isolate" each group or sequence. For example, the first sequence is:

1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.

The second sequence:

3 - It's still not working
4 - do we have a working IT department or what?

The third sequence:

5 - Never mind - I got my 8 year old niece to fix it

Here a "sequence", translated into pseudocode/Python, would be:

seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']

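From some preliminary research it appears that sequences can't be nested, but I wonder whether there is some black magic with these operators that could make this possible.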

Best answer

  1. Evaluate this XPath expression:

    count(/*/*/*)

     This gives the number of <sub2> elements. An equivalent expression that is longer but more readable is:

    count(/*/sub1/sub2)
  2. For each $n from 1 to count(/*/*/*), evaluate the following XPath expression:

    /*/*/*[$n]/*/item/text()

     Again, this is equivalent to the longer and more readable:

    /*/sub1/sub2[$n]/Container/item/text()

    Before evaluating the expression above, replace $n with its actual value (for example, using the format() method on the string).

    For the provided XML document the count is 3, so $n takes the values 1 to 3 and the XPath expressions actually evaluated are:

    /*/*/*[1]/*/item/text()

    /*/*/*[2]/*/item/text()

    /*/*/*[3]/*/item/text()

    Each of them produces one of the following collections (the concrete type is language-dependent: array, sequence, set, IEnumerable<string>, etc.):

    "1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."

    "3 - It's still not working", "4 - do we have a working IT department or what?"

    "5 - Never mind - I got my 8 year old niece to fix it"
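    The steps above can be sketched with lxml roughly as follows. This is a minimal sketch rather than the answerer's own code: the function name extract_sequences and the XML constant are illustrative assumptions, and it uses the longer, more readable form of the expressions together with str.format() for the $n substitution.

```python
from lxml import etree

# The sample document from the question.
XML = """<root>
<sub1>
<sub2>
<Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container>
</sub2>
<sub2>
<Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container>
</sub2>
<sub2>
<Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container>
</sub2>
</sub1>
</root>"""


def extract_sequences(root):
    """Return one list of item texts per <sub2> element (hypothetical helper)."""
    # Step 1: count the <sub2> elements. lxml returns XPath numbers as floats.
    count = int(root.xpath("count(/*/sub1/sub2)"))

    sequences = []
    # Step 2: XPath positions are 1-based, so iterate from 1 to count inclusive.
    for n in range(1, count + 1):
        # Substitute the actual value of n into the expression before evaluating it.
        expr = "/*/sub1/sub2[{}]/Container/item/text()".format(n)
        # Convert lxml's string results into plain Python strings.
        sequences.append([str(text) for text in root.xpath(expr)])
    return sequences


root = etree.fromstring(XML)
sequences = extract_sequences(root)
for seq in sequences:
    print(seq)
```

    Each printed list corresponds to one <sub2> group, so the nesting lives in the Python list of lists rather than in a single XPath result. lxml's xpath() method also accepts XPath variables as keyword arguments, which may be a cleaner alternative to string substitution.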

    Regarding "python - Extracting subsets of a sequence with XPath", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37662732/
