
python - How do I find the text inside specific tags using lxml and Python?

Reposted. Author: 行者123. Updated: 2023-12-01 04:54:24

Suppose the HTML source looks like this:

some other content here
<div class="box">
<h5>this is another one title</h5>
<p>text paragraph 1 here</p>
<p>text paragraph 2 here</p>
<p>text paragraph n here</p>
</div>
<div class="box">
<h5>specific title</h5>
<p>text paragraph 1 here</p>
<p>text paragraph 2 here</p>
<p>text paragraph 3 here</p>
<p>text paragraph 4 here</p>
<small>some specific character:here are some character</small>
</div>
<div class="box">
<h5>this is another tow title</h5>
<p>text paragraph 1 here</p>
<p>text paragraph 2 here</p>
<p>text paragraph n here</p>
</div>
some other content here

And the output I want is:

specific title

text paragraph 1 here
text paragraph 2 here
text paragraph 3 here
text paragraph 4 here

I want to get the specific title and the text of its paragraphs, using lxml with Python. How can I do this?

Best answer

The XPath expression .//h5[text()="specific title"]/following-sibling::p/text() selects the text of every p tag that is a following sibling of the h5 tag whose text is "specific title":

>>> import lxml.html
>>>
>>> s = '''
... <html>
... some other content here
...
... <div class="box">
... <h5>specific title</h5>
... <p>text paragraph 1 here</p>
... <p>text paragraph 2 here</p>
... <p>text paragraph 3 here</p>
... <p>text paragraph 4 here</p>
... <small>some specific character:here are some character</small>
... </div>
... <div class="box">
... <h5>this is another tow title</h5>
...
... </div>
... some other content here
... </html>
... '''
>>>
>>> root = lxml.html.fromstring(s)
>>> root.xpath('.//h5[text()="specific title"]/following-sibling::p/text()')
['text paragraph 1 here', 'text paragraph 2 here', 'text paragraph 3 here',
'text paragraph 4 here']
>>> print('\n'.join(root.xpath(
'.//h5[text()="specific title"]/following-sibling::p/text()')))
text paragraph 1 here
text paragraph 2 here
text paragraph 3 here
text paragraph 4 here
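
The question also asks for the title itself, which the expression above does not return. One possible way to get both is sketched below. The function name title_and_paragraphs and the HTML constant are made up for this illustration and are not part of the original answer. The sketch matches with normalize-space() instead of text(), on the assumption that the title may carry surrounding whitespace, and it passes the title as an XPath variable so that quotes in the title need no escaping.

```python
import lxml.html

# Shortened copy of the sample markup from the question (illustrative only).
HTML = '''
<html>
<div class="box">
<h5>this is another one title</h5>
<p>text paragraph 1 here</p>
</div>
<div class="box">
<h5>specific title</h5>
<p>text paragraph 1 here</p>
<p>text paragraph 2 here</p>
<p>text paragraph 3 here</p>
<p>text paragraph 4 here</p>
<small>some specific character:here are some character</small>
</div>
</html>
'''


def title_and_paragraphs(html, title):
    """Return (title_text, [paragraph_text, ...]) for the first h5 whose
    whitespace-normalized text equals `title`, or None if there is no match.

    Hypothetical helper, written for this example.
    """
    root = lxml.html.fromstring(html)
    # $title is an XPath variable; lxml fills it from the keyword argument.
    matches = root.xpath('.//h5[normalize-space()=$title]', title=title)
    if not matches:
        return None
    h5 = matches[0]
    # Only p elements that share a parent with the h5 and come after it.
    paragraphs = [p.text_content().strip()
                  for p in h5.xpath('following-sibling::p')]
    return h5.text_content().strip(), paragraphs


result = title_and_paragraphs(HTML, 'specific title')
if result is not None:
    found_title, found_paragraphs = result
    print(found_title)
    print()
    print('\n'.join(found_paragraphs))
```

Run as a script, this prints the title, a blank line, and then the four paragraph lines, which is the layout requested in the question. The small element is skipped because the following-sibling step only selects p elements.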

Regarding "python - How do I find the text inside specific tags using lxml and Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27755792/
