
python - Selecting multiple values with Python and XPath

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:37:26

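I can select a single value with XPath in Python without any trouble, but how do I combine several individual XPath queries into one result?
Here is a sample fragment of the HTML source (r.content):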

<div class="members">
<h2>Members</h2>
<div class="member">
<span title="Last Online:&nbsp;2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
<span class="profile-link">
<a href="/account/view-profile/KonterBolet">
<img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
</span>
<span class="memberType">Leader</span>
</span>
</div>
<div class="member">
<span title="Last Online:&nbsp;2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
<span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
<a href="/account/view-profile/mardok">
<img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
<a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
</span>
<span class="memberType">Officer</span>
</span>
</div>
</div>

I fetch the content with Python requests and parse it with lxml:

import requests
from lxml import html
ses = requests.session()
r = ses.get(SITE_URL)
webContent = html.fromstring(r.content)

First XPath:
acc = webContent.xpath("//span/a[contains(@href,'account/view-profile')]/text()")
Result:
['KonterA', 'mardok']

Second XPath:
twitch = webContent.xpath('//span/@data-twitch-user')
Result:
['mardok_tv']

Third XPath:
lastOnline = webContent.xpath('//span/@data-time')
Result:
['2017-02-20T22:37:42Z','2017-02-19T11:28:20Z']

How can I combine these three to get a result like this:
[['KonterA','','2017-02-20T22:37:42Z'],['mardok','mardok_tv','2017-02-19T11:28:20Z']]

Best answer

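Consider parsing all the items together under a shared parent and iterating over a top-level XPath. Wrapping each expression in XPath's concat() yields a zero-length string '' when an attribute or element value is missing. The code also uses XPath's normalize-space() to strip line breaks and carriage returns from the values.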
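To see the concat() and normalize-space() behaviour in isolation, here is a minimal sketch. The two-span fragment and the variable names are made up for illustration and are not part of the question's page.

```python
from lxml import html

# Hypothetical fragment: only the second <span> carries data-twitch-user.
fragment = html.fromstring(
    '<div><span class="a"> x\n </span>'
    '<span class="b" data-twitch-user="mardok_tv">y</span></div>'
)

first, second = fragment.xpath("span")

# A bare attribute query returns a node-set: an empty list when nothing matches.
missing_as_list = first.xpath("@data-twitch-user")

# Wrapping the same query in concat(..., '') always returns a single string.
missing_as_str = first.xpath("concat(@data-twitch-user, '')")
present_as_str = second.xpath("concat(@data-twitch-user, '')")

# normalize-space() trims and collapses whitespace, including newlines.
cleaned = first.xpath("normalize-space(.)")

print(missing_as_list, repr(missing_as_str), repr(present_as_str), repr(cleaned))
```

Because every member then contributes exactly one string per field, the three lists stay the same length and can be zipped safely.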

from lxml import html

# PARSING POSTED SNIPPET AS STRING (htmlstr HOLDS THE HTML FROM THE QUESTION)
webContent = html.fromstring(htmlstr)

# INITIALIZING LISTS
acc = []; twitch = []; lastOnline = []

# ITERATING THROUGH THE FIRST <SPAN> NESTED INSIDE EACH OUTER <SPAN>
for i in webContent.xpath("//span/span[1]"):
    acc.append(i.xpath("concat(normalize-space(a[contains(@href,'account/view-profile')]),'')"))
    twitch.append(i.xpath("concat(@data-twitch-user, '')"))
    lastOnline.append(i.xpath("concat(../@data-time, '')"))

# ZIP EQUAL LENGTH LISTS
xpath_list = list(zip(acc, twitch, lastOnline))

print(xpath_list)
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
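The question asked for a list of lists rather than a list of tuples. One possible variation, sketched below, iterates over each member <div> and builds the rows directly. It is not the answer's method: it swaps concat(..., '') for XPath's string(), which also returns '' for an empty node-set. The helper name member_rows and the trimmed htmlstr are assumptions made for this example, and the real page is assumed to share the posted structure.

```python
from lxml import html

# Trimmed copy of the HTML posted in the question (assumption: the real page
# has the same structure).
htmlstr = """
<div class="members">
<div class="member">
<span title="Last Online" data-time="2017-02-20T22:37:42Z">
<span class="profile-link">
<a href="/account/view-profile/KonterBolet"><img src="36.png">KonterA</a>
</span>
<span class="memberType">Leader</span>
</span>
</div>
<div class="member">
<span title="Last Online" data-time="2017-02-19T11:28:20Z">
<span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
<a href="/account/view-profile/mardok"><img src="35.png">mardok</a>
<a class="twitch" href="//www.twitch.tv/mardok_tv"></a>
</span>
<span class="memberType">Officer</span>
</span>
</div>
</div>
"""


def member_rows(markup):
    """Return one [name, twitch_user, last_online] list per member <div>."""
    tree = html.fromstring(markup.strip())
    rows = []
    for member in tree.xpath("//div[@class='member']"):
        # String-value of the profile link, with surrounding whitespace removed.
        name = member.xpath(
            "normalize-space(.//a[contains(@href, 'account/view-profile')])"
        )
        # string() of an empty node-set is '', so a missing attribute is safe.
        twitch = member.xpath("string(.//span/@data-twitch-user)")
        last_online = member.xpath("string(span/@data-time)")
        rows.append([str(name), str(twitch), str(last_online)])
    return rows


rows = member_rows(htmlstr)
print(rows)
# [['KonterA', '', '2017-02-20T22:37:42Z'], ['mardok', 'mardok_tv', '2017-02-19T11:28:20Z']]
```

Scoping every query to a single member element keeps the fields aligned without a separate zip step.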

Regarding python - Selecting multiple values with Python and XPath, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42428844/
