python - Extracting data with BeautifulSoup

Reposted by: 行者123 · Updated: 2023-11-30 22:59:09

I need to extract "Ended 7 seconds ago" from a file:

<div class="featured__columns">             
<div class="featured__column"><i style="color:rgb(149,213,230);" class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a style="color:rgb(149,213,230);" href="/user/Eclipsy">Eclipsy</a></div><a href="/user/Eclipsy" class="global__image-outer-wrap global__image-outer-wrap--avatar-small">
<div class="global__image-inner-wrap" style="background-image:url(https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/dc/dc5b8424bd5d17e13dcfe613689921dfc29f4574_medium.jpg);"></div>
</a>
</div>

I tried:

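#!/usr/bin/python3
from bs4 import BeautifulSoup

with open("./source.html") as source_html:
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(source_html.read(), "html.parser")

# find_all returns every <span> in document order; the first one is the target
soup = soup.find_all("span")
print(soup[0].string)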

Everything works, but I think my approach is too crude. Is there a different way to extract the data?

Best answer

The span you want is inside the first featured__column div:

from bs4 import BeautifulSoup

html ="""<div class="featured__columns">
<div class="featured__column"><i style="color:rgb(149,213,230);" class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a style="color:rgb(149,213,230);" href="/user/Eclipsy">Eclipsy</a></div><a href="/user/Eclipsy" class="global__image-outer-wrap global__image-outer-wrap--avatar-small">
<div class="global__image-inner-wrap" style="background-image:url(https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/dc/dc5b8424bd5d17e13dcfe613689921dfc29f4574_medium.jpg);"></div>
</a>
</div>"""


print(BeautifulSoup(html).select("div.featured__column span")[0].text)
# Output: Ended 7 seconds ago
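
Because select() returns a list, indexing it with [0] raises IndexError when nothing matches. As a minimal sketch, assuming a reasonably recent Beautiful Soup 4 and using `html_snippet`, a shortened copy of the markup above introduced only for illustration, select_one() returns the first match or None, and the title attribute can be read from the same tag:

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup above (avatar block omitted); the name is illustrative.
html_snippet = """<div class="featured__columns">
<div class="featured__column"><i class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a href="/user/Eclipsy">Eclipsy</a></div>
</div>"""

soup = BeautifulSoup(html_snippet, "html.parser")

# select_one() returns the first matching tag, or None if nothing matches.
span = soup.select_one("div.featured__column span")
if span is not None:
    ended_text = span.get_text(strip=True)  # visible text of the span
    ended_title = span["title"]             # value of the title attribute
    print(ended_text, "|", ended_title)
```

Checking for None keeps the script from crashing if the page layout changes and the span disappears.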

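If you want the first, or the nth, span, you can use nth-of-type in the selector. The session below appears to have been recorded on an older Beautiful Soup release (the u'' prefixes indicate Python 2), where nth-of-type(n) returned the nth match overall. Since version 4.7.0, select() is handled by Soup Sieve and follows standard CSS rules, where nth-of-type counts same-tag siblings under one parent, so on current versions nth-of-type(1) matches both spans here and nth-of-type(2) matches none: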

In [53]: BeautifulSoup(html).select("div.featured__column span")
Out[53]:
[<span title="Today, 11:49am">Ended 7 seconds ago</span>,
<span title="March 7, 2016, 10:50am">2 days ago</span>]

In [54]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(1)")
Out[54]: [<span title="Today, 11:49am">Ended 7 seconds ago</span>]

In [55]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(2)")
Out[55]: [<span title="March 7, 2016, 10:50am">2 days ago</span>]

In [56]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(2)")[0].text
Out[56]: u'2 days ago'

In [57]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(1)")[0].text
Out[57]: u'Ended 7 seconds ago'
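
For picking the span in the nth column on a current installation, here is a sketch assuming Beautiful Soup 4.7 or later (where select() follows standard CSS semantics). `html_snippet` is a shortened, illustrative copy of the markup above. It shows two ways to reach the second span, and why a per-column span:nth-of-type(2) comes back empty:

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup above (avatar block omitted); the name is illustrative.
html_snippet = """<div class="featured__columns">
<div class="featured__column"><i class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a href="/user/Eclipsy">Eclipsy</a></div>
</div>"""

soup = BeautifulSoup(html_snippet, "html.parser")

# Option 1: select every column span, then index the resulting list.
spans = soup.select("div.featured__column span")
second_by_index = spans[1].get_text()

# Option 2: pick the second column div among its siblings, then the span inside it.
second_by_css = soup.select_one("div.featured__column:nth-of-type(2) span").get_text()

# Under standard CSS rules each span is the first span inside its own div,
# so asking for a second span within a column matches nothing.
no_match = soup.select("div.featured__column span:nth-of-type(2)")

print(second_by_index, "|", second_by_css, "|", len(no_match))
```

Indexing the list is the simpler choice when you want "the nth match overall"; the nth-of-type form is more useful when the position of the column itself is what identifies the data.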

We can also start from the i tag with the fa and fa-clock-o classes and get its adjacent sibling span:

In [70]: BeautifulSoup(html).select("i.fa.fa-clock-o + span")
Out[70]: [<span title="Today, 11:49am">Ended 7 seconds ago</span>]

In [71]: BeautifulSoup(html).select("i.fa.fa-clock-o + span")[0].text
Out[71]: u'Ended 7 seconds ago'
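
The same lookup can be written with the navigation API instead of a CSS selector. This is only a sketch of an equivalent approach, not part of the original answer; `html_snippet` is an illustrative one-column copy of the markup above:

```python
from bs4 import BeautifulSoup

# One-column copy of the markup above; the name is illustrative.
html_snippet = (
    '<div class="featured__column">'
    '<i style="color:rgb(149,213,230);" class="fa fa-clock-o"></i> '
    '<span title="Today, 11:49am">Ended 7 seconds ago</span></div>'
)

soup = BeautifulSoup(html_snippet, "html.parser")

# class_ matches a tag if any one of its CSS classes equals the given string.
icon = soup.find("i", class_="fa-clock-o")

# find_next_sibling("span") skips the whitespace text node between the two tags.
sibling_span = icon.find_next_sibling("span") if icon is not None else None

if sibling_span is not None:
    print(sibling_span.get_text())
```

Anchoring on the clock icon ties the lookup to what the element means rather than to its position, which may hold up better if more spans are added to the page.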

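Finally, to replicate your own logic exactly and get only the first span in the HTML, regardless of classes and so on, you can simplify to either of these: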

BeautifulSoup(html).select("span:nth-of-type(1)")[0].text
BeautifulSoup(html).find("span").text
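
Both one-liners fail when the document contains no span: the first raises IndexError, the second raises AttributeError because find() returns None. A small wrapper can make that case explicit. The function name `first_span_text` is hypothetical, introduced here only for illustration:

```python
from typing import Optional

from bs4 import BeautifulSoup


def first_span_text(markup: str) -> Optional[str]:
    """Return the text of the first <span> in markup, or None if there is none."""
    span = BeautifulSoup(markup, "html.parser").find("span")
    return span.get_text(strip=True) if span is not None else None


found = first_span_text('<div><span title="Today, 11:49am">Ended 7 seconds ago</span></div>')
missing = first_span_text("<div>no span here</div>")
print(found, "|", missing)
```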

Regarding python - Extracting data with BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35887915/
