
python - How to get text from a span tag in BeautifulSoup

Reposted. Author: 太空狗. Updated: 2023-10-29 20:58:09

I have markup like this:

<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>

I'm trying to get "1 GB" out of it. I tried:

tt = [a['title'] for a in soup.select(".systemRequirementsRamContent span")]
for ram in tt:
    if "RAM" in ram.split():
        print(soup.string)

It prints None.

I also tried a['text'], but that raises a KeyError. How can I fix this, and what is my mistake?

Best answer

You can use a CSS selector that matches on the title text to pull out the span you want:

from bs4 import BeautifulSoup

soup = BeautifulSoup("""<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>""", "html.parser")

# Match the span whose title attribute contains the substring "RAM"
print(soup.select_one('span[title*="RAM"]').text)

That is, find the span whose title attribute contains "RAM", which is equivalent to writing if "RAM" in span["title"] in Python.
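To tie this back to the code in the question: a['title'] reads an attribute, while the text between the tags comes from .text or .get_text(). There is no attribute named text, so a['text'] raises KeyError, and soup.string is None because the parsed tree has more than one child at that level. Here is a minimal sketch of a corrected loop, assuming the built-in html.parser and the markup from the question (the helper name ram_text is hypothetical, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>
</div>"""
soup = BeautifulSoup(html, "html.parser")


def ram_text(soup):
    """Return the text of the first matching span whose title mentions RAM, or None."""
    # Iterate over the tags themselves, not over their title strings,
    # so that the text of each tag is still reachable inside the loop.
    for span in soup.select(".systemRequirementsRamContent span"):
        # .get() returns a default instead of raising KeyError for a missing attribute
        if "RAM" in span.get("title", "").split():
            return span.get_text(strip=True)
    return None


print(ram_text(soup))
```

The key change is keeping a reference to each span tag rather than only its title, so the text can be read from the same object that was tested.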

Or use find with re.compile:

import re
print(soup.find("span", title=re.compile("RAM")).text)
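If the title might vary in case, or the span might be missing altogether, the regex approach can be wrapped defensively. This is only a sketch under those assumptions; the helper name span_text_by_title is made up for illustration, and the markup is the single span from the question:

```python
import re

from bs4 import BeautifulSoup

html = (
    '<div class="systemRequirementsRamContent">'
    '<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>'
)
soup = BeautifulSoup(html, "html.parser")


def span_text_by_title(soup, pattern):
    """Return the stripped text of the first span whose title matches pattern, else None."""
    # re.I makes the match case-insensitive; find() returns None when nothing matches
    span = soup.find("span", title=re.compile(pattern, re.I))
    return span.get_text(strip=True) if span is not None else None


print(span_text_by_title(soup, "ram"))
print(span_text_by_title(soup, "GPU"))
```

Checking for None before reading the text avoids the AttributeError that a bare soup.find(...).text raises when no span matches.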

To get all the data:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.game-debate.com/games/index.php?g_id=21580&game=000%20Plus").content

soup = BeautifulSoup(r, "lxml")
# The RAM requirement sits in its own container
cont = soup.select_one("div.systemRequirementsRamContent")
ram = cont.select_one("span")
print(ram["title"], ram.text)
# The remaining requirements are in the smaller boxes
for span in soup.select("div.systemRequirementsSmallerBox.sysReqGameSmallBox span"):
    print(span["title"], span.text)

This gives you:

000 Plus Minimum RAM Requirement 1 GB
000 Plus Minimum Operating System Requirement Win Xp 32
000 Plus Minimum Direct X Requirement DX 9
000 Plus Minimum Hard Disk Drive Space Requirement 500 MB
000 Plus GD Adjusted Operating System Requirement Win Xp 32
000 Plus GD Adjusted Direct X Requirement DX 9
000 Plus GD Adjusted Hard Disk Drive Space Requirement 500 MB
000 Plus Recommended Operating System Requirement Win Xp 32
000 Plus Recommended Hard Disk Drive Space Requirement 500 MB
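If you would rather collect the values than print them, the same selectors can feed a dictionary keyed by title. The sketch below runs offline against a small hand-written sample that only imitates the class names used above; the sample markup and the helper name collect_requirements are assumptions for illustration, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the class names used on the page
sample = """
<div class="systemRequirementsRamContent">
  <span title="000 Plus Minimum RAM Requirement">1 GB</span>
</div>
<div class="systemRequirementsSmallerBox sysReqGameSmallBox">
  <span title="000 Plus Minimum Operating System Requirement">Win Xp 32</span>
</div>
<div class="systemRequirementsSmallerBox sysReqGameSmallBox">
  <span title="000 Plus Minimum Direct X Requirement">DX 9</span>
</div>
"""


def collect_requirements(markup):
    """Map each span's title attribute to its stripped text."""
    soup = BeautifulSoup(markup, "html.parser")
    # A comma-separated selector list matches spans from either container;
    # [title] skips any span that has no title attribute.
    selector = (
        "div.systemRequirementsRamContent span[title], "
        "div.systemRequirementsSmallerBox.sysReqGameSmallBox span[title]"
    )
    return {span["title"]: span.get_text(strip=True) for span in soup.select(selector)}


requirements = collect_requirements(sample)
for title, value in requirements.items():
    print(title, value)
```

A dictionary makes later lookups by requirement name straightforward, at the cost of dropping any spans that share an identical title.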

Regarding "python - How to get text from a span tag in BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38133759/
