
python - Extracting 'href' from a tag using BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:46:33

I have the following tags in an HTML file, and I want to extract only the href values, i.e. Quatermass_2_Vintage_Movie_Poster-61-10782 and the corresponding href for Hard Day's Night.

<span class="small">
Ref.No:10782<br/>
<a href="Quatermass_2_Vintage_Movie_Poster-61-10782" title="Click for more details and a larger picture of Quatermass 2">
Click for more details and a larger picture of <b>Quatermass 2</b>
</a>
</span>, <span class="small">
Ref.No:10781<br/>
<a href="Hard_Day__039_s_Night_Vintage_Movie_Poster-61-10781" title="Click for more details and a larger picture of Hard Day's Night">
Click for more details and a larger picture of <b>Hard Day's Night</b>
</a>
</span>

The Python code below only lets me find the whole tag:

from bs4 import BeautifulSoup

# Read the saved HTML page
with open("table2.html", "r") as f:
    contents = f.read()

soup = BeautifulSoup(contents, "lxml")

# Prints each matching <span> in full, not just the href
for name in soup.find_all("span", {"class": "small"}):
    print(name)

But I can't select just the href. I tried:

for name in soup.find_all("span", {"class": "small"}.get(href)):
    print(name)

I also tried putting the href lookup in the print statement:

for name in soup.find_all("span", {"class": "small"}):
    print(name.get('href'))

Could anyone help?

Best answer

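The href attribute belongs to the a tag, not the span. After getting each span tag, you need to find the a tag inside it and then read its href attribute.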

Something like this should work:

for name in soup.find_all("span", {"class": "small"}):
    print(name.find("a").get("href"))
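
As a self-contained illustration of the same idea, the sketch below parses the markup from the question as an inline string instead of reading table2.html, and collects the href values with a CSS selector. The helper name extract_hrefs and the choice of the built-in "html.parser" (in place of "lxml") are assumptions made for this example, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (inlined so the example runs on its own).
HTML = """
<span class="small">
Ref.No:10782<br/>
<a href="Quatermass_2_Vintage_Movie_Poster-61-10782" title="Click for more details and a larger picture of Quatermass 2">
Click for more details and a larger picture of <b>Quatermass 2</b>
</a>
</span>, <span class="small">
Ref.No:10781<br/>
<a href="Hard_Day__039_s_Night_Vintage_Movie_Poster-61-10781" title="Click for more details and a larger picture of Hard Day's Night">
Click for more details and a larger picture of <b>Hard Day's Night</b>
</a>
</span>
"""


def extract_hrefs(markup):
    """Return the href of every <a> nested inside a <span class="small">.

    Hypothetical helper for illustration only.
    """
    # "html.parser" ships with Python; "lxml" from the question also works if installed.
    soup = BeautifulSoup(markup, "html.parser")
    # The selector matches only <a> tags that actually carry an href attribute.
    return [a["href"] for a in soup.select("span.small a[href]")]


hrefs = extract_hrefs(HTML)
for href in hrefs:
    print(href)
```

One reason to prefer the selector form: name.find("a") returns None when a span contains no link, so calling .get("href") on it would raise an AttributeError, whereas the selector simply skips such spans.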

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58981539/
