
python - Error when extracting <a> tag value - BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:06:30

I am using BeautifulSoup for web scraping. I have this code to extract the text of the <a> tags, but it does not work. It raises this error:

AttributeError: 'int' object has no attribute 'text'

Here is the code:

import requests
from bs4 import BeautifulSoup

url = "http://www.example.com"

page = requests.get(url).text
soup_expatistan = BeautifulSoup(page, "html.parser")

expatistan_table = soup_expatistan.find("div", id="country-box")

expatistan_titles = expatistan_table.find_all("ul", class_="unstyled flat")[1]
#print (expatistan_titles)
for expatistan_title in expatistan_titles:
    print(expatistan_title.find("a").text)  # Error on this line

I have verified that the output of expatistan_titles contains:

<li class=""> <a href="http://www.wotif.com/AR" class="multiselect__option js-country-selector " data-id="AR">Argentina</a>
</li>
<li class=""> <a href="http://www.wotif.com/AU" class="multiselect__option js-country-selector " data-id="AU">Australia</a>
</li>
<li class=""> <a href="http://www.wotif.com/AT" class="multiselect__option js-country-selector " data-id="AT">Austria</a>
</li>

Best answer

expatistan_titles = expatistan_table.find_all("ul", class_="unstyled flat")[1]

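makes expatistan_titles a single element (one <ul> Tag), not a list. Iterating over a Tag yields its direct children, and those include the whitespace text nodes (NavigableString objects) that sit between the <li> tags. A NavigableString is a str subclass, so calling find() on it is the ordinary string method, which returns the int position of the substring (or -1 if it is absent). An int has no text attribute, hence the AttributeError.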
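The behavior described above can be reproduced without a network request. The following is a minimal sketch, assuming a trimmed-down, hypothetical copy of the markup shown in the question and the built-in "html.parser"; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output shown in the question.
html = """<ul class="unstyled flat">
<li class=""> <a href="http://www.wotif.com/AR" data-id="AR">Argentina</a>
</li>
<li class=""> <a href="http://www.wotif.com/AU" data-id="AU">Australia</a>
</li>
</ul>"""

ul = BeautifulSoup(html, "html.parser").find("ul")

# Iterating over a Tag yields its direct children: text nodes and tags.
child_types = [type(child).__name__ for child in ul]
print(child_types)

# The first child is the newline after <ul ...>, a NavigableString.
first_child = next(iter(ul))

# NavigableString subclasses str, so this is str.find and returns an int.
result = first_child.find("a")
print(type(result).__name__, result)
```

Because the text nodes and the <li> tags are interleaved, the loop in the question fails on the very first child, before it ever reaches an <li>.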

Do this instead:

expatistan_titles = expatistan_table.find_all("ul", class_="unstyled flat")[1]
for expatistan_title in expatistan_titles.find_all('li'):
    print(expatistan_title.find("a").text)

You can also use CSS selectors to shorten the code to just 2 lines:

for link in soup_expatistan.select('div#country-box ul.unstyled.flat li a'):
    print(link.text)
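One difference is worth noting: the selector matches the links in every ul.unstyled.flat inside the box, whereas the earlier code takes only the second list with [1]. The sketch below shows both variants side by side. It assumes a hypothetical page structure with two such lists (the real page may differ), and the names all_links, second_ul and countries are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with two matching lists inside the box.
html = """
<div id="country-box">
  <ul class="unstyled flat"><li><a href="#">First list item</a></li></ul>
  <ul class="unstyled flat">
    <li class=""> <a href="http://www.wotif.com/AR" data-id="AR">Argentina</a></li>
    <li class=""> <a href="http://www.wotif.com/AU" data-id="AU">Australia</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The full selector collects links from every matching <ul>.
all_links = [a.text for a in soup.select("div#country-box ul.unstyled.flat li a")]

# To keep the original [1] behavior, select the lists first, then index.
second_ul = soup.select("div#country-box ul.unstyled.flat")[1]
countries = [a.get_text(strip=True) for a in second_ul.select("li a")]

print(all_links)
print(countries)
```

If the page has only one such list, the two variants give the same links and the shorter selector is enough.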

Regarding "python - Error when extracting <a> tag value - BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26781937/
