
Python web scraping: problems with classes

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:44:43

I am trying to scrape the names of real estate agents from this website.

My code:

containers = page_soup.findAll("div", {"class": "team-details"})

for container in containers:
    agent_name = container.findAll("a", {"class": "team-name_link"})
    name = agent_name[0].text

    print("name: " + name)

However, when I run the script, I only get the first two names, followed by an error message:

name: Michael Stavrianos
name: Kristalla Stavrianos
Traceback (most recent call last):
  File "C:\Users\Toby\Desktop\Webscrape\LjHooker - mark1.py", line 16, in <module>
    name = agent_name[0].text
IndexError: list index out of range

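I found that the first two agent names are under the class "team-name_link", while the rest are under the class "team-name". I am not sure how to scrape the names from both classes at once.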

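Best answer

I think you are mistaken: all of the names are inside the tag you want, but you need to look for the div with class "team-name" rather than the a tag: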

from bs4 import BeautifulSoup
import requests

html = requests.get("https://woollahra.ljhooker.com.au/our-team").text
soup = BeautifulSoup(html, 'html.parser')
containers = soup.findAll("div", {"class": "team-details"})

for container in containers:
    agent_name = container.find("div", {"class": "team-name"})
    name = agent_name.text
    print(name)

The code above outputs:

Michael Stavrianos
Licensee



Kristalla Stavrianos
Principal



Jade Marshall
Property Management Associate


Emma Phelan
Property Management Associate


Isabella Marechal - Ross
Property Management Associate


Victoria Empson
Property Investment Manager
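
The output above mixes each name with the agent's role and some blank lines, because .text returns all of the text inside the div. If only the name is wanted, one option is to take the first non-empty text fragment of each "team-name" div. The following is a minimal sketch, not a confirmed solution for the live page: SAMPLE_HTML is hypothetical markup assumed from the question's description (some names wrapped in an a tag with class "team-name_link", others not), and extract_names is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="team-details">
  <div class="team-name">
    <a class="team-name_link" href="#">Michael Stavrianos</a>
    <span>Licensee</span>
  </div>
</div>
<div class="team-details">
  <div class="team-name">
    Jade Marshall
    <span>Property Management Associate</span>
  </div>
</div>
"""


def extract_names(html):
    """Return the first text fragment of every team-name div."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for container in soup.find_all("div", class_="team-details"):
        name_div = container.find("div", class_="team-name")
        if name_div is None:
            # Skip containers without a name block instead of raising.
            continue
        # stripped_strings yields each text fragment with whitespace trimmed;
        # the first fragment is assumed to be the agent's name.
        first = next(name_div.stripped_strings, None)
        if first:
            names.append(first)
    return names


print(extract_names(SAMPLE_HTML))
```

Because find returns None when nothing matches (whereas indexing an empty findAll result raises IndexError), the explicit None check lets the loop continue past containers that lack the expected div.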

Regarding "Python web scraping: problems with classes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45451728/
