
python - BeautifulSoup extraction loop


I want to extract the city data from every page of the site below. I have the code shown here, but the loop keeps running and extracts the same data over and over. It looks like I am missing something; can you help?

import re

import requests
from bs4 import BeautifulSoup

cities = []
with requests.Session() as session:
    session.headers = {
        'x-requested-with': 'XMLHttpRequest'
    }
    page = 1
    while True:
        url = f'https://www.kununu.com/de/volkswagen/kommentare/{page}'
        response = session.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        new_comments = [
            cities.find_next_sibling('div').text.strip()
            for cities in soup.find_all('div', text=re.compile('Stadt'))
        ]
        cities += new_comments
        print(cities)
        page += 1
#print(cities)

Best Answer

You have no exit condition; you need to break out of the loop at some point.

For example:

cities = []
with requests.Session() as session:
    session.headers = {
        'x-requested-with': 'XMLHttpRequest'
    }
    page = 1
    while True:
        if page >= 99:
            break
        url = f'https://www.kununu.com/de/volkswagen/kommentare/{page}'
        response = session.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        new_comments = [
            cities.find_next_sibling('div').text.strip()
            for cities in soup.find_all('div', text=re.compile('Stadt'))
        ]
        cities += new_comments
        print(cities)
        page += 1

print(cities)  # this will print after 98 pages
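
A hard page limit works, but it assumes you know in advance how many pages exist. As a rough sketch (assuming kununu either returns a non-200 status or a page containing no 'Stadt' divs once you request past the last comment page), you can instead break as soon as a page yields nothing new:

import re

import requests
from bs4 import BeautifulSoup

cities = []
with requests.Session() as session:
    session.headers = {
        'x-requested-with': 'XMLHttpRequest'
    }
    page = 1
    while True:
        url = f'https://www.kununu.com/de/volkswagen/kommentare/{page}'
        response = session.get(url)
        if response.status_code != 200:
            # assumption: the site stops returning 200 once the page number runs past the end
            break
        soup = BeautifulSoup(response.text, 'html.parser')
        new_comments = [
            city.find_next_sibling('div').text.strip()
            for city in soup.find_all('div', text=re.compile('Stadt'))
        ]
        if not new_comments:
            # assumption: a page with no 'Stadt' entries means there is nothing left to scrape
            break
        cities += new_comments
        page += 1

print(cities)

This way the scraper stops on its own when the comment pages run out, instead of relying on a hard-coded page count.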

Regarding python - BeautifulSoup extraction loop, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59500403/
