I want to extract the city data from every page of the site below. I have the code beneath, but the loop keeps running and extracting the same data over and over. It looks like I'm missing something, can you help?
import re

import requests
from bs4 import BeautifulSoup

cities = []
with requests.Session() as session:
    session.headers = {
        'x-requested-with': 'XMLHttpRequest'
    }
    page = 1
    while True:
        url = f'https://www.kununu.com/de/volkswagen/kommentare/{page}'
        response = session.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        new_comments = [
            city.find_next_sibling('div').text.strip()
            for city in soup.find_all('div', text=re.compile('Stadt'))
        ]
        cities += new_comments
        print(cities)
        page += 1
#print(cities)
You have no exit condition, so the loop never terminates. You need to break out of it at some point.

For example:
import re

import requests
from bs4 import BeautifulSoup

cities = []
with requests.Session() as session:
    session.headers = {
        'x-requested-with': 'XMLHttpRequest'
    }
    page = 1
    while True:
        if page >= 99:  # exit condition: stop after page 98
            break
        url = f'https://www.kununu.com/de/volkswagen/kommentare/{page}'
        response = session.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        new_comments = [
            city.find_next_sibling('div').text.strip()
            for city in soup.find_all('div', text=re.compile('Stadt'))
        ]
        cities += new_comments
        print(cities)
        page += 1

print(cities)  # this will print after 98 pages
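A hard cap of 99 pages is arbitrary: you may stop too early or fetch many empty pages. Assuming the site eventually serves a page with no "Stadt" entries once you run past the last comment page, a more robust exit is to break as soon as a page yields nothing new. A minimal sketch of that pattern, with the request-and-parse step factored out into a `fetch_page` callable (the fake fetcher below stands in for the real HTTP call):

```python
def collect_until_empty(fetch_page, max_pages=1000):
    """Accumulate results page by page, stopping at the first empty page.

    fetch_page(page) should return the list of items for that page
    (e.g. the city names parsed from one kununu comment page).
    max_pages is a safety cap in case the site never returns an empty page.
    """
    results = []
    page = 1
    while page <= max_pages:
        items = fetch_page(page)
        if not items:  # an empty page means we ran past the last real page
            break
        results += items
        page += 1
    return results

# Demo with a fake fetcher: three pages of data, then empty pages.
fake_site = {1: ['Wolfsburg', 'Berlin'], 2: ['Kassel'], 3: ['Hannover']}
print(collect_until_empty(lambda p: fake_site.get(p, [])))
# → ['Wolfsburg', 'Berlin', 'Kassel', 'Hannover']
```

In the real scraper, `fetch_page` would do the `session.get(...)` plus the BeautifulSoup parsing for one page number; whether an empty result actually marks the last page is an assumption you should verify against the site's behavior.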