python - Scrape web pages until the "next" page is disabled

Reposted. Author: 行者123. Updated: 2023-12-01 08:12:51

import requests
from bs4 import BeautifulSoup

url = 'https://www.tripadvisor.ie/Attraction_Review-g295424-d2038312-Reviews-Global_Village-Dubai_Emirate_of_Dubai.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

def get_links():
    # Collect the href of every <a class="title"> element on the page.
    review_links = []
    for review_link in soup.find_all('a', {'class': 'title'}, href=True):
        review_links.append(review_link['href'])
    return review_links

link = 'https://www.tripadvisor.ie'
review_urls = []
for i in get_links():
    # The hrefs are relative, so prepend the site root.
    review_url = link + i
    print(review_url)
    review_urls.append(review_url)

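This code saves all the hyperlinks present on this one page, but I want to scrape the hyperlinks on every page up to page 319. I can't work out how to do this when pagination is disabled.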

Best answer

You can change one parameter in the URL to loop over the pages and fetch all the reviews. So I just added a loop that requests every URL:

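from bs4 import BeautifulSoup as soup
import requests

def get_page(index):
    # The "-or{}-" part of the URL is the offset of the first review on the page.
    url = "https://www.tripadvisor.ie/Attraction_Review-g295424-d2038312-Reviews-or{}-Global_Village-Dubai_Emirate_of_Dubai.html".format(index)
    html = requests.get(url)
    page = soup(html.text, 'html.parser')
    return page

nb_review = 3187
# The offset advances by 10 from one page to the next.
for i in range(0, nb_review, 10):
    page = get_page(i)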

The full code, using your snippet, is:

from bs4 import BeautifulSoup as soup
import requests

def get_page(index):
    # Fetch and parse the review page that starts at offset `index`.
    url = "https://www.tripadvisor.ie/Attraction_Review-g295424-d2038312-Reviews-or{}-Global_Village-Dubai_Emirate_of_Dubai.html".format(index)
    html = requests.get(url)
    page = soup(html.text, 'html.parser')
    return page

def get_links(page):
    # Collect the href of every <a class="title"> element on the page.
    review_links = []
    for review_link in page.find_all('a', {'class': 'title'}, href=True):
        review_links.append(review_link['href'])
    return review_links

link = 'https://www.tripadvisor.ie'
review_urls = []
nb_review = 3187
for offset in range(0, nb_review, 10):
    page = get_page(offset)
    for href in get_links(page):
        # The hrefs are relative, so prepend the site root.
        review_url = link + href
        review_urls.append(review_url)
print(len(review_urls))

Output:

3187
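
If the total number of reviews is not known in advance, one alternative is to keep requesting pages until a page contributes no new links. The following is only a sketch of that idea, not part of the original answer: `collect_until_exhausted`, `fetch_links`, `step` and `max_pages` are hypothetical names, and it assumes that a request past the last page returns either no links or only links that were already collected. The fake site at the bottom stands in for real HTTP requests so the block runs offline.

```python
from typing import Callable, Iterable, List


def collect_until_exhausted(
    fetch_links: Callable[[int], Iterable[str]],
    step: int = 10,
    max_pages: int = 1000,
) -> List[str]:
    """Collect links page by page until a page adds nothing new.

    fetch_links(offset) must return the links found at that offset.
    max_pages is a safety cap so the loop always terminates.
    """
    seen = set()
    ordered = []
    for page_number in range(max_pages):
        added = 0
        for href in fetch_links(page_number * step):
            if href not in seen:
                seen.add(href)
                ordered.append(href)
                added += 1
        if added == 0:
            # Assumption: an empty or fully repeated page means we ran past the end.
            break
    return ordered


# Offline stand-in for the real site: three pages of fake links.
fake_site = {0: ["/r1", "/r2"], 10: ["/r3", "/r4"], 20: ["/r5"]}
links = collect_until_exhausted(lambda offset: fake_site.get(offset, []))
print(len(links))  # 5
```

With the functions from the full code, the call would look like `collect_until_exhausted(lambda offset: get_links(get_page(offset)))`.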

Edit:

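You can obviously scrape the first page and read the number of reviews from it, which improves the code and makes it more customizable.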
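
A rough sketch of that idea follows. It assumes the first page contains the total somewhere in its text in a form like "3,187 reviews"; that wording is an assumption about the page, not something confirmed here, and `parse_review_count` and `page_offsets` are hypothetical helper names. The sample string replaces a real page so the block runs without network access.

```python
import re


def parse_review_count(page_text: str) -> int:
    """Return the first number that is directly followed by the word 'reviews'."""
    match = re.search(r"(\d[\d,]*)\s+reviews", page_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no review count found in page text")
    # Drop thousands separators such as the comma in "3,187".
    return int(match.group(1).replace(",", ""))


def page_offsets(total_reviews: int, per_page: int = 10) -> range:
    """Offsets to put into the '-or{}-' part of the URL, one per page."""
    return range(0, total_reviews, per_page)


# Stand-in for the text of the first page (assumed wording).
sample_text = "Global Village: 3,187 reviews"
nb_review = parse_review_count(sample_text)
print(nb_review)                    # 3187
print(len(page_offsets(nb_review)))  # 319
```

With the full code, the text could come from `get_page(0).get_text()`, and the main loop would iterate over `page_offsets(nb_review)` instead of a hard-coded total.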

Source: a similar question on Stack Overflow about scraping web pages until the "next" page is disabled: https://stackoverflow.com/questions/55130559/
