javascript - Python: How to scrape a page to get information that will be used to scrape another one, and so on?

Reposted by: 行者123. Updated: 2023-11-30 11:45:14

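I need to build a Python script that scrapes a web page to retrieve the number in its "Show more" button.

That number is then used as a parameter in a request to a URL that returns JSON containing data plus a number. This new number is in turn used as a parameter in the next request, which again returns JSON containing data plus a number, and so on. The process continues until the JSON comes back with empty data plus a number. When the data is empty, the scraper should stop.

I tried Scrapy, but it did not work. Scrapy is asynchronous, and in my case I need to wait for the first JSON result to give me the next piece of information before I can scrape the second URL, and so on.

Which Python library would you suggest I use? I have read that Selenium can do the job, but it is much slower than Scrapy.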

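Best answer

Scrapy's asynchronous behavior is most noticeable when you have multiple URLs to scrape at a given time. In your case, you only enqueue a new request after parsing the previous response, so the requests run one after another and this should not be a problem.

I don't know the exact structure of your JSON response, so let's assume it has two keys, data and number. You could write a Scrapy spider with a parse method similar to this: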

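import json
from scrapy import Request

# method of your Spider subclass
def parse(self, response):
    # decode the JSON body of the current response
    result = json.loads(response.text)
    # do something with the data

    # request the next page only while the current one returned data
    if result['data']:
        next_url = ...  # construct URL using result['number']
        yield Request(next_url)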
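
The chaining logic itself can be tried without Scrapy or a network connection. The following is only a minimal sketch under the same assumption as above (a JSON object with the keys data and number). The endpoint `BASE_URL`, the query parameter name, the helpers `build_url` and `crawl`, and the sample pages in `FAKE_PAGES` are all hypothetical and made up for illustration; none of them come from the question.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode

# Hypothetical endpoint: the real URL and parameter name are not given in the question.
BASE_URL = "https://example.com/api/items"


def build_url(number: int) -> str:
    """Build the next request URL from the number returned by the previous page."""
    return f"{BASE_URL}?{urlencode({'number': number})}"


def crawl(first_number: int, fetch: Callable[[str], str]) -> Iterator[list]:
    """Yield each page's data, following the chain of numbers until data is empty."""
    number = first_number
    while True:
        # The next URL can only be built once the previous response is parsed.
        result = json.loads(fetch(build_url(number)))
        if not result["data"]:
            break  # empty data: stop, as described in the question
        yield result["data"]
        number = result["number"]


# Stand-in for real HTTP responses, keyed by URL (made-up sample data).
FAKE_PAGES = {
    build_url(10): json.dumps({"data": ["a", "b"], "number": 20}),
    build_url(20): json.dumps({"data": ["c"], "number": 30}),
    build_url(30): json.dumps({"data": [], "number": 40}),
}

pages = list(crawl(10, FAKE_PAGES.__getitem__))
print(pages)  # [['a', 'b'], ['c']]
```

Here `fetch` is any function that takes a URL and returns the response body as text, so the fake dictionary lookup could be swapped for a real HTTP call. In the Scrapy version, the framework plays that role: each `yield Request(next_url)` corresponds to one iteration of this loop.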

A similar question to "javascript - Python: How to scrape a page to get information that will be used to scrape another one, and so on?" can be found on Stack Overflow: https://stackoverflow.com/questions/41209030/
