
scrapy - How does scrapy-splash handle infinite scrolling?

Reposted. Author: 行者123. Updated: 2023-12-04 16:33:21

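I want to reverse-engineer the content that a web page generates when you scroll down. The problem is the URL https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933. The screwrand parameter does not seem to follow any pattern, so reconstructing the URL does not work. I am considering using Splash to render the page automatically. How can I make Splash scroll the way a browser does? Thanks a lot!
Here is the code for the two requests: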

request1 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following/{}'.format(user_id),
    self.parse_follow_relationship,
    args={'wait':2},
    meta={'user_id':user_id, 'action':'following'},
    endpoint='http://192.168.99.100:8050/render.html')

yield request1

request2 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following_user/80159?user_id=80159&limit=0&per_page=20&screwrand=76',
    self.parse_tmp,
    meta={'user_id':user_id, 'action':'following'},
    endpoint='http://192.168.99.100:8050/render.html')

yield request2

[Image: AJAX request shown in the browser console]

Best answer

To scroll the page, you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), like this:

function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
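
The script above hard-codes the number of scrolls and the delay. Since Splash exposes the request arguments to Lua through splash.args, one possible variation is to pass those two values in with the request. The following is only a sketch: num_scrolls and scroll_delay are argument names made up for this example (they are not built-in Splash options), and scroll_args is a hypothetical helper.

```python
# Lua script kept as a Python string. It reads its settings from splash.args
# and falls back to the values used in the answer when they are not supplied.
# ASSUMPTION: "num_scrolls" and "scroll_delay" are names chosen for this sketch.
SCROLL_SCRIPT = """
function main(splash)
    local num_scrolls = splash.args.num_scrolls or 10
    local scroll_delay = splash.args.scroll_delay or 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
"""


def scroll_args(wait=2, num_scrolls=10, scroll_delay=1.0):
    """Build the dict intended for SplashRequest(..., args=...)."""
    return {
        "lua_source": SCROLL_SCRIPT,
        "wait": wait,
        "num_scrolls": num_scrolls,
        "scroll_delay": scroll_delay,
    }
```

For example, scroll_args(num_scrolls=30) would ask for more scrolls on a long list without editing the Lua source. A fixed count is still a guess; pages that load slowly may need a larger scroll_delay.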

To run this script, use the 'execute' endpoint instead of the render.html endpoint:

script = """<Lua script> """
scrapy_splash.SplashRequest(url, self.parse,
    endpoint='execute',
    args={'wait':2, 'lua_source': script}, ...)
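
Before wiring the script into a spider, it can help to try it against the Splash instance directly. Splash's execute endpoint accepts a JSON POST body containing lua_source plus any other arguments, which then show up in splash.args. The sketch below uses only the Python standard library; build_execute_request and render are hypothetical helper names, and the Splash address is whatever your instance uses.

```python
import json
import urllib.request


def build_execute_request(splash_base, page_url, lua_source, wait=2):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    # "url" and "wait" become splash.args.url and splash.args.wait in Lua.
    payload = {"lua_source": lua_source, "url": page_url, "wait": wait}
    return urllib.request.Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render(splash_base, page_url, lua_source, wait=2, timeout=60):
    """Send the request and return the response body as text."""
    request = build_execute_request(splash_base, page_url, lua_source, wait)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the address from the question, a call might look like render('http://192.168.99.100:8050', 'https://www.crowdfunder.com/user/following/80159', script). If the returned HTML contains the entries that normally appear only after scrolling, the same script should behave the same way through SplashRequest.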

Regarding "scrapy - How does scrapy-splash handle infinite scrolling?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40325657/
