
python - ScrapyJS (scrapy + splash) cannot load the script, but the Splash server runs fine


I am trying to use Scrapy with scrapy-splash to crawl a page that relies on scripts, so that I get the fully rendered page. I render it with Splash + Scrapy using the code below, with exactly the same parameters I use when calling the Splash server directly at localhost:8050:

    script = """
    function main(splash)
        local url = splash.args.url
        assert(splash:go(url))
        assert(splash:wait(0.5))
        return {
            html = splash:html(),
            png = splash:png(),
            har = splash:har(),
        }
    end
    """

    splash_args = {
        'wait': 0.5,
        'url': response.url,
        'images': 1,
        'expand': 1,
        'timeout': 60.0,
        'lua_source': script,
    }

    yield SplashRequest(response.url,
                        self.parse_list_other_page,
                        cookies=response.request.cookies,
                        args=splash_args)

The response HTML does not contain the elements I need, but the Splash server works fine when I use it directly at localhost:8050.

Do you know where the problem is?

This is my settings.py:
SPLASH_URL = 'http://127.0.0.1:8050'

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    # 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Crawl responsibly by identifying yourself (and your website) on the
# user-agent
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 "
              "Safari/537.36")

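For completeness: the scrapy-splash README also recommends registering a Splash-aware dupe filter and cache storage, so that requests routed through Splash are deduplicated and cached correctly. A sketch of the extra settings.py lines, as documented by scrapy-splash:

```python
# Splash-aware duplicate filtering and HTTP cache storage,
# as recommended in the scrapy-splash README.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```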

Best Answer

The default endpoint is 'render.json'; to use the 'lua_source' argument (i.e. to run a Lua script), you have to use the 'execute' endpoint:

yield SplashRequest(response.url,
                    self.parse_list_other_page,
                    endpoint='execute',
                    cookies=response.request.cookies,
                    args=splash_args)
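One detail worth noting: since the callback is passed positionally, endpoint='execute' has to come after it, because Python rejects a positional argument that follows a keyword argument. A minimal sketch with a hypothetical stand-in function (splash_request is not part of scrapy, it only illustrates the argument ordering):

```python
# Hypothetical stand-in for SplashRequest, used only to show argument order.
def splash_request(url, callback, endpoint="render.json", **kwargs):
    return {"url": url, "callback": callback, "endpoint": endpoint}

# Correct: positional arguments (url, callback) first, then keyword arguments.
req = splash_request("http://example.com", "parse_list_other_page",
                     endpoint="execute")
print(req["endpoint"])  # execute

# Swapping the order, e.g.
#   splash_request("http://example.com", endpoint="execute", "parse_list_other_page")
# raises: SyntaxError: positional argument follows keyword argument
```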

Regarding "python - ScrapyJS (scrapy + splash) cannot load the script, but the Splash server runs fine", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43918648/
