Scrapy + Splash = Connection refused


Reposted by 行者123. Updated: 2023-12-03 14:36:35

I installed Splash by following this link. I went through all the installation steps, but Splash does not work.

My settings.py file:

BOT_NAME = 'Teste'
SPIDER_MODULES = ['Test.spiders']
NEWSPIDER_MODULE = 'Test.spiders'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
SPLASH_URL = 'http://127.0.0.1:8050/'
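With these settings, every SplashRequest is sent to the Splash HTTP API at SPLASH_URL instead of directly to the target site, which is why the log lines read "via http://127.0.0.1:8050/render.html". A minimal sketch of how that address is formed, assuming scrapy-splash joins SPLASH_URL and the endpoint name the way the standard urljoin does (both helper names are hypothetical, not part of scrapy-splash):

```python
from urllib.parse import urljoin, urlsplit


def splash_endpoint_url(splash_url, endpoint="render.html"):
    """Join SPLASH_URL and a Splash endpoint name (hypothetical helper).

    Works whether or not splash_url ends with a trailing slash.
    """
    return urljoin(splash_url, endpoint)


def splash_host_port(splash_url):
    """Return the (host, port) pair Scrapy must be able to connect to."""
    parts = urlsplit(splash_url)
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


if __name__ == "__main__":
    url = "http://127.0.0.1:8050/"
    print(splash_endpoint_url(url))  # http://127.0.0.1:8050/render.html
    print(splash_host_port(url))     # ('127.0.0.1', 8050)
```

In other words, these settings only tell Scrapy where Splash is expected to be; something must actually be listening on 127.0.0.1:8050.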

When I run scrapy crawl TestSpider, I get:

[scrapy.core.engine] INFO: Spider opened
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 1 times): Connection was refused by other side: 111: Connection refused.
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 2 times): Connection was refused by other side: 111: Connection refused.
[scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 3 times): Connection was refused by other side: 111: Connection refused.
[scrapy.core.scraper] ERROR: Error downloading <GET http://www.google.com.br via http://127.0.0.1:8050/render.html>
Traceback (most recent call last):
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused.
[scrapy.core.engine] INFO: Closing spider (finished)
[scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3, 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
'downloader/request_bytes': 1476,
'downloader/request_count': 3,
'downloader/request_method_count/POST': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 6, 29, 21, 36, 16, 72916),
'log_count/DEBUG': 3,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'memusage/max': 47468544,
'memusage/startup': 47468544,
'retry/count': 2,
'retry/max_reached': 1,
'retry/reason_count/twisted.internet.error.ConnectionRefusedError': 2,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'splash/render.html/request_count': 1,
'start_time': datetime.datetime(2017, 6, 29, 21, 36, 15, 851593)}
[scrapy.core.engine] INFO: Spider closed (finished)

Here is my spider:

import scrapy
from scrapy_splash import SplashRequest

class TesteSpider(scrapy.Spider):
    name = "Teste"

    def start_requests(self):
        yield SplashRequest("http://www.google.com.br", self.parse, meta={"splash": {"endpoint": "render.html"}})

    def parse(self, response):
        self.log('Hello World')

I also tried running this in the terminal: curl "http://localhost:8050/render.html?url=http://www.google.com/"

Output:

curl: (7) Failed to connect to localhost port 8050: Connection Refused
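curl fails with exit code 7 and "Connection refused", the same error 111 (ECONNREFUSED on Linux) that Scrapy logs. That means no process is accepting TCP connections on port 8050 at all, so the Scrapy settings are not the problem: the Splash service itself is not running or not reachable. A minimal standard-library sketch of the same check done from Python (the function name port_is_open is illustrative):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    ConnectionRefusedError (errno 111 on Linux) is a subclass of OSError and
    means nothing is listening; other OSErrors cover timeouts, unreachable
    hosts and name-resolution failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("127.0.0.1", 8050) returning False reproduces the situation above before Scrapy is even started.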

Best answer

You need to start the Splash container from the command line:

sudo docker run -p 8050:8050 scrapinghub/splash
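This command runs Splash in the foreground and publishes the container's port 8050 on host port 8050, so it has to stay running in that terminal (or be started detached with docker run -d) while the spider runs. Splash can also take a few seconds to come up, and a crawl started immediately may still be refused. A minimal sketch that waits for Splash before crawling, assuming Splash answers GET /_ping with HTTP 200 once it is ready (wait_for_splash is a hypothetical helper):

```python
import time
import urllib.request
from urllib.parse import urljoin


def wait_for_splash(splash_url="http://localhost:8050", timeout=30.0, interval=1.0):
    """Poll <splash_url>/_ping until it returns HTTP 200 or timeout expires.

    Returns True once Splash answered, False if it never did.
    """
    ping_url = urljoin(splash_url, "_ping")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(ping_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError and ConnectionRefusedError are all OSError
            # subclasses: the container is not up (or not ready) yet.
            pass
        time.sleep(interval)
    return False
```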

Also, in settings.py:

SPLASH_URL = 'http://localhost:8050'
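On most systems localhost resolves to 127.0.0.1, so the essential part of the fix is having the container running. If Splash ends up on another host or in another container, a hard-coded address has to be edited each time. One common pattern, sketched here, reads the address from an environment variable; the variable name SPLASH_URL in the environment is an assumption of this sketch, and scrapy-splash itself only reads the Scrapy setting:

```python
# settings.py (sketch): allow the Splash address to be overridden per environment.
import os

# Assumed environment variable; falls back to the local container
# started with `docker run -p 8050:8050 scrapinghub/splash`.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://localhost:8050")
```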

A similar question about Scrapy + Splash = Connection refused can be found on Stack Overflow: https://stackoverflow.com/questions/44835828/
