python - Scrapy error when using a proxy: twisted.python.failure.Failure OpenSSL.SSL.Error

Reposted by: 太空宇宙. Last updated: 2023-11-03 16:30:36

I'm new to Scrapy and am trying to scrape some craigslist pages through proxies, but I get the errors shown below. I tried the command scrapy shell "https://craigslist.org" and it seems to work fine.

As I understand it, to use a proxy I have to write a custom downloader middleware, which I did here:

import base64
import json
import os
import random


class ProxyConnect(object):
    def __init__(self):
        self.proxies = None
        with open(os.path.join(os.getcwd(), "chisel", "downloaders", "resources", "config.json")) as config:
            proxies = json.load(config)
            self.proxies = proxies["proxies"]

    def process_request(self, request, spider):
        if "proxy" in request.meta:
            return
        proxy = random.choice(self.proxies)
        ip, port, username, password = proxy["ip"], proxy["port"], proxy["username"], proxy["password"]
        request.meta["proxy"] = "http://" + ip + ":" + port
        user_pass = username + ":" + password
        if user_pass:
            basic_auth = 'Basic ' + base64.encodestring(user_pass)
            request.headers['Proxy-Authorization'] = basic_auth

Here is my project structure:

/chisel
    __init__.py
    pipelines.py
    items.py
    settings.py
    /downloaders
        __init__.py
        /downloader_middlewares
            __init__.py
            proxy_connect.py
        /resources
          config.json
    /spiders
        __init__.py
        craiglist_spider.py
        /spider_middlewares
            __init__.py
        /resources
          craigslist.json
scrapy.cfg

settings.py:

DOWNLOADER_MIDDLEWARES = {
    'chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110
}

I tested that my proxy works with the following command; it succeeded and returned the page source:

curl -x 'http://{USERNAME}:{PASSWORD}@{IP}:{PORT}' -v "http://www.google.com/"

Scrapy version:

$ scrapy version -v
Scrapy    : 1.1.0
lxml      : 3.6.0.0
libxml2   : 2.9.2
Twisted   : 16.2.0
Python    : 2.7.10 (default, Oct 23 2015, 19:19:21) - [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
pyOpenSSL : 16.0.0 (OpenSSL 1.0.2h  3 May 2016)
Platform  : Darwin-15.5.0-x86_64-i386-64bit

Error:

$ scrapy crawl craigslist
2016-06-04 01:44:14 [scrapy] INFO: Scrapy 1.1.0 started (bot: chisel)
2016-06-04 01:44:14 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'chisel.spiders', 'SPIDER_MODULES': ['chisel.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'chisel'}
2016-06-04 01:44:14 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-06-04 01:44:14 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-04 01:44:14 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-04 01:44:14 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-04 01:44:14 [scrapy] INFO: Spider opened
2016-06-04 01:44:14 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-04 01:44:14 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-04 01:44:16 [scrapy] DEBUG: Retrying <GET https://geo.craigslist.org/robots.txt> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:17 [scrapy] DEBUG: Retrying <GET https://geo.craigslist.org/robots.txt> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:18 [scrapy] DEBUG: Gave up retrying <GET https://geo.craigslist.org/robots.txt> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:18 [scrapy] ERROR: Error downloading <GET https://geo.craigslist.org/robots.txt>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:20 [scrapy] DEBUG: Retrying <GET https://geo.craigslist.org/iso/MD> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:21 [scrapy] DEBUG: Retrying <GET https://geo.craigslist.org/iso/MD> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:24 [scrapy] DEBUG: Gave up retrying <GET https://geo.craigslist.org/iso/MD> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:24 [scrapy] ERROR: Error downloading <GET https://geo.craigslist.org/iso/MD>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2016-06-04 01:44:24 [scrapy] INFO: Closing spider (finished)
2016-06-04 01:44:24 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
 'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 6,
 'downloader/request_bytes': 1668,
 'downloader/request_count': 6,
 'downloader/request_method_count/GET': 6,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 4, 8, 44, 24, 329662),
 'log_count/DEBUG': 7,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2016, 6, 4, 8, 44, 14, 963452)}
2016-06-04 01:44:24 [scrapy] INFO: Spider closed (finished)

Best answer

I got this error because I was using base64.encodestring instead of base64.b64encode. It seems to be a common error when using proxies from proxymesh.com. Reference: https://github.com/scrapy/scrapy/issues/1855
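A likely reason the two functions behave differently here is that base64.encodestring appends a trailing newline (and wraps long output), while base64.b64encode does not, so the newline ends up inside the Proxy-Authorization header. The sketch below illustrates this on Python 3, where encodestring is called base64.encodebytes; the "user:pass" credentials are placeholders.

```python
import base64

# Placeholder credentials, not taken from any real proxy account.
user_pass = b"user:pass"

# encodebytes is the Python 3 name for the old encodestring:
# it ends the output with a newline.
wrapped = base64.encodebytes(user_pass)

# b64encode returns the bare Base64 text with no newline.
plain = base64.b64encode(user_pass)

print(repr(wrapped))  # b'dXNlcjpwYXNz\n'
print(repr(plain))    # b'dXNlcjpwYXNz'
```

If switching functions is not an option, stripping the result (for example with .strip()) should give the same header value as b64encode for short credentials like these.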

Here is the working middleware.

import base64

class MeshProxy(object):
    # Override process_request so every request goes through the proxy
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://fr.proxymesh.com:31280"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "user:pass"
        # setup basic authentication for the proxy
        encoded_user_pass = base64.b64encode(proxy_user_pass)
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
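The middleware above is written for Python 2, where str and bytes are the same type. As a rough sketch of how the same fix could be applied to the question's random-proxy middleware on Python 3, the version below encodes the credentials to bytes before calling base64.b64encode. The class name RandomProxyMiddleware, the helper basic_proxy_auth, and the PROXY_LIST setting are hypothetical names chosen for illustration; the proxy dictionaries are assumed to have the same keys as the question's config.json.

```python
import base64
import random


def basic_proxy_auth(username, password):
    """Build a Proxy-Authorization value as bytes, without a trailing newline."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return b"Basic " + base64.b64encode(credentials)


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware that picks a random proxy per request."""

    def __init__(self, proxies):
        # proxies: list of dicts with "ip", "port", "username", "password" keys
        # (assumed to mirror the question's config.json).
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, not a built-in Scrapy one.
        return cls(crawler.settings.get("PROXY_LIST", []))

    def process_request(self, request, spider):
        # Respect a proxy that was already chosen for this request.
        if "proxy" in request.meta:
            return None
        proxy = random.choice(self.proxies)
        request.meta["proxy"] = "http://{}:{}".format(proxy["ip"], proxy["port"])
        # Only send credentials when a username is actually configured.
        if proxy.get("username"):
            request.headers["Proxy-Authorization"] = basic_proxy_auth(
                proxy["username"], proxy.get("password", "")
            )
        return None
```

Returning None from process_request tells Scrapy to continue processing the request through the remaining middlewares. Unlike the question's code, the credentials check here tests the username itself, since a string such as username + ":" + password is never empty.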

Regarding "python - Scrapy error when using a proxy: twisted.python.failure.Failure OpenSSL.SSL.Error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37628372/
