
python - Disable SSL certificate verification in Scrapy

Reposted. Author: 太空狗. Updated: 2023-10-30 01:07:11

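I'm currently struggling with a problem in Scrapy. Whenever I use Scrapy to crawl an HTTPS site whose certificate CN value matches the server's domain name, Scrapy works great! On the other hand, whenever I try to crawl a site whose certificate CN value does not match the server's domain name, I get the following: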

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 415, in dataReceived
    self._write(bytes)
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 554, in _write
    sent = self._tlsConnection.send(toSend)
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 926, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
    return wrapped(connection, where, ret)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1154, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/usr/local/lib/python2.7/dist-packages/service_identity/pyopenssl.py", line 30, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/usr/local/lib/python2.7/dist-packages/service_identity/_common.py", line 235, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.

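I've read as much of the documentation as I can, and as far as I can tell, Scrapy has no way to disable SSL certificate verification. Even the documentation for the Scrapy Request object (which I assumed is where this feature would live) makes no mention of it: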

http://doc.scrapy.org/en/1.0/topics/request-response.html#scrapy.http.Request
https://github.com/scrapy/scrapy/blob/master/scrapy/http/request/__init__.py

Nor is there a Scrapy setting that addresses this:

http://doc.scrapy.org/en/1.0/topics/settings.html

Short of running Scrapy from source and modifying that source as needed, does anyone know how to disable SSL certificate verification?

Thanks!

Best answer

Judging from the settings documentation you linked, it looks like you can modify the DOWNLOAD_HANDLERS setting.

From the docs:

"""
A dict containing the request download handlers enabled by default in
Scrapy. You should never modify this setting in your project, modify
DOWNLOAD_HANDLERS instead.
"""

DOWNLOAD_HANDLERS_BASE = {
    'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
    'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
}

Then, in your settings, something like this:

""" 
Configure your download handlers with something custom to override
the default https handler
"""
DOWNLOAD_HANDLERS = {
'https': 'my.custom.downloader.handler.https.HttpsDownloaderIgnoreCNError',
}
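The override works per URI scheme, which can be modeled in a few lines of plain Python. This is a simplified sketch, not Scrapy's own code: it assumes that entries in DOWNLOAD_HANDLERS replace same-scheme entries in DOWNLOAD_HANDLERS_BASE and that a value of None disables a scheme (as the Scrapy settings documentation describes). The helpers effective_handlers and handler_for are hypothetical, and the https class path is the answer's placeholder, not a real module.

```python
from urllib.parse import urlparse

# The defaults quoted from the Scrapy 1.0 docs above.
DOWNLOAD_HANDLERS_BASE = {
    'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
    'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
}

# Project-level override; this class path is the answer's hypothetical placeholder.
DOWNLOAD_HANDLERS = {
    'https': 'my.custom.downloader.handler.https.HttpsDownloaderIgnoreCNError',
}


def effective_handlers(base, overrides):
    """Hypothetical model of the merge: project entries win, None disables a scheme."""
    merged = dict(base)
    merged.update(overrides)
    return {scheme: path for scheme, path in merged.items() if path is not None}


def handler_for(url, handlers):
    """Return the handler class path for the URL's scheme, or None if unsupported."""
    return handlers.get(urlparse(url).scheme)


handlers = effective_handlers(DOWNLOAD_HANDLERS_BASE, DOWNLOAD_HANDLERS)
print(handler_for('https://example.com/', handlers))
print(handler_for('http://example.com/', handlers))
```

Only the https entry changes; http, file and s3 requests keep their default handlers, so the custom class only ever sees HTTPS traffic.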

So by defining a custom handler for the https protocol, you should be able to handle the error you're getting and let Scrapy carry on with its business.
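Presumably, such a custom https handler would need its TLS client to stop enforcing the hostname (CN) check. As a framework-neutral illustration only, here is what that looks like with Python 3's standard ssl module. This is not Scrapy's or Twisted's API, and how the hypothetical HttpsDownloaderIgnoreCNError would wire an equivalent setting into Scrapy's Twisted-based downloader depends on the Scrapy and Twisted versions in use.

```python
import ssl


def make_lenient_client_context():
    """Client TLS context that skips the hostname (CN/SAN) check and chain validation.

    Standard-library sketch only; this is not Scrapy or Twisted code.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # defaults: check_hostname=True, CERT_REQUIRED
    # check_hostname must be disabled first; setting CERT_NONE while it is
    # still enabled raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = ssl.create_default_context()
lenient = make_lenient_client_context()
print(strict.check_hostname, strict.verify_mode == ssl.CERT_REQUIRED)
print(lenient.check_hostname, lenient.verify_mode == ssl.CERT_NONE)
```

Relaxing verify_mode as well as check_hostname means any certificate is accepted, not just one whose CN differs from the host; if only the name check should be skipped, leaving verify_mode at CERT_REQUIRED is the narrower choice.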

For "python - Disable SSL certificate verification in Scrapy", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32950694/
