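I'm stumped by a problem that appears to be related to asyncio + aiohttp: when sending a large number of concurrent GET requests, over 85% of the requests raise aiohttp.client_exceptions.ClientConnectorError, which ultimately stems from
socket.gaierror(8, 'nodename nor servname provided, or not known')
Sending a single GET request, or doing basic DNS resolution on the host/port, does not raise this exception.
In my real code I do a good amount of customization, such as using a custom TCPConnector instance, but I can reproduce the issue using only the "default" aiohttp class instances and arguments, exactly as shown below.
I've followed the traceback, and the root of the exception is related to DNS resolution. It comes from the _create_direct_connection method of aiohttp.TCPConnector, which calls ._resolve_host().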
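I have also tried:
- Using aiodns
- sudo killall -HUP mDNSResponder
- Passing family=socket.AF_INET as an argument to TCPConnector (though I'm fairly sure this is used by aiodns regardless). This uses 2 rather than the default int 0 for that parameter
- Using ssl=True and ssl=False
All to no avail.
The full code to reproduce is below. The input URLs are at https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a.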
import asyncio
import itertools

import aiohttp
import aiohttp.client_exceptions
from yarl import URL

ua = itertools.cycle(
    (
        "Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0",
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:62.0) Gecko/20100101 Firefox/62.0",
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; ko; rv:1.9.1b2) Gecko/20081201 Firefox/60.0",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    )
)


async def get(url, session) -> str:
    async with await session.request(
        "GET",
        url=url,
        raise_for_status=True,
        headers={'User-Agent': next(ua)},
        ssl=False
    ) as resp:
        text = await resp.text(encoding="utf-8", errors="replace")
        print("Got text for URL", url)
        return text


async def bulk_get(urls) -> list:
    async with aiohttp.ClientSession() as session:
        htmls = await asyncio.gather(
            *(
                get(url=url, session=session)
                for url in urls
            ),
            return_exceptions=True
        )
        return htmls


# See https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a
with open("/path/to/urls.txt") as f:
    urls = tuple(URL(i.strip()) for i in f)

res = asyncio.run(bulk_get(urls))  # urls: Tuple[yarl.URL]

c = 0
for i in res:
    if isinstance(i, aiohttp.client_exceptions.ClientConnectorError):
        print(i)
        c += 1
print(c)  # 21205 !!!!! (85% failure rate)
print(len(urls))  # 24934
Printing each exception string from res looks like:
Cannot connect to host sigmainvestments.com:80 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host giaoducthoidai.vn:443 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host chauxuannguyen.org:80 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host www.baohomnay.com:443 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host www.soundofhope.org:80 ssl:False [nodename nor servname provided, or not known]
# And so on...
Frustratingly, I can ping these hosts with no problem and can even call the underlying ._resolve_host():
Bash/shell:
[~/] $ ping -c 5 www.hongkongfp.com
PING www.hongkongfp.com (104.20.232.8): 56 data bytes
64 bytes from 104.20.232.8: icmp_seq=0 ttl=56 time=11.667 ms
64 bytes from 104.20.232.8: icmp_seq=1 ttl=56 time=12.169 ms
64 bytes from 104.20.232.8: icmp_seq=2 ttl=56 time=12.135 ms
64 bytes from 104.20.232.8: icmp_seq=3 ttl=56 time=12.235 ms
64 bytes from 104.20.232.8: icmp_seq=4 ttl=56 time=14.252 ms
--- www.hongkongfp.com ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.667/12.492/14.252/0.903 ms
Python:
In [1]: import asyncio
   ...: from aiohttp.connector import TCPConnector
   ...: from clipslabapp.ratemgr import default_aiohttp_tcpconnector
   ...:
   ...:
   ...: async def main():
   ...:     conn = default_aiohttp_tcpconnector()
   ...:     i = await asyncio.create_task(conn._resolve_host(host='www.hongkongfp.com', port=443))
   ...:     return i
   ...:
   ...: i = asyncio.run(main())
In [2]: i
Out[2]:
[{'hostname': 'www.hongkongfp.com',
'host': '104.20.232.8',
'port': 443,
'family': <AddressFamily.AF_INET: 2>,
'proto': 6,
'flags': <AddressInfo.AI_NUMERICHOST: 4>},
{'hostname': 'www.hongkongfp.com',
'host': '104.20.233.8',
'port': 443,
'family': <AddressFamily.AF_INET: 2>,
'proto': 6,
'flags': <AddressInfo.AI_NUMERICHOST: 4>}]
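My setup:
Information about the exception itself:
The exception is aiohttp.client_exceptions.ClientConnectorError, which wraps socket.gaierror as the underlying OSError.
Since I have return_exceptions=True in asyncio.gather(), I can get hold of the exception instances themselves for inspection. Here is one example: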
In [18]: i
Out[18]:
aiohttp.client_exceptions.ClientConnectorError(8,
'nodename nor servname provided, or not known')
In [19]: i.host, i.port
Out[19]: ('www.hongkongfp.com', 443)
In [20]: i._conn_key
Out[20]: ConnectionKey(host='www.hongkongfp.com', port=443, is_ssl=True, ssl=False, proxy=None, proxy_auth=None, proxy_headers_hash=None)
In [21]: i._os_error
Out[21]: socket.gaierror(8, 'nodename nor servname provided, or not known')
In [22]: raise i.with_traceback(i.__traceback__)
---------------------------------------------------------------------------
gaierror Traceback (most recent call last)
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
954 port,
--> 955 traces=traces), loop=self._loop)
956 except OSError as exc:
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _resolve_host(self, host, port, traces)
824 addrs = await \
--> 825 self._resolver.resolve(host, port, family=self._family)
826 if traces:
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/resolver.py in resolve(self, host, port, family)
29 infos = await self._loop.getaddrinfo(
---> 30 host, port, type=socket.SOCK_STREAM, family=family)
31
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py in getaddrinfo(self, host, port, family, type, proto, flags)
772 return await self.run_in_executor(
--> 773 None, getaddr_func, host, port, family, type, proto, flags)
774
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py in run(self)
56 try:
---> 57 result = self.fn(*self.args, **self.kwargs)
58 except BaseException as exc:
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
747 addrlist = []
--> 748 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
749 af, socktype, proto, canonname, sa = res
gaierror: [Errno 8] nodename nor servname provided, or not known
The above exception was the direct cause of the following exception:
ClientConnectorError Traceback (most recent call last)
<ipython-input-22-72402d8c3b31> in <module>
----> 1 raise i.with_traceback(i.__traceback__)
<ipython-input-1-2bc0f5172de7> in get(url, session)
19 raise_for_status=True,
20 headers={'User-Agent': next(ua)},
---> 21 ssl=False
22 ) as resp:
23 return await resp.text(encoding="utf-8", errors="replace")
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/client.py in _request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, proxy_headers, trace_request_ctx)
474 req,
475 traces=traces,
--> 476 timeout=real_timeout
477 )
478 except asyncio.TimeoutError as exc:
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in connect(self, req, traces, timeout)
520
521 try:
--> 522 proto = await self._create_connection(req, traces, timeout)
523 if self._closed:
524 proto.close()
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_connection(self, req, traces, timeout)
852 else:
853 _, proto = await self._create_direct_connection(
--> 854 req, traces, timeout)
855
856 return proto
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
957 # in case of proxy it is not ClientProxyConnectionError
958 # it is problem of resolving proxy ip itself
--> 959 raise ClientConnectorError(req.connection_key, exc) from exc
960
961 last_exc = None # type: Optional[Exception]
ClientConnectorError: Cannot connect to host www.hongkongfp.com:443 ssl:False [nodename nor servname provided, or not known]
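With tens of thousands of results, inspecting instances one at a time gets tedious, so grouping the returned exceptions by type and errno gives a quicker picture of what is failing. The following is a minimal sketch using simulated failures rather than real requests; `tally_errors` and `fake_get` are hypothetical names and are not part of aiohttp.

```python
import asyncio
import socket
from collections import Counter


def tally_errors(results):
    """Count exceptions in a gather() result list by (type name, errno)."""
    counts = Counter()
    for item in results:
        if isinstance(item, BaseException):
            # OSError subclasses such as socket.gaierror carry an errno;
            # other exception types fall back to None.
            counts[(type(item).__name__, getattr(item, "errno", None))] += 1
    return counts


async def fake_get(i):
    # Stand-in for a real request: every third call fails like a DNS error.
    await asyncio.sleep(0)
    if i % 3 == 0:
        raise socket.gaierror(8, "nodename nor servname provided, or not known")
    return "ok"


async def main():
    return await asyncio.gather(
        *(fake_get(i) for i in range(9)), return_exceptions=True
    )


results = asyncio.run(main())
errors = tally_errors(results)
print(errors)
```

Applied to `res` from the script above, the same helper would separate errno 8 lookups from other failure modes, assuming the exceptions expose `errno` the way OSError subclasses normally do.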
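Why do I think this is not a problem with DNS resolution at the OS level itself?
I can successfully ping the IP addresses of my ISP's DNS servers, which are given in (Mac OSX) System Preferences > Network > DNS: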
[~/] $ ping -c 2 75.75.75.75
PING 75.75.75.75 (75.75.75.75): 56 data bytes
64 bytes from 75.75.75.75: icmp_seq=0 ttl=57 time=16.478 ms
64 bytes from 75.75.75.75: icmp_seq=1 ttl=57 time=21.042 ms
--- 75.75.75.75 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 16.478/18.760/21.042/2.282 ms
[~/] $ ping -c 2 75.75.76.76
PING 75.75.76.76 (75.75.76.76): 56 data bytes
64 bytes from 75.75.76.76: icmp_seq=0 ttl=54 time=33.904 ms
64 bytes from 75.75.76.76: icmp_seq=1 ttl=54 time=32.788 ms
--- 75.75.76.76 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 32.788/33.346/33.904/0.558 ms
[~/] $ ping6 -c 2 2001:558:feed::1
PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::1
16 bytes from 2001:558:feed::1, icmp_seq=0 hlim=57 time=14.927 ms
16 bytes from 2001:558:feed::1, icmp_seq=1 hlim=57 time=14.585 ms
--- 2001:558:feed::1 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 14.585/14.756/14.927/0.171 ms
[~/] $ ping6 -c 2 2001:558:feed::2
PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::2
16 bytes from 2001:558:feed::2, icmp_seq=0 hlim=54 time=12.694 ms
16 bytes from 2001:558:feed::2, icmp_seq=1 hlim=54 time=11.555 ms
--- 2001:558:feed::2 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 11.555/12.125/12.694/0.569 ms
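A ping exercises name resolution once, whereas the failures above only show up with thousands of lookups in flight. One way to probe that difference is to resolve a batch of hosts through `loop.getaddrinfo` (the call at the bottom of the traceback) at different concurrency limits and count the failures. This is a rough sketch: `count_dns_failures` and the injectable `resolve` argument are hypothetical, and the demo at the end uses a fake resolver so that it runs without network access.

```python
import asyncio
import socket


async def count_dns_failures(hosts, limit, resolve=None):
    """Resolve hosts with at most `limit` lookups in flight; return the failure count."""
    loop = asyncio.get_running_loop()
    if resolve is None:
        # Default: the event loop's thread-pool getaddrinfo.
        async def resolve(host):
            return await loop.getaddrinfo(host, 443, type=socket.SOCK_STREAM)

    sem = asyncio.Semaphore(limit)

    async def one(host):
        async with sem:
            return await resolve(host)

    results = await asyncio.gather(*(one(h) for h in hosts), return_exceptions=True)
    return sum(isinstance(r, socket.gaierror) for r in results)


async def fake_resolve(host):
    # Offline stand-in: names ending in ".invalid" fail like errno 8.
    await asyncio.sleep(0)
    if host.endswith(".invalid"):
        raise socket.gaierror(8, "nodename nor servname provided, or not known")
    return []


failures = asyncio.run(
    count_dns_failures(["a.example", "b.invalid", "c.invalid"], limit=2, resolve=fake_resolve)
)
print(failures)
```

Dropping the `resolve` argument and passing real hostnames would run the same comparison against the system resolver; if the failure count climbs as `limit` grows, that points at resolver capacity rather than at the hosts themselves.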
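Best answer
After some further investigation, this issue does not appear to be directly caused by aiohttp/asyncio, but rather by limits stemming from two things: the capacity or rate limiting of the DNS servers being queried, and the maximum number of open files at the system level.
Firstly, for those looking to get some beefed-up DNS servers (I will probably not go that route), the big-name options seem to be:
(Good intro to DNS for those like me for whom network concepts are lacking.)
The first thing I did was run the above on a beefed-up AWS EC2 instance: an h1.16xlarge running Ubuntu, which is IO-optimized. I can't say this by itself helped, but it certainly did no harm. I'm not too familiar with the default DNS server used by an EC2 instance, but the OSError with errno == 8 from above went away when replicating the above script.
However, a new exception appeared in its place: OSError with code 24, "Too many open files." My hotfix solution (I'm not arguing this is the most sustainable or safest) was to increase the maximum file limits. I did this via: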
sudo vim /etc/security/limits.conf
# Add these lines
root soft nofile 100000
root hard nofile 100000
ubuntu soft nofile 100000
ubuntu hard nofile 100000
sudo vim /etc/sysctl.conf
# Add this line
fs.file-max = 2097152
sudo sysctl -p
sudo vim /etc/pam.d/common-session
# Add this line
session required pam_limits.so
sudo reboot
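Before or after editing system files, it can be useful to check which open-file limit the Python process actually sees, since each open socket counts against it. The standard-library `resource` module (Unix only) exposes this. A small sketch follows; `raise_soft_nofile` is a hypothetical helper, and it can only raise the soft limit up to the existing hard limit, not beyond it.

```python
import resource

# Limits as seen by the current process: (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)


def raise_soft_nofile(target):
    """Try to raise the soft RLIMIT_NOFILE to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms reject values above a kernel maximum;
            # in that case the limit is left unchanged.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_soft_nofile(100000)` at the top of a script affects that process only, which makes it a lighter-weight experiment than the system-wide changes above.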
Admittedly I feel like I'm feeling around in the dark, but coupling this with asyncio.Semaphore(1024) (example here) led to exactly 0 of the two exceptions above being raised:
# Then call this from bulk_get with asyncio.Semaphore(n)
async def bounded_get(sem, url, session) -> str:
    async with sem:
        return await get(url, session)
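Out of roughly 25,000 input URLs, only about 100 GET requests returned exceptions, mainly because those websites were legitimately broken. Total time to completion was within a few minutes, which I consider acceptable.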
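For completeness, here is a self-contained sketch of how the semaphore can be wired into the gather call. It uses a simulated `fake_get` in place of the real aiohttp request so that it runs without network access; `bounded`, `run_bounded`, and the counters are hypothetical names, and the limit of 8 is arbitrary for the demo.

```python
import asyncio


async def bounded(sem, coro_fn, *args):
    """Run coro_fn(*args) only while holding a slot in the semaphore."""
    async with sem:
        return await coro_fn(*args)


async def run_bounded(coro_fn, items, limit):
    # Create the semaphore inside the running loop so it is bound to
    # the loop that asyncio.run() started.
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(
        *(bounded(sem, coro_fn, item) for item in items),
        return_exceptions=True,
    )


active = 0  # requests currently "in flight"
peak = 0    # highest value of `active` observed


async def fake_get(url):
    # Stand-in for get(url, session); tracks concurrency instead of doing I/O.
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.001)
    active -= 1
    return f"body of {url}"


urls = [f"http://example.com/{i}" for i in range(50)]
results = asyncio.run(run_bounded(fake_get, urls, limit=8))
print(len(results), "results; peak concurrency:", peak)
```

All 50 coroutines are still created up front, but at most `limit` of them hold the semaphore at any moment, which is what keeps the number of simultaneous DNS lookups and open sockets bounded.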
Source: "aiohttp: concurrent GET requests raise ClientConnectorError(8, 'nodename nor servname provided, or not known')" on Stack Overflow: https://stackoverflow.com/questions/54209525/