
python - How to prevent asyncio.TimeoutError from being raised and continue the loop


I am using aiohttp with a limited_as_completed helper to speed up scraping (around 100 million static web pages). However, the code stops after a few minutes and returns a TimeoutError. I have tried several things, but I still cannot prevent asyncio.TimeoutError from being raised. How can I ignore the error and continue?

The code I am running is:

N=123
import html
from lxml import etree
import requests
import asyncio
import aiohttp
from aiohttp import ClientSession, TCPConnector
import pandas as pd
import re
import csv
import time
from itertools import islice
import sys
from contextlib import suppress

start = time.time()
data = {}
data['name'] = []
filename = "C:\\Users\\xxxx"+ str(N) + ".csv"

def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]

    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()

    while len(futures) > 0:
        yield first_to_finish()

async def get_info_byid(i, url, session):
    async with session.get(url, timeout=20) as resp:
        print(url)
        with suppress(asyncio.TimeoutError):
            r = await resp.text()
            name = etree.HTML(r).xpath('//h2[starts-with(text(),"Customer Name")]/text()')
            data['name'].append(name)
            dataframe = pd.DataFrame(data)
            dataframe.to_csv(filename, index=False, sep='|')

limit = 1000

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, limit):
        await res

url = "http://xxx.{}.html"
loop = asyncio.get_event_loop()

async def main():
    connector = TCPConnector(limit=10)
    async with ClientSession(connector=connector, headers=headers, raise_for_status=False) as session:
        coros = (get_info_byid(i, url.format(i), session) for i in range(N, N+1000000))
        await print_when_done(coros)

loop.run_until_complete(main())
loop.close()
print("took", time.time() - start, "seconds.")

The error log is:

Traceback (most recent call last):
  File "C:\Users\xxx.py", line 111, in <module>
    loop.run_until_complete(main())
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\asyncio\base_events.py", line 573, in run_until_complete
    return future.result()
  File "C:\Users\xxx.py", line 109, in main
    await print_when_done(coros)
  File "C:\Users\xxx.py", line 98, in print_when_done
    await res
  File "C:\Users\xxx.py", line 60, in first_to_finish
    return f.result()
  File "C:\Users\xxx.py", line 65, in get_info_byid
    async with session.get(url,timeout=20) as resp:
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 855, in __aenter__
    self._resp = await self._coro
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 391, in _request
    await resp.start(conn)
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client_reqrep.py", line 770, in start
    self._continue = None
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\helpers.py", line 673, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

I have already tried 1) adding try: ... except asyncio.TimeoutError: pass. Not working:

async def get_info_byid(i, url, session):
    async with session.get(url, timeout=20) as resp:
        print(url)
        try:
            r = await resp.text()
            name = etree.HTML(r).xpath('//h2[starts-with(text(),"Customer Name")]/text()')
            data['name'].append(name)
            dataframe = pd.DataFrame(data)
            dataframe.to_csv(filename, index=False, sep='|')
        except asyncio.TimeoutError:
            pass

2) suppress(asyncio.TimeoutError), as shown above. Not working either.

I only learned aiohttp yesterday, so maybe there is some other problem in my code that causes the timeout error after only a few minutes of running? If anyone knows how to deal with this, many thanks!
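For reference, the traceback above shows why neither attempt can work: the asyncio.TimeoutError is raised inside aiohttp's __aenter__, i.e. while async with session.get(url, timeout=20) is still being entered, so a try/except or suppress() placed in the body of the with block is never reached. Below is a minimal sketch of a placement that does catch it, reusing the names from the question's code (the CSV-saving step is elided):

async def get_info_byid(i, url, session):
    try:
        # Wrap the whole request: the timeout can fire while the
        # connection/response is still being set up, not just while
        # the body is being read.
        async with session.get(url, timeout=20) as resp:
            r = await resp.text()
            name = etree.HTML(r).xpath('//h2[starts-with(text(),"Customer Name")]/text()')
            data['name'].append(name)
            # ... write `data` to CSV as in the original code ...
    except asyncio.TimeoutError:
        # Skip this URL so the loop can move on to the next one.
        print("timed out:", url)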

Best Answer

What @Yurii Kramarenko suggested will definitely raise an "unclosed client session" exception at some point, because the session is never closed properly. What I recommend is something like this:

import asyncio
import aiohttp

async def main(urls):
    async with aiohttp.ClientSession(timeout=self.timeout) as session:
        tasks = [self.do_something(session, url) for url in urls]
        await asyncio.gather(*tasks)
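In the snippet above, self.timeout and self.do_something are leftovers from the answerer's own class context and are not defined here. Below is a self-contained sketch of the same recommendation, with an illustrative worker named fetch_one and a total timeout mirroring the question's 20 seconds; the worker swallows per-request timeouts so one slow page does not abort the whole gather():

import asyncio
import aiohttp

async def fetch_one(session, url):
    try:
        async with session.get(url) as resp:
            return await resp.text()
    except asyncio.TimeoutError:
        # Log and skip: the remaining tasks keep running.
        print("timed out:", url)
        return None

async def main(urls):
    timeout = aiohttp.ClientTimeout(total=20)  # illustrative value
    # `async with` guarantees the session is closed on exit, which
    # avoids the "unclosed client session" warning mentioned above.
    async with aiohttp.ClientSession(timeout=timeout) as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

urls = ["http://example.com/{}.html".format(i) for i in range(3)]
pages = asyncio.get_event_loop().run_until_complete(main(urls))

Alternatively, asyncio.gather(*tasks, return_exceptions=True) collects exceptions as results instead of handling them inside the worker.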

On "python - How to prevent asyncio.TimeoutError from being raised and continue the loop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53049523/
