python - Catching errors in a generator and continuing afterwards

I have an iterator that is supposed to run for several days. I want errors to be caught and reported, and then I want the iterator to continue. Alternatively, the whole process could start over.

Here is the function:

def get_units(self, scraper):
    units = scraper.get_units()
    i = 0
    while True:
        try:
            unit = units.next()
        except StopIteration:
            if i == 0:
                log.error("Scraper returned 0 units", {'scraper': scraper})
            break
        except:
            traceback.print_exc()
            log.warning("Exception occurred in get_units",
                        extra={'scraper': scraper, 'iteration': i})
        else:
            yield unit
            i += 1
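(For reference, here is a minimal sketch of the "start the whole process over" fallback mentioned above. It is not part of the original question; the name get_units_with_restart and the max_restarts parameter are made up, and it uses Python 2 syntax to match the code above. It recreates the underlying generator whenever it dies, which means units yielded before a failure may be produced again; deduplication would be up to the caller.)

def get_units_with_restart(self, scraper, max_restarts=3):
    for attempt in range(max_restarts):
        try:
            # a fresh generator is created on each attempt
            for unit in scraper.get_units():
                yield unit
            return  # the scraper finished cleanly
        except Exception:
            traceback.print_exc()
            log.warning("Scraper failed, restarting",
                        extra={'scraper': scraper, 'attempt': attempt})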

Because scraper can be any of many code variants, it can't be trusted, and I don't want to handle the errors there.

But whenever units.next() raises an error, the whole thing stops. I suspect this is because the iterator raises StopIteration when one of its iterations fails.

Here is the output (only the last few lines):
[2012-11-29 14:11:12 /home/amcat/amcat/scraping/scraper.py:135 DEBUG] Scraping unit <Element div at 0x4258c710>
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article Counter-Strike: Global Offensive Update Released
Traceback (most recent call last):
File "/home/amcat/amcat/scraping/controller.py", line 101, in get_units
unit = units.next()
File "/home/amcat/amcat/scraping/scraper.py", line 114, in get_units
for unit in self._get_units():
File "/home/amcat/scraping/games/steamcommunity.py", line 90, in _get_units
app_doc = self.getdoc(url,urlencode(form))
File "/home/amcat/amcat/scraping/scraper.py", line 231, in getdoc
return self.opener.getdoc(url, encoding)
File "/home/amcat/amcat/scraping/htmltools.py", line 54, in getdoc
response = self.opener.open(url, encoding)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
[2012-11-29 14:11:14 /home/amcat/amcat/scraping/controller.py:110 WARNING] Exception occurred in get_units

...code ends...

So how can I prevent the iteration from stopping when an error occurs?

EDIT: here is the code in get_units():
def get_units(self):
    """
    Split the scraping job into a number of 'units' that can be processed
    independently of each other.

    @return: a sequence of arbitrary objects to be passed to scrape_unit
    """
    self._initialize()
    for unit in self._get_units():
        yield unit

And here is a simplified _get_units():
INDEX_URL = "http://www.steamcommunity.com"

def _get_units(self):
    doc = self.getdoc(INDEX_URL)  # returns an lxml.etree document

    for a in doc.cssselect("div.discussion a"):
        link = a.get('href')
        yield link

EDIT: follow-up question: Alter each for-loop in a function to have error handling executed automatically after each failed iteration

Best Answer

StopIteration is raised by a generator's next() method when there is no next item. It has nothing to do with errors inside the generator/iterator.

Another thing to note is that, depending on the type of iterator, it may not be able to resume after an exception. If the iterator is an object with a next method, it will work; but if it is actually a generator, it will not.
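To illustrate the first case, here is a minimal sketch (not from the original answer; the class name and data are made up, Python 2 syntax) of an iterator object whose state survives an exception because it lives in instance attributes rather than in a generator frame:

class ResumableUnits(object):
    """An iterator with an explicit next() method (Python 2 protocol)."""

    def __init__(self, links):
        self.links = links
        self.pos = 0

    def __iter__(self):
        return self

    def next(self):
        if self.pos >= len(self.links):
            raise StopIteration
        link = self.links[self.pos]
        self.pos += 1  # state is updated before the failure can occur
        if link == "bad":
            raise ValueError(link)  # simulate one failing unit
        return link

units = ResumableUnits(["a", "bad", "c"])
print units.next()   # -> 'a'
try:
    units.next()     # raises ValueError for the bad unit
except ValueError:
    pass
print units.next()   # -> 'c': iteration resumes after the error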

As far as I can tell, that is the only reason iteration cannot continue after an error in units.next(). I.e. units.next() fails, and the next time you call it, it cannot resume, so it signals that it is done by raising the StopIteration exception.

Basically, you would have to show us the code inside scraper.get_units() for us to understand why the loop cannot continue after an error in a single iteration. If get_units() is implemented as a generator function, it is clear. If not, it might be something else preventing it from resuming.

UPDATE: explaining what a generator function is:

class Scraper(object):
    def get_units(self):
        for i in some_stuff:
            bla = do_some_processing()
            bla *= 2  # random stuff
            yield bla

Now, when you call Scraper().get_units(), instead of running the whole function, it returns a generator object. Calling next() on it will run the body up to the first yield, and so on. Now, if an error occurs anywhere inside get_units, the generator is tainted, so to speak, and the next time you call next(), it will raise StopIteration, just as if it had run out of things to give you.
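A minimal demonstration of this tainting (not from the original answer; Python 2 syntax to match the question):

def gen():
    yield 1
    raise ValueError("boom")
    yield 2  # never reached

g = gen()
print g.next()   # -> 1
try:
    g.next()     # the ValueError propagates out of the generator
except ValueError:
    print "caught ValueError"
try:
    g.next()     # the generator frame is dead now
except StopIteration:
    print "StopIteration: the generator cannot resume after the error"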

Reading http://www.dabeaz.com/generators/ (and http://www.dabeaz.com/coroutines/) is highly recommended.

UPDATE 2: a possible solution: https://gist.github.com/4175802
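The gist itself is not reproduced here. A common pattern for this situation, which the linked solution may resemble, is to move the try/except inside the generator so that a failure in one unit is caught before it can escape and kill the generator. Here is a sketch based on the simplified _get_units() above (assuming, per the traceback, that the fallible part is the per-unit network work):

def _get_units(self):
    doc = self.getdoc(INDEX_URL)
    for a in doc.cssselect("div.discussion a"):
        try:
            # do the fallible per-unit work inside the try,
            # e.g. fetching a sub-page as in the traceback above
            unit = self.getdoc(a.get('href'))
        except Exception:
            traceback.print_exc()
            continue  # skip this unit; the generator stays alive
        yield unit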

Original question on Stack Overflow: https://stackoverflow.com/questions/13645112/
