
python - BeautifulSoup: RuntimeError: maximum recursion depth exceeded

Reposted. Author: 太空狗. Updated: 2023-10-30 00:40:05

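I cannot get past Python's maximum recursion depth RuntimeError when using BeautifulSoup.

I am trying to recurse through nested code sections and extract their content. The prettified HTML looks like this (don't ask why it looks like this :)):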

<div><code><code><code><code>Code in here</code></code></code></code></div>

The function I pass the soup object to is:

    def _strip_descendent_code(self, soup):
        sys.setrecursionlimit(2000)
        # soup = BeautifulSoup(html, 'lxml')
        for code in soup.findAll('code'):
            s = ""
            for c in code.descendents:
                if not isinstance(c, NavigableString):
                    if c.name != code.name:
                        continue
                    elif c.name == code.name:
                        if isinstance(c, NavigableString):
                            s += str(c)
                        else:
                            continue
            code.append(s)
        return str(soup)

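You can see that I am trying to raise the default recursion limit, but that is not a solution. I have raised it to the point where C hits the machine's memory limit, and the function above still never works.

Any help getting this to work and pointing out the error would be much appreciated.

The stack trace repeats this: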

  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 512, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1548, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1553, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
RuntimeError: maximum recursion depth exceeded while calling a Python object

Best answer

I ran into this problem and went through a lot of web pages. I can summarize two ways to deal with it.

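First, though, I think we should understand why it happens. Python limits the recursion depth (the default limit is 1000). We can check the current value with print(sys.getrecursionlimit()). I guess that BeautifulSoup uses recursion to find child elements. When the recursion goes deeper than the limit, Python raises RuntimeError: maximum recursion depth exceeded.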
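To make the limit concrete, here is a minimal sketch that does not involve BeautifulSoup at all. The helper names `depth` and `exceeds_limit` are made up for this illustration; the sketch only shows that recursing past the value reported by `sys.getrecursionlimit()` raises the error described above.

```python
import sys


def depth(n):
    """Recurse n levels deep and return n; each level uses one stack frame."""
    return 0 if n == 0 else 1 + depth(n - 1)


def exceeds_limit(n):
    """Return True if recursing n levels deep trips the interpreter's limit."""
    try:
        depth(n)
    except RuntimeError:
        # On Python 3.5+ the concrete type is RecursionError,
        # which is a subclass of RuntimeError.
        return True
    return False


print(sys.getrecursionlimit())
print(exceeds_limit(10))
print(exceeds_limit(sys.getrecursionlimit() + 100))
```

A shallow call stays well under the limit, while a call deeper than the limit is stopped by the interpreter before it can exhaust the C stack.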

First method: use sys.setrecursionlimit() to raise the recursion limit. You can of course set it to 1000000, but that may cause a segmentation fault.
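If you do raise the limit, one way to keep the change contained is to restore the old value afterwards. The sketch below is only an illustration: `recursion_limit` is a hypothetical helper, not part of the standard library or BeautifulSoup, and the value 5000 is an arbitrary choice.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the recursion limit and restore the old value on exit."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def depth(n):
    """Recurse n levels deep and return n."""
    return 0 if n == 0 else 1 + depth(n - 1)


# 2000 levels would exceed a default limit of 1000,
# but fits while the limit is temporarily raised.
with recursion_limit(5000):
    print(depth(2000))
```

This does not remove the risk mentioned above: a very large limit can still crash the interpreter, so the value should stay as small as the task allows.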

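Second method: use try-except. If maximum recursion depth exceeded occurs, there may be a problem with our algorithm. In general, we can replace recursion with a loop. For your problem, we can preprocess the HTML with replace() or a regular expression before parsing it.

Finally, here is an example.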
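As a rough sketch of the preprocessing idea, the nested tags from the question can be collapsed with regular expressions before the HTML reaches BeautifulSoup. The function name `collapse_nested_code` is made up for this illustration, and the sketch assumes the `<code>` tags carry no attributes and are nested directly inside each other, with at most whitespace between them.

```python
import re


def collapse_nested_code(html):
    """Collapse runs of directly nested <code> tags into a single pair."""
    # Drop every opening tag that is immediately followed by another one.
    html = re.sub(r'(?:<code>\s*)+(?=<code>)', '', html)
    # Drop every closing tag that immediately follows another closing tag.
    html = re.sub(r'(?<=</code>)(?:\s*</code>)+', '', html)
    return html


html = '<div><code><code><code><code>Code in here</code></code></code></code></div>'
print(collapse_nested_code(html))  # <div><code>Code in here</code></div>
```

Sibling blocks such as `<code>a</code><code>b</code>` are left untouched, because the patterns only match runs of the same kind of tag. Markup where text sits between the nested tags is outside what this sketch handles.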

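    from bs4 import BeautifulSoup
    import sys
    # sys.setrecursionlimit(10000)  # uncomment to raise the limit

    try:
        # Build a document of 1000 consecutive <br> tags.
        doc = '<br>' * 1000
        soup = BeautifulSoup(doc, 'html.parser')
        a = soup.find('br')
        # Iterate over the children of the first <br> tag.
        for i in a:
            print(i)
    except RuntimeError:
        # Raised when the maximum recursion depth is exceeded.
        print('failed')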

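If you remove the # in front of sys.setrecursionlimit(10000), it can print doc.

Hope this helps.

Regarding python - BeautifulSoup: RuntimeError: maximum recursion depth exceeded, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31528600/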
