
python - BS4 replace_with: result is no longer in the tree

Reposted. Author: 行者123. Updated: 2023-12-03 09:32:47

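I need to replace several words in an HTML document. At the moment I do this by calling replace_with once per replacement. Calling replace_with twice on the same NavigableString raises a ValueError (see the example below), because the element being replaced is no longer in the tree.
Minimal example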

#!/usr/bin/env python3
from bs4 import BeautifulSoup
import re

def test1():
    html = \
    '''
    Identify
    '''
    soup = BeautifulSoup(html,features="html.parser")
    for txt in soup.findAll(text=True):
        if re.search('identify',txt,re.I) and txt.parent.name != 'a':
            newtext = re.sub('identify', '<a href="test.html"> test </a>', txt.lower())
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
            # I called it twice here to make the code as small as possible.
            # Usually it would be a different newtext ..
            # which was created using the replaced txt looking for a different word to replace.

    return soup

print(test1())
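Expected result:
The txt is == newstring
Actual result:
ValueError: Cannot replace one element with another when the element to be replaced is not part of the tree.
A simple solution would be to keep modifying the new string and call replace_with only once at the end, but I would like to understand the behaviour I am seeing.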

Best answer

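The first txt.replace_with(...) removes the NavigableString (stored here in the variable txt) from the document tree (doc). This effectively sets txt.parent to None. The second txt.replace_with(...) looks at the parent attribute, finds None (because txt has already been removed from the tree) and raises a ValueError.
As you said at the end of your question, one solution is to call .replace_with() only once: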
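The detachment can be observed directly. The following is a minimal sketch, not part of the original answer: the `<p>Identify</p>` markup and the variable names are assumptions made for illustration, and plain strings created with `soup.new_string()` stand in for the parsed fragments used in the question.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the question's original HTML is not fully shown.
soup = BeautifulSoup("<p>Identify</p>", features="html.parser")

txt = soup.p.string            # NavigableString attached to <p>
parent_before = txt.parent     # the <p> tag

new_node = soup.new_string("replacement")
txt.replace_with(new_node)     # txt is detached, new_node takes its place

parent_after = txt.parent      # None: txt is no longer part of the tree

try:
    # A second call on the detached node fails, as in the question.
    txt.replace_with(soup.new_string("again"))
    second_call_failed = False
except ValueError:
    second_call_failed = True

# The node that now lives in the tree can still be replaced.
new_node.replace_with(soup.new_string("final"))
```

The last line suggests why holding on to the old `txt` is the problem: any further replacement has to target a node that is currently attached to the tree.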

import re
from bs4 import BeautifulSoup

def test1():
    html = \
    '''
    word1 word2 word3 word4
    '''
    soup = BeautifulSoup(html,features="html.parser")

    for txt in soup.findAll(text=True):
        if re.search('word1', txt, flags=re.I) and txt.parent.name != 'a':
            newtext = re.sub('word1', '<a href="test.html"> test1 </a>', txt.lower())

            # ...some computations

            newtext = re.sub('word3', '<a href="test.html"> test2 </a>', newtext)

            # ...some more computations

            # and at the end, replace txt only once:
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))

    return soup

print(test1())
Prints:
<a href="test.html"> test1 </a> word2 <a href="test.html"> test2 </a> word4
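If the replacements really have to be applied one at a time, another option is to look up the text nodes again before each pass, so that replace_with is only ever called on nodes that are still attached. This is a hedged sketch rather than part of the original answer: the helper name `link_words`, the `<p>` wrapper and the replacement pairs are all hypothetical. It uses `find_all(string=True)`, the current spelling of `findAll(text=True)`.

```python
import re
from bs4 import BeautifulSoup

def link_words(html, replacements):
    """Apply each (pattern, markup) pair in its own pass over the tree.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, features="html.parser")
    for pattern, markup in replacements:
        # Take a fresh snapshot of the text nodes for every pass: nodes
        # replaced in an earlier pass are gone, their successors are found.
        for txt in list(soup.find_all(string=True)):
            if txt.parent.name == "a":
                continue  # do not touch text that is already inside a link
            if not re.search(pattern, txt, flags=re.I):
                continue
            newtext = re.sub(pattern, markup, txt, flags=re.I)
            # Each attached node is replaced exactly once per pass.
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
    return soup

result = link_words(
    "<p>word1 word2 word3 word4</p>",
    [
        ("word1", '<a href="test.html">test1</a>'),
        ("word3", '<a href="test.html">test2</a>'),
    ],
)
print(result)
```

Compared with building the whole string first, this walks the tree once per pattern, so it trades some speed for keeping every replacement step independent.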

Regarding "python - BS4 replace_with: result is no longer in the tree", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63424180/
