python - How do I remove the attributes of <p> tags in HTML, using a regular expression or another approach in Python?

Repost. Author: 行者123. Updated: 2023-11-28 18:57:57
I use this code to keep only the <p> and <br> tags in a string.

from bs4 import BeautifulSoup

mystring = 'aaa<p>Radio and<BR> television.<br></p><p align="right">very<br/> popular in the world today.</p><p class="myclass">Millions of people watch TV. </p><p>That’s because a radio is very small <span style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span style=":_black;">haha100%</span></p>bb'
soup = BeautifulSoup(mystring,'html.parser')
for e in soup.find_all():
    if e.name not in ['p','br']:
        e.unwrap()
print(str(soup))

The result is:

aaa<p>Radio and<br/> television.<br/></p><p align="right">very<br> popular in the world today.</br></p><p class="myclass">Millions of people watch TV. </p><p>That’s because a radio is very small 98.2%</p><p>and it‘s easy to carry. haha100%</p>bb

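However, some of the <p> tags still carry attributes, for example align and class. I want to remove align="right", class="myclass", and any other attributes from the <p> tags, keeping only the bare <p> tag.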

I want a result like this:

aaa<p>Radio and<br/> television.<br/></p><p>very<br> popular in the world today.</br></p><p>Millions of people watch TV. </p><p>That’s because a radio is very small 98.2%</p><p>and it‘s easy to carry. haha100%</p>bb

I want to remove the attributes from the <p> tags.

How can I do this?

Best answer

Do you mean this:

for e in soup.find_all():
    if e.name not in ['p','br']:
        e.unwrap()
    else:
        e.attrs={}
print(str(soup))
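
The snippet above relies on the soup object from the question. As a self-contained sketch of the same idea, the version below wraps it in a function. The name strip_to_bare_tags and its keep parameter are hypothetical, introduced here only for illustration. It assumes the html.parser backend used in the question.

```python
from bs4 import BeautifulSoup


def strip_to_bare_tags(html, keep=("p", "br")):
    """Unwrap every tag not listed in `keep` and clear attributes on the rest.

    `strip_to_bare_tags` and `keep` are illustrative names, not part of
    Beautiful Soup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns an already-built list of tags, so editing the
    # tree while looping over it is safe.
    for tag in soup.find_all():
        if tag.name not in keep:
            # Remove the tag itself but keep its children in place.
            tag.unwrap()
        else:
            # Drop align, class, style and any other attribute.
            tag.attrs = {}
    return str(soup)


if __name__ == "__main__":
    print(strip_to_bare_tags('<p class="myclass">Hi <b>there</b></p>'))
```

Assigning an empty dict to tag.attrs removes every attribute at once, so there is no need to list align, class, and so on by name. This is generally more robust than a regular expression, which has to cope with quoting and spacing variations in the raw markup.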

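Regarding "python - How do I remove the attributes of <p> tags in HTML, using a regular expression or another approach in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56014856/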