python - Replace <a> </a> with its href in BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:02:03

content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.'
soup = BeautifulSoup(content, 'html.parser')

Now I want to replace each entire <a> </a> element with the URL from its href attribute. The expected result is:

Hello, the web site is https://www.google.com. The search engine is https://www.baidu.com.

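Can anyone provide a solution?

Best answer

First find each `a` and get its `href`. Then you can append the `href` to the previous sibling (the text node just before the link) and remove the `a`.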

from bs4 import BeautifulSoup

content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.'
soup = BeautifulSoup(content, 'html.parser')

# find all `a` tags
all_a = soup.find_all('a')

for a in all_a:
    # get the `href` of this `a`
    href = a['href']

    #print('--- before ---')
    #print(soup)

    # append `href` to the text node just before `a`
    # (assumes `a` is preceded by a text node, as in this input)
    a.previous_sibling.replace_with(a.previous_sibling + href)

    # remove `a` from the tree
    a.extract()

    #print('--- after ---')
    #print(soup)

print(soup)

Output:

<p>Hello, the web site is https://www.google.com</p>. <p>The search engine is https://www.baidu.com</p>.
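
The answer's output still contains the <p> tags, and its approach relies on every link having a text node in front of it. As an alternative sketch (not the original answer's method), each `a` can be swapped directly for its URL with `replace_with`, and `get_text()` can then produce the plain-text form shown in the question. The helper name `links_to_urls` is hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup


def links_to_urls(html):
    """Return a soup in which every <a href=...> is replaced by its URL text.

    `links_to_urls` is a hypothetical helper name, not part of bs4.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # href=True skips anchors that have no href attribute
    for a in soup.find_all('a', href=True):
        # swap the whole <a>...</a> element for a plain string
        a.replace_with(a['href'])
    return soup


content = '<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.'
soup = links_to_urls(content)

print(soup)             # markup with the links replaced by URLs
print(soup.get_text())  # plain text only, with the <p> tags dropped
```

Because `replace_with` acts on the link itself, this version does not depend on a previous sibling, so it should also handle a link that is the first child of its parent.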

For python - Replace <a> </a> with its href in BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34938322/
