
python - Extracting script contents with BeautifulSoup (4.9.0)

Reposted. Author: 行者123. Updated: 2023-12-03 16:49:16

Starting with version 4.9.0, BeautifulSoup4 changed [0] how the text property works: it now ignores the contents of embedded scripts:

= 4.9.0 (20200405)
...
* Embedded CSS and Javascript is now stored in distinct Stylesheet and
Script tags, which are ignored by methods like get_text() since most
people don't consider this sort of content to be 'text'. This
feature is not supported by the html5lib treebuilder. [bug=1868861]

So it is no longer possible to extract wanted text from the HTML <script>wanted text</script> with soup.find('script').text.
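The change can be seen in a minimal sketch like the one below. The markup is a made-up sample, and the sketch assumes bs4 4.9.0 or later with the html.parser treebuilder (the changelog notes that html5lib does not support this feature). It only checks the document-wide get_text() call and the class of the stored string. What .text returns when called directly on the script tag is left out, since that may depend on the installed release.

```python
from bs4 import BeautifulSoup
from bs4.element import Script

# Hypothetical sample markup, not taken from the original question
html = "<html><body><p>visible text</p><script>wanted text</script></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Document-wide text extraction skips the embedded script
page_text = soup.get_text()

# The script body is still in the tree, stored as a Script string
script_node = soup.find("script").contents[0]

print(repr(page_text))
print(type(script_node).__name__)
```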

What is the preferred way to extract it now? I would rather not strip <script></script> from str(script) by hand.

[0] - https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/CHANGELOG

Best answer

You can try using the contents of the script tag, as follows:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.yourwebsite.com")
soup = BeautifulSoup(r.content, "html.parser")

# Print the body of every script tag that actually has content
for script in soup.find_all('script'):
    if script.contents:
        print(script.contents[0])
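As a possible alternative to indexing contents, the .string attribute of a tag returns its single child string, or None when the tag has no children or more than one. The sketch below is not part of the original answer. It uses a made-up inline document instead of a network request, and script_bodies is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, used here instead of requests.get(...)
html = """
<html><head>
<script src="app.js"></script>
<script>wanted text</script>
</head><body><p>visible text</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def script_bodies(soup):
    """Return the source of every inline script, skipping empty tags."""
    bodies = []
    for script in soup.find_all("script"):
        # .string is None for an empty tag such as <script src="..."></script>
        if script.string is not None:
            bodies.append(str(script.string))
    return bodies


bodies = script_bodies(soup)
print(bodies)
```

Converting with str() yields a plain Python string, which is convenient if the script body is passed on to something like a JSON or regex parser.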

Regarding "python - Extracting script contents with BeautifulSoup (4.9.0)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61122589/
