
python - Extracting the contents of <script> with BeautifulSoup

Reposted. Author: 太空狗. Updated: 2023-10-29 20:37:08

1/ I'm trying to extract part of a script with Beautiful Soup, but nothing gets printed. What's wrong?

import urllib2
from bs4 import BeautifulSoup

URL = "http://www.reuters.com/video/2014/08/30/woman-who-drank-restaurants-tainted-tea?videoId=341712453"
oururl = urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)

for script in soup("script"):
    script.extract()

list_of_scripts = soup.findAll("script")
print list_of_scripts
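The loop above calls `extract()` on every `<script>` tag, which detaches each one from the parse tree, so the later `findAll("script")` has nothing left to find and returns an empty list. A minimal Python 3 sketch of the difference, assuming bs4 with the built-in `html.parser`. `SAMPLE_HTML` is a hypothetical stand-in for the Reuters page, so nothing is fetched over the network:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (no network access needed).
SAMPLE_HTML = """
<html><body>
<p>Video page</p>
<script>var a = 1;</script>
<script type="application/ld+json">{"@type": "VideoObject"}</script>
</body></html>
"""

def scripts_after_extract(html):
    """Reproduce the question's code: extract() removes each <script> from the tree."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup("script"):  # soup("script") is shorthand for soup.find_all("script")
        script.extract()
    return soup.find_all("script")  # every <script> is already gone

def scripts_without_extract(html):
    """Only search for the tags; the tree is left intact."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("script")

print(scripts_after_extract(SAMPLE_HTML))         # []
print(len(scripts_without_extract(SAMPLE_HTML)))  # 2
```

If the intent really is to strip the scripts out of the page and keep them, `extract()` returns the element it removed, so collecting those return values gives the list instead.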

2/ The goal is to extract the value of the "transcript" property:

<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "VideoObject",
"video": {
"@type": "VideoObject",
"headline": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",
"caption": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",
"transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life. SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING: \"Immediately my whole mouth was on fire.\" The Utah woman was critically burned in her mouth and esophagus after taking a sip of sweet tea laced with a toxic cleaning solution at Dickey's BBQ. SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING: \"It was like a fire beyond anything you can imagine. I mean, it was not like drinking hot coffee.\" Authorities say an employee mistakenly mixed the industrial cleaning solution containing lye into the tea thinking it was sugar. The Hardings hope the incident will bring changes in the restaurant industry to avoid such dangerous mixups. SOUNDBITE: JIM HARDING, HUSBAND, SAYING: \"Bottom line, so no one ever has to go through this again.\" The district attorney's office is expected to decide in the coming week whether criminal charges will be filed.",
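Because the transcript sits inside a JSON-LD block, one approach is to locate the `<script type="application/ld+json">` tag, take its contents with `.string`, and parse them with the standard `json` module instead of searching raw text. This is a sketch that assumes the page has one such tag and nests `transcript` under `video` as in the excerpt above. The markup is an abridged, hypothetical stand-in for the real page, and `get_transcript` is an illustrative helper name:

```python
import json
from bs4 import BeautifulSoup

# Abridged, hypothetical stand-in for the page; the real transcript is much longer.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "VideoObject",
  "video": {
    "@type": "VideoObject",
    "transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life."
  }
}
</script>
</head><body></body></html>
"""

def get_transcript(html):
    """Return video.transcript from the first JSON-LD block, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Filter on the type attribute so ordinary JavaScript blocks are skipped.
    tag = soup.find("script", attrs={"type": "application/ld+json"})
    if tag is None or tag.string is None:
        return None
    data = json.loads(tag.string)
    # Assumes the "video" -> "transcript" nesting shown in the question.
    return data.get("video", {}).get("transcript")

print(get_transcript(SAMPLE_HTML))
```

If fields such as `headline` carry HTML entities like `&#039;`, the standard library's `html.unescape()` can decode them after `json.loads`.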

Best answer

From the documentation:

As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> tags are not considered to be ‘text’, since those tags are not part of the human-visible content of the page.

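So falsetru's accepted answer above is basically fine, but with newer versions of Beautiful Soup use .string instead of .text. Otherwise, like me, you'll be confused when .text keeps coming back empty for <script> tags.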
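A small check of the behavior described above, assuming Beautiful Soup 4.9.0 or later with `html.parser`. Document-level text extraction skips the script body, while `.string` on the `<script>` tag still returns its source. The HTML is a hypothetical sample:

```python
from bs4 import BeautifulSoup

html = "<body><p>Visible text</p><script>var hidden = 1;</script></body>"
soup = BeautifulSoup(html, "html.parser")

# In bs4 >= 4.9.0, document-level text omits the contents of <script> tags.
page_text = soup.get_text()
print(page_text)

# The script source is still in the tree; .string returns the tag's single string child.
script_source = soup.script.string
print(script_source)  # var hidden = 1;
```

`.string` works here because the parser stores a `<script>` tag's body as one string child. On a tag with several children, `.string` returns `None`.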

A similar question, "python - Extracting the contents of <script> with BeautifulSoup", can be found on Stack Overflow: https://stackoverflow.com/questions/26192727/
