
css - Can bs4 parse tags with <style ='display:none;' >?

Reposted. Author: 行者123. Updated: 2023-11-28 01:12:27

If you open this page, https://weathernews.jp/s/topics/201808/220015/?fm=tp_index, you will see two images. I parse the page with the following code:

import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
# Newer Selenium releases expect `options=`; `chrome_options=` is deprecated.
driver = webdriver.Chrome(options=options)
driver.get('https://weathernews.jp/s/topics/201808/220015/?fm=tp_index')
soup_level2 = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # close the browser once the rendered HTML has been captured

sections = soup_level2.find_all("img")

for section in sections:
    # First try the regular src attribute; .get() avoids a KeyError when it is missing.
    image = re.findall(r"(https://smtgvs.weathernews.jp/s/topics/img/[0-9]+/.+)\?[0-9]+", urljoin('https://weathernews.jp/', section.get('src', '')))

    if image:
        print(image[0])
    else:
        # Lazy-loaded images keep their real URL in data-original.
        image = re.findall(r"(https://smtgvs.weathernews.jp/s/topics/img/[0-9]+/.+)\?[0-9]+", urljoin('https://weathernews.jp/', section.get("data-original", '')))
        if image:
            print(image[0])

I get the following images:

https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_top_img_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img0_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img1_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img2_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img5_A.png
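The regular expression in the code above does two jobs: it keeps only images under /s/topics/img/<digits>/ on smtgvs.weathernews.jp, and it drops the trailing ?<digits> query string. As a possible alternative, here is a minimal sketch of the same filtering using only urllib.parse and re. The helper name topic_image_url and the constants are hypothetical. Unlike the original pattern, this version also accepts URLs that have no query string.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE = "https://weathernews.jp/"          # base used to resolve relative src values
IMG_HOST = "smtgvs.weathernews.jp"        # host that serves the topic images
IMG_PATH = re.compile(r"/s/topics/img/[0-9]+/.+")  # requires a numeric directory


def topic_image_url(raw):
    """Return the absolute topic-image URL without its query string, or None."""
    if not raw:
        return None
    parts = urlsplit(urljoin(BASE, raw))
    if parts.netloc != IMG_HOST or not IMG_PATH.fullmatch(parts.path):
        return None
    # Rebuild the URL with an empty query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    print(topic_image_url(
        "https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785"
    ))
    # The placeholder image has no numeric directory, so it is rejected.
    print(topic_image_url("https://smtgvs.weathernews.jp/s/topics/img/dummy.png"))
```

Calling the helper on both section.get("src") and section.get("data-original") would replace the two re.findall calls in the loop.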

In fact, the page contains two more images with style="display: none;". Can you help me parse them?

<section id="box3" class="nodisp_zero" style="display: none;">
<h1 id="box_ttl3" style="display: none;"></h1>
<img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
<figcaption id="box_caption3" style="display: none;"></figcaption>
<div class="textarea clearfix">
<h2 id="box_subttl3" style="display: none;"></h2>
<div class="fontL" id="box_com3" style="display: none;"></div>
</div>
</section>

Best answer

You can query the HTML by attribute, for instance by the value of style.

For example:

html = """<section id="box3" class="nodisp_zero" style="display: none;">
<h1 id="box_ttl3" style="display: none;"></h1>
<img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
<figcaption id="box_caption3" style="display: none;"></figcaption>
<div class="textarea clearfix">
<h2 id="box_subttl3" style="display: none;"></h2>
<div class="fontL" id="box_com3" style="display: none;"></div>
</div>
</section>"""


from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

print( soup.find("section", {"style": "display: none;"}).img["data-original"] )

Output:

https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785
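The lookup above compares the style attribute against the exact string "display: none;" and returns only the first matching section. As a hedged sketch of a more tolerant variant, the code below passes a compiled regular expression as the style filter, so it also matches values such as "width: 100%; display: none;", and it collects every hidden img. The names HIDDEN and hidden_image_urls are hypothetical, and the visible box2 markup is invented here purely for contrast.

```python
import re

from bs4 import BeautifulSoup

# Trimmed markup from the question, plus one made-up visible image for contrast.
HTML = """
<section id="box2"><img id="box_img2" src="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img2_A.jpg"></section>
<section id="box3" class="nodisp_zero" style="display: none;">
<img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
</section>
"""

# Matches "display:none" with or without spaces, anywhere in the style value.
HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)


def hidden_image_urls(html):
    """Return the data-original (or, failing that, src) of every hidden <img>."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Only <img> tags that have a style attribute matching HIDDEN are returned.
    for img in soup.find_all("img", style=HIDDEN):
        url = img.get("data-original") or img.get("src")
        if url:
            urls.append(url)
    return urls


if __name__ == "__main__":
    for url in hidden_image_urls(HTML):
        print(url)
```

This only inspects inline style attributes. Elements hidden through a stylesheet rule or a class would not be detected this way.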

Regarding "css - Can bs4 parse tags with <style ='display:none;' >?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52032024/
