
python - Downloading images with BeautifulSoup


I am using BeautifulSoup to extract images, which works fine for normal pages. Now I want to extract the picture of a Chromebook from a page like this one:

https://twitter.com/banprada/statuses/829102430017187841

That page apparently contains a link to another page with the image. Here is my code for downloading the images from the mentioned link, but I only get the picture of the person who posted the link.

import urllib.request
import os
from bs4 import BeautifulSoup

URL = "http://twitter.com/banprada/statuses/829102430017187841"
list_dir = "D:\\"
default_dir = os.path.join(list_dir, "Pictures_neu")
opener = urllib.request.build_opener()
urllib.request.install_opener(opener)
soup = BeautifulSoup(urllib.request.urlopen(URL).read())
imgs = soup.findAll("img", {"alt": True, "src": True})
for img in imgs:
    img_url = img["src"]
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    img_data = opener.open(img_url)
    f = open(filename, "wb")
    f.write(img_data.read())
    f.close()

Is there any way to download that image somehow?

Many thanks and regards, Andy

Best Answer

Here is how you can get just the mentioned image using Selenium + requests:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import requests

link = 'https://twitter.com/banprada/statuses/829102430017187841'
driver = webdriver.PhantomJS()
driver.get(link)
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'xdm_default')]")))
image_src = driver.find_element_by_tag_name('img').get_attribute('src')
response = requests.get(image_src).content
with open('C:\\Users\\You\\Desktop\\Image.jpeg', 'wb') as f:
    f.write(response)
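
As a side note, PhantomJS has since been deprecated and the find_element_by_* helpers were removed in Selenium 4, so the snippet above may not run on a current installation. The following is only a rough sketch of a modern equivalent; it assumes headless Chrome with a chromedriver on PATH, and that the page still puts the image in an iframe whose id starts with 'xdm_default', which may no longer hold.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests

link = 'https://twitter.com/banprada/statuses/829102430017187841'

options = Options()
options.add_argument('--headless=new')  # PhantomJS is gone; headless Chrome is the usual replacement
driver = webdriver.Chrome(options=options)
driver.get(link)

# Wait for the embed iframe and switch into it (assumes the id prefix is unchanged)
WebDriverWait(driver, 10).until(
    EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'xdm_default')]")))

# Selenium 4 spelling of find_element_by_tag_name('img')
image_src = driver.find_element(By.TAG_NAME, 'img').get_attribute('src')

with open('Image.jpeg', 'wb') as f:
    f.write(requests.get(image_src).content)

driver.quit()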

If you want to get all the images from every iframe on the page (excluding the images in the initial page source, which you can already get with your code):

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
import requests
import time

link = 'https://twitter.com/banprada/statuses/829102430017187841'
driver = webdriver.Chrome()
driver.get(link)
time.sleep(5)  # Wait until all iframes are completely rendered; might need to be increased
iframe_counter = 0
while True:
    try:
        driver.switch_to_frame(iframe_counter)
        pictures = driver.find_elements_by_xpath('//img[@src and @alt]')
        if len(pictures) > 0:
            for pic in pictures:
                response = requests.get(pic.get_attribute('src')).content
                with open('C:\\Users\\You\\Desktop\\Images\\%s.jpeg' % (str(iframe_counter) + str(pictures.index(pic))), 'wb') as f:
                    f.write(response)
        driver.switch_to_default_content()
        iframe_counter += 1
    except WebDriverException:
        break

Note that you can use any webdriver.
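
For example, current Selenium releases no longer ship switch_to_frame or find_elements_by_xpath, so the loop above needs a few renames on a recent setup. The sketch below is one way to write it against the Selenium 4 API, again assuming headless Chrome and that the images still live inside iframes; it enumerates the iframes up front instead of catching WebDriverException.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import requests
import time

link = 'https://twitter.com/banprada/statuses/829102430017187841'

options = Options()
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)
driver.get(link)
time.sleep(5)  # crude wait for the iframes to render

# Count the iframes once instead of incrementing an index until a switch fails
frame_count = len(driver.find_elements(By.TAG_NAME, 'iframe'))
for i in range(frame_count):
    driver.switch_to.frame(i)  # Selenium 4 spelling of switch_to_frame
    for j, pic in enumerate(driver.find_elements(By.XPATH, '//img[@src and @alt]')):
        with open('%d_%d.jpeg' % (i, j), 'wb') as f:
            f.write(requests.get(pic.get_attribute('src')).content)
    driver.switch_to.default_content()  # back to the top-level page before the next frame

driver.quit()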

Regarding python - downloading images with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42133073/
