python - Downloading images with gevent

Reposted. Author: 太空狗. Updated: 2023-10-30 00:16:54

My task is to download 1M+ images from a given list of URLs. What is the recommended way to do this?

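After reading "Greenlet Vs. Threads" I looked into gevent, but I cannot get it to run reliably. I experimented with a test set of 100 URLs: sometimes it finishes in 1.5 seconds, but sometimes it takes more than 30 seconds. That is strange, because the timeout* for each request is 0.1, so the whole run should never take more than 10 seconds.

*See the code below.

I also looked into grequests, but it seems to have issues with exception handling.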

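My "requirements" are that I can

  • check for errors raised while downloading (timeouts, corrupt images, ...),
  • monitor progress as the number of processed images,
  • be as fast as possible.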

# Python 2 code (print statement, cStringIO).
from gevent import monkey; monkey.patch_all()
from time import time
import requests
from PIL import Image
import cStringIO
import gevent.hub
POOL_SIZE = 300


def download_image_wrapper(task):
    return download_image(task[0], task[1])

def download_image(image_url, download_path):
    raw_binary_request = requests.get(image_url, timeout=0.1).content
    image = Image.open(cStringIO.StringIO(raw_binary_request))
    image.save(download_path)

def download_images_gevent_spawn(list_of_image_urls, base_folder):
    download_paths = ['/'.join([base_folder, url.split('/')[-1]])
                      for url in list_of_image_urls]
    parameters = [[image_url, download_path] for image_url, download_path in
                  zip(list_of_image_urls, download_paths)]
    tasks = [gevent.spawn(download_image_wrapper, parameter_tuple) for parameter_tuple in parameters]
    for task in tasks:
        try:
            task.get()
        except Exception:
            print 'x',
            continue
        print '.',

test_urls = []  # placeholder: list of 100 urls

t1 = time()
download_images_gevent_spawn(test_urls, 'download_temp')
print time() - t1
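
The code above defines POOL_SIZE but never uses it, so gevent.spawn starts every download at once. Here is a minimal sketch of one alternative, assuming gevent's Pool is acceptable here. Pool.imap_unordered caps the number of concurrent greenlets, every result carries its own error text so failures can be inspected, and a callback reports progress. The names download_all, fetch, deadline, and on_progress are hypothetical and were introduced only for illustration. Because the timeout passed to requests.get bounds the connect and each read rather than the whole request, the sketch also wraps each task in gevent.Timeout as an overall per-task deadline. Whether that accounts for the 30-second runs is not established.

```python
import gevent
from gevent.pool import Pool


def download_all(urls, fetch, pool_size=50, deadline=5.0, on_progress=None):
    """Call fetch(url) for every URL with at most pool_size greenlets at once.

    Returns (succeeded, failed): a list of URLs and a dict of url -> error text.
    fetch is any callable supplied by the caller (hypothetical parameter).
    """
    def attempt(url):
        try:
            # Hard limit for the whole task, not just one socket operation.
            with gevent.Timeout(deadline):
                fetch(url)
            return url, None
        except gevent.Timeout:
            # gevent.Timeout derives from BaseException, so "except Exception"
            # would not catch it; it needs its own clause.
            return url, 'deadline of %.1fs exceeded' % deadline
        except Exception as exc:
            return url, '%s: %s' % (type(exc).__name__, exc)

    succeeded, failed = [], {}
    pool = Pool(pool_size)
    # Results arrive in completion order, so progress can be reported live.
    for done, (url, error) in enumerate(pool.imap_unordered(attempt, urls), 1):
        if error is None:
            succeeded.append(url)
        else:
            failed[url] = error
        if on_progress is not None:
            on_progress(done, len(urls))
    return succeeded, failed
```

In a real run, fetch could be a function like download_image from the question, with monkey.patch_all() called first so that network calls yield to other greenlets. gevent.Timeout can only interrupt a greenlet at a point where it yields, so CPU-bound work such as image decoding is not cut short by the deadline.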

Best answer

I think you would be better off sticking with urllib2; see https://github.com/gevent/gevent/blob/master/examples/concurrent_download.py#L1 for an example.

Try this code; I think it is what you are asking for.

import gevent
from gevent import monkey

# Patch the stdlib (including the socket and ssl modules) so that blocking
# calls cooperate with other greenlets.
monkey.patch_all()

import sys
from time import time

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib2 import urlopen

# chloya_files is the answerer's own collection of image URLs;
# it is not defined in this snippet.
urls = sorted(chloya_files)


def download_file(url):
    # Fetch one URL and save it under the last segment of its path.
    data = urlopen(url).read()
    img_name = url.split('/')[-1]
    with open('c:/temp/img/' + img_name, 'wb') as f:
        f.write(data)
    return True


t1 = time()
tasks = [gevent.spawn(download_file, url) for url in urls]
# Wait at most 12 seconds in total for all downloads.
gevent.joinall(tasks, timeout=12.0)
# Parenthesized print works on both Python 2 and Python 3.
print("Successful: %s from %s" % (sum(1 if task.value else 0 for task in tasks), len(tasks)))
print(time() - t1)
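
In the code above, a task whose download raised an exception and a task that was still running when joinall timed out both have a task.value of None, so the printed count cannot tell them apart. Here is a minimal sketch, assuming it is acceptable to kill unfinished downloads, that separates the three cases with the greenlet's ready(), successful(), and exception members. The helper name summarize is hypothetical.

```python
import gevent


def summarize(tasks, timeout):
    """Wait up to `timeout` seconds, then sort greenlets into three groups."""
    gevent.joinall(tasks, timeout=timeout)
    # Finished without raising.
    done = [t for t in tasks if t.successful()]
    # Finished by raising; t.exception holds the exception instance.
    failed = [(t, t.exception) for t in tasks
              if t.ready() and not t.successful()]
    # Still running when the joinall timeout expired.
    pending = [t for t in tasks if not t.ready()]
    # Stop the leftovers so they do not keep downloading in the background.
    gevent.killall(pending)
    return done, failed, pending
```

Calling summarize(tasks, timeout=12.0) in place of the joinall line would give len(done), len(failed), and len(pending) as separate counts, and each entry in failed pairs the greenlet with the error that stopped it.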

Regarding "python - Downloading images with gevent", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33528959/
