
python - Scrapy: create folder structure out of downloaded images based on the url from which images are being downloaded

Reposted. Author: 行者123. Updated: 2023-11-28 18:51:12

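I have a set of links that define the structure of a website. When downloading images from these links, I want to place each downloaded image in a folder structure that mirrors the website structure, rather than just renaming it (as in the answer to "Scrapy image download how to use custom filename").

My code looks like this: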

    class MyImagesPipeline(ImagesPipeline):
        """Custom image pipeline to rename images as they are being downloaded"""
        page_url = None

        def image_key(self, url):
            page_url = self.page_url
            image_guid = url.split('/')[-1]
            return '%s/%s/%s' % (page_url, image_guid.split('_')[0], image_guid)

        def get_media_requests(self, item, info):
            # http://store.abc.com/b/n/s/m
            os.system('mkdir ' + item['sku'][0].encode('ascii', 'ignore'))
            self.page_url = urlparse(item['start_url']).path  # I store the parent page's url in start_url Field
            for image_url in item['image_urls']:
                yield Request(image_url)

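It creates the desired folder structure, but when I look deeper into the folders I find that files have been placed in the wrong folders.

I suspect this happens because the get_media_requests and image_key functions may run asynchronously, so the value of page_url changes before image_key gets to use it.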
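The suspected effect can be reproduced without Scrapy. The following is a minimal sketch, not Scrapy's actual scheduling: it assumes only that all requests are queued before any key is computed. The class SharedStatePipeline, the function run, and the example.com URLs are hypothetical names made up for this illustration.

```python
from urllib.parse import urlparse


class SharedStatePipeline:
    """Mimics the pipeline above: the page path is stored on the instance."""

    page_url = None

    def get_media_requests(self, item):
        # Overwrites the shared attribute every time a new item arrives.
        self.page_url = urlparse(item["start_url"]).path
        return list(item["image_urls"])

    def image_key(self, url):
        image_guid = url.split("/")[-1]
        return "%s/%s/%s" % (self.page_url, image_guid.split("_")[0], image_guid)


def run(items):
    pipeline = SharedStatePipeline()
    # Phase 1: every item queues its requests; nothing is downloaded yet.
    pending = []
    for item in items:
        pending.extend(pipeline.get_media_requests(item))
    # Phase 2: the keys are computed only later, after page_url has moved on.
    return [pipeline.image_key(url) for url in pending]


# Hypothetical items for illustration only.
items = [
    {"start_url": "http://store.example.com/shoes",
     "image_urls": ["http://img.example.com/a_1.jpg"]},
    {"start_url": "http://store.example.com/hats",
     "image_urls": ["http://img.example.com/b_1.jpg"]},
]
keys = run(items)
print(keys)
```

Under that assumption, both keys end up under the path of the last item, so the image from the first page lands in the second page's folder.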

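Best answer

You are absolutely right: asynchronous item processing makes it impossible to use class variables via self inside the pipeline. You have to store the path in each request and override a few more methods (untested):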

    def image_key(self, url, page_url):
        image_guid = url.split('/')[-1]
        return '%s/%s/%s' % (page_url, image_guid.split('_')[0], image_guid)

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url, meta=dict(page_url=urlparse(item['start_url']).path))

    def get_images(self, response, request, info):
        key = self.image_key(request.url, request.meta.get('page_url'))
        ...

    def media_to_download(self, request, info):
        ...
        key = self.image_key(request.url, request.meta.get('page_url'))
        ...

    def media_downloaded(self, response, request, info):
        ...
        try:
            key = self.image_key(request.url, request.meta.get('page_url'))
            ...

Regarding "python - Scrapy: create folder structure out of downloaded images based on the url from which images are being downloaded", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12956653/
