
python - How to download scrapy images into a dynamic folder


I'm trying to override the default path full/hash.jpg and save images under <dynamic>/hash.jpg instead. I tried the approach from How to download scrapy images in a dyanmic folder with the following code:

def item_completed(self, results, item, info):
    for result in [x for ok, x in results if ok]:
        path = result['path']
        # here we create the session-path where the files should be in the end
        # you'll have to change this path creation depending on your needs
        slug = slugify(item['category'])
        target_path = os.path.join(slug, os.path.basename(path))

        # try to move the file and raise exception if not possible
        if not os.rename(path, target_path):
            raise DropItem("Could not move image to target folder")

    if self.IMAGES_RESULT_FIELD in item.fields:
        item[self.IMAGES_RESULT_FIELD] = [x for ok, x in results if ok]
    return item

But I get:

Traceback (most recent call last):
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 839, in _cbDeferred
self.callback(self.resultList)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 382, in callback
self._startRunCallbacks(result)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/user/Projects/sepid/scraper/scraper/pipelines.py", line 44, in item_completed
if not os.rename(path, target_path):
exceptions.OSError: [Errno 2] No such file or directory

I don't know what's going wrong. Is there another way to change the path? Thanks.
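For what it's worth, a likely cause of the OSError, sketched under the assumption of a local IMAGES_STORE: the path in each result is relative to the image-store directory rather than to the working directory, so os.rename is pointed at a file that does not exist at that relative location (and since os.rename returns None, the if not os.rename(...) check would raise DropItem even when a move succeeds). Resolving the path against the store's base directory first would look roughly like the sketch below; the pipeline name CategoryMovePipeline is illustrative, not from the original post.

import os

from scrapy.pipelines.images import ImagesPipeline
from slugify import slugify  # same helper as in the question (assumed python-slugify)


class CategoryMovePipeline(ImagesPipeline):
    def item_completed(self, results, item, info):
        for result in [x for ok, x in results if ok]:
            # result['path'] is relative to IMAGES_STORE; self.store.basedir
            # is the local directory behind IMAGES_STORE
            src = os.path.join(self.store.basedir, result['path'])
            dst_dir = os.path.join(self.store.basedir, slugify(item['category']))
            if not os.path.isdir(dst_dir):
                os.makedirs(dst_dir)  # create the per-category folder if needed
            os.rename(src, os.path.join(dst_dir, os.path.basename(src)))
        return item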

Best Answer

I created a pipeline that inherits from ImagesPipeline, overrode the file_path method, and used it instead of the standard ImagesPipeline:

import hashlib

from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes


class StoreImgPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # YEAR is a constant defined elsewhere in the project
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
        return 'realty-sc/%s/%s/%s/%s.jpg' % (YEAR, image_guid[:2], image_guid[2:4], image_guid)
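Since the goal in the question is a folder derived from the item (e.g. item['category']) rather than the year/hash layout above, a minimal variation of the same idea is sketched below, assuming the item carries category and image_urls fields: pass the category along in request.meta from get_media_requests and read it back in file_path. The pipeline name DynamicFolderImgPipeline and the img_category meta key are illustrative, not from the original answer.

import hashlib

from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes


class DynamicFolderImgPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # attach the item's category to every image request
        for url in item.get('image_urls', []):
            yield Request(url, meta={'img_category': item.get('category', 'uncategorized')})

    def file_path(self, request, response=None, info=None):
        # <category>/<sha1-of-url>.jpg instead of the default full/<hash>.jpg
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
        return '%s/%s.jpg' % (request.meta.get('img_category', 'uncategorized'), image_guid)

Whichever pipeline is used, it has to replace the standard one in settings, e.g. ITEM_PIPELINES = {'scraper.pipelines.DynamicFolderImgPipeline': 1} alongside a configured IMAGES_STORE.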

Regarding python - How to download scrapy images into a dynamic folder, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/28007995/
