python - How to zip a very large file in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:51:17
I want to zip several files with Python, each of which may be around 99 GB. What is the most efficient way to do this with the zipfile library? Here is my sample code:

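import zipfile
from datetime import datetime

# gcs, gcs_store, files, page, visitor, zip_file_name and
# is_page_allowed_to_visitor come from the surrounding App Engine application.
with gcs.open(zip_file_name, 'w', content_type=b'application/zip') as f:
    with zipfile.ZipFile(f, 'w') as z:
        for file in files:
            is_owner = (is_page_allowed_to_visitor(page, visitor)
                        or file.owner_id == visitor.id)

            # Owners always see the file. For everyone else the file is hidden
            # before available_from or after available_to (assumed intent; the
            # original nesting left file.show unset in some branches).
            if is_owner:
                file.show = True
            elif file.available_from and file.available_from > datetime.now():
                file.show = False
            elif file.available_to and file.available_to < datetime.now():
                file.show = False
            else:
                file.show = True

            if file.show:
                file_name = "/%s/%s" % (gcs_store.get_bucket_name(), file.gcs_name)
                gcs_reader = gcs.open(file_name, 'r')
                # read() loads the whole object into memory at once, which is
                # the problem for very large files.
                z.writestr('%s-%s' % (file.created_on, file.name), gcs_reader.read())
                gcs_reader.close()
# No explicit f.close() is needed: the with block closes the zip file.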

Some points to note:

1) I am using Google App Engine to host the files, so I cannot use the zipfile.write() method. I can only get the file contents as bytes.

Thanks in advance.

Best answer

I added a new method to the zipfile library. The enhanced zipfile library is open source and can be found on GitHub (EnhancedZipFile). The new method is inspired by the zipfile.write() and zipfile.writestr() methods:

# Added as a method of the ZipFile class inside the zipfile module, so ZipInfo,
# ZIP64_LIMIT, ZIP_DEFLATED, zlib, crc32 and time are that module's globals.
def writebuffered(self, zinfo_or_arcname, file_pointer, file_size, compress_type=None):
    """Write a readable file object into the archive in 8 KiB chunks."""
    if not isinstance(zinfo_or_arcname, ZipInfo):
        zinfo = ZipInfo(filename=zinfo_or_arcname,
                        date_time=time.localtime(time.time())[:6])

        zinfo.compress_type = self.compression
        if zinfo.filename[-1] == '/':
            zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
            zinfo.external_attr |= 0x10           # MS-DOS directory flag
        else:
            zinfo.external_attr = 0o600 << 16     # ?rw-------
    else:
        zinfo = zinfo_or_arcname

    # Honor an explicit compress_type (the argument was otherwise unused).
    if compress_type is not None:
        zinfo.compress_type = compress_type

    zinfo.file_size = file_size             # Uncompressed size
    zinfo.header_offset = self.fp.tell()    # Start of header bytes
    self._writecheck(zinfo)
    self._didModify = True

    fp = file_pointer
    # CRC and compressed size are placeholders until the data has been read
    zinfo.CRC = CRC = 0
    zinfo.compress_size = compress_size = 0
    # Compressed size can be larger than uncompressed size
    zip64 = self._allowZip64 and \
        zinfo.file_size * 1.05 > ZIP64_LIMIT
    self.fp.write(zinfo.FileHeader(zip64))
    if zinfo.compress_type == ZIP_DEFLATED:
        # wbits=-15 produces a raw deflate stream, as the zip format expects
        cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                                zlib.DEFLATED, -15)
    else:
        cmpr = None
    file_size = 0
    while True:
        buf = fp.read(1024 * 8)
        if not buf:
            break
        file_size = file_size + len(buf)
        CRC = crc32(buf, CRC) & 0xffffffff
        if cmpr:
            buf = cmpr.compress(buf)
            compress_size = compress_size + len(buf)
        self.fp.write(buf)

    if cmpr:
        # Emit whatever the compressor is still buffering
        buf = cmpr.flush()
        compress_size = compress_size + len(buf)
        self.fp.write(buf)
        zinfo.compress_size = compress_size
    else:
        zinfo.compress_size = file_size
    zinfo.CRC = CRC
    zinfo.file_size = file_size
    if not zip64 and self._allowZip64:
        if file_size > ZIP64_LIMIT:
            raise RuntimeError('File size has increased during compressing')
        if compress_size > ZIP64_LIMIT:
            raise RuntimeError('Compressed size larger than uncompressed size')
    # Unlike zipfile.write(), this method does not seek back to rewrite the
    # local file header with the final CRC and sizes. The final values are
    # kept in zinfo and written to the central directory when the archive
    # is closed.
    self.fp.flush()
    self.filelist.append(zinfo)
    self.NameToInfo[zinfo.filename] = zinfo
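
The chunked loop in writebuffered() depends on both zlib.crc32 and zlib.compressobj being incremental: feeding the data in pieces should give the same checksum and a valid raw deflate stream as processing it in one go. The following is a minimal standalone sketch of that idea, not code from the EnhancedZipFile library; the helper name crc_and_deflate_in_chunks and the sample payload are made up for illustration.

```python
import zlib


def crc_and_deflate_in_chunks(data, chunk_size=8 * 1024):
    """Return (crc32, raw deflate bytes), both computed chunk by chunk."""
    crc = 0
    # wbits=-15 selects a raw deflate stream without zlib header or trailer,
    # the same setting used by writebuffered().
    cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    pieces = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Pass the running CRC back in so the checksum covers all chunks.
        crc = zlib.crc32(chunk, crc) & 0xffffffff
        pieces.append(cmpr.compress(chunk))
    # flush() returns whatever the compressor is still holding back.
    pieces.append(cmpr.flush())
    return crc, b"".join(pieces)


payload = b"example payload " * 10000  # hypothetical sample data
crc, deflated = crc_and_deflate_in_chunks(payload)
```

Only one chunk is held in memory at a time, which is what makes this pattern usable for files far larger than the available RAM.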

Notes
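
On Python 3.6 and later, the standard library may make a patched zipfile unnecessary: ZipFile.open() accepts mode='w' and returns a writable file object, and force_zip64=True covers entries whose size is not known in advance and may exceed the ZIP64 limit. The sketch below is an assumption-laden illustration rather than part of the original answer: add_stream is a hypothetical helper, io.BytesIO objects stand in for the GCS reader and writer, and it has not been tried against App Engine.

```python
import io
import shutil
import zipfile


def add_stream(zf, arcname, source, chunk_size=1024 * 1024):
    """Copy a readable binary stream into an open ZipFile chunk by chunk."""
    # force_zip64=True lets the entry grow past the ZIP64 limit even though
    # its final size is unknown when the header is written.
    with zf.open(arcname, mode="w", force_zip64=True) as dest:
        shutil.copyfileobj(source, dest, chunk_size)


# Stand-ins for the real streams (hypothetical data, 3 MiB of b"x").
source_stream = io.BytesIO(b"x" * (3 * 1024 * 1024))
archive = io.BytesIO()

with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    add_stream(zf, "big.bin", source_stream)
```

As with writebuffered(), only chunk_size bytes of the source are in memory at any moment.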

Regarding "python - How to zip a very large file in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26849328/
