
python - Using multiprocessing to compress a large number of files


I am trying to compress about 95 files, each around 7 GB in size, using Python's multiprocessing module:

import os;
from shutil import copyfileobj;
import bz2;
import multiprocessing as mp
import pprint
from numpy.core.test_rational import numerator

''' Input / Output Path '''

ipath = 'E:/AutoConfirm/'
opath = 'E:/compressed-autoconfirm/'

''' Number of Processes '''
num_of_proc = 6

def compressFile(fileName,chunkSize=100000000):
    global ipath
    print 'Started Compressing %s to %s'%(fileName,opath)
    inp = open(ipath+fileName,'rb')
    output = bz2.BZ2File(opath+fileName.split('/')[-1].strip('.csv')+'.bz2','wb',compresslevel=9)
    copyfileobj(inp,output,chunkSize)
    print 'Finished Compressing %s to %s'%(fileName,opath)

def process_worker(fileList):
    for x in fileList:
        compressFile(x)

def split_list(tempList):
    a , reList = 0, []
    global num_of_proc
    for x in range(num_of_proc+1):
        reList.append([tempList[a:a+len(tempList)/num_of_proc]])
        a = a + len(tempList)/num_of_proc
    return reList

pool = mp.Pool(processes=num_of_proc)
''' Prepare a list of all the file names '''
tempList = [x for x in os.listdir(ipath)]

''' Split the list into sub-lists
For example : if I have 90 files and I am using 6 processes
each of the process will work on 15 files each '''

iterList = split_list(tempList)

''' print iterList >> [ [filename1, filename2] , [filename3,filename4], ... ] '''


''' Pass the list consisting of sub-lists to pool '''
pool.map(process_worker,iterList)

The code above ends up creating 90 processes instead of 6. Can anyone help me find the flaw in the code?

Best Answer

Multiprocessing re-imports the module in each worker process, so because everything in your script is at the top level, every child executes all of it again.

You need to put the code into a function and call it:

def main():
    ...

if __name__ == '__main__':
    main()
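
As a rough illustration of that restructuring, here is a minimal sketch, not the answerer's exact code: the pool setup moves inside main(), and the pool hands individual files to the workers rather than pre-splitting the list by hand. Python 3 syntax is assumed; the names compress_file, IPATH and OPATH are illustrative, mirroring the paths from the question.

import os
import bz2
import multiprocessing as mp
from shutil import copyfileobj

# Illustrative paths and worker count, mirroring the question.
IPATH = 'E:/AutoConfirm/'
OPATH = 'E:/compressed-autoconfirm/'
NUM_OF_PROC = 6

def compress_file(file_name, chunk_size=100000000):
    # Copy the input in chunks into a .bz2 file with the same base name.
    src = os.path.join(IPATH, file_name)
    dst = os.path.join(OPATH, os.path.splitext(file_name)[0] + '.bz2')
    with open(src, 'rb') as inp, bz2.BZ2File(dst, 'wb', compresslevel=9) as out:
        copyfileobj(inp, out, chunk_size)
    return file_name

def main():
    files = os.listdir(IPATH)
    # The pool distributes one file at a time to each of the 6 workers;
    # no manual splitting into sub-lists is needed.
    with mp.Pool(processes=NUM_OF_PROC) as pool:
        for done in pool.imap_unordered(compress_file, files):
            print('Finished compressing', done)

if __name__ == '__main__':
    # The guard keeps the pool creation from running again when the
    # module is re-imported by the worker processes.
    main()

With imap_unordered the six workers stay busy even when some files compress faster than others, which a fixed 15-files-per-process split would not guarantee.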

Regarding python - Using multiprocessing to compress a large number of files, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31039218/
