
python - Processing large files with multiprocessing in Python: How to load resources only once per process?

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 20:39:47

Python's multiprocessing.Pool.imap is very convenient for processing a large file line by line:

import multiprocessing

def process(line):
    processor = Processor('some-big.model')  # this takes time to load...
    return processor.process(line)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    with open('lines.txt') as infile, open('processed-lines.txt', 'w') as outfile:
        for processed_line in pool.imap(process, infile):
            outfile.write(processed_line)

How can I make sure that a helper like Processor in the example above is loaded only once per process? Is this possible at all without resorting to a more complicated/verbose structure involving queues?

Best Answer

multiprocessing.Pool allows per-process resource initialization via its initializer and initargs arguments. I was surprised to learn that the idea is to make use of a global variable, as follows:

import multiprocessing as mp

def init_process(model):
    global processor
    processor = Processor(model)  # this takes time to load...

def process(line):
    return processor.process(line)  # via global variable `processor` defined in `init_process`

if __name__ == '__main__':
    pool = mp.Pool(4, initializer=init_process, initargs=['some-big.model'])
    with open('lines.txt') as infile, open('processed-lines.txt', 'w') as outfile:
        for processed_line in pool.imap(process, infile):
            outfile.write(processed_line)

This concept is not well described in multiprocessing.Pool's documentation, so I hope this example helps others.

Regarding "python - Processing large files with multiprocessing in Python: How to load resources only once per process?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56931989/
