
python - Passing large amounts of data through multiprocessing


I am trying to figure out how to write a program that performs computations in parallel, such that the result of each computation can be written to a file in a specific order. My problem is size: I want to do what I outline in the example program below, storing the large outputs as values of a dictionary whose keys record the ordering. But my program keeps breaking because it cannot store/pass that many bytes.

Is there a standard way to solve this kind of problem? I am new to multiprocessing and to working with large data.

from multiprocessing import Process, Manager

def eachProcess(i, d):
    LARGE_BINARY_OBJECT = ...  # perform some computation resulting in millions of bytes
    d[i] = LARGE_BINARY_OBJECT

def main():
    manager = Manager()
    d = manager.dict()
    maxProcesses = 10
    for i in range(maxProcesses):
        process = Process(target=eachProcess, args=(i, d))
        process.start()

    counter = 0
    while counter < maxProcesses:
        file1 = open("test.txt", "wb")
        if counter in d:
            file1.write(d[counter])
            counter += 1

if __name__ == '__main__':
    main()

Thanks.

Best Answer

There are usually two approaches when dealing with large data:

  1. the local filesystem, if the problem is simple enough
  2. a remote data store, if you need more complex support for the data (a purely illustrative sketch of this option follows this list)
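
A purely illustrative sketch of option 2, not the solution recommended below: each worker pushes its result into a shared key/value store such as Redis under an index-based key, and the parent reads the keys back in order. This assumes a Redis server running locally and the third-party redis package; produce_large_binary is the same placeholder for the actual computation as in the code below.

from multiprocessing import Pool

import redis  # third-party client: pip install redis

def redis_worker(i):
    data = produce_large_binary()  # placeholder for the actual computation
    store = redis.Redis(host='localhost', port=6379)
    store.set('result:%d' % i, data)  # store the bytes under an index-based key

if __name__ == '__main__':
    max_processes = 10
    with Pool(max_processes) as pool:
        pool.map(redis_worker, range(max_processes))

    store = redis.Redis(host='localhost', port=6379)
    with open('test.txt', 'wb') as result_file:
        for i in range(max_processes):
            result_file.write(store.get('result:%d' % i))  # read back in index order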

Since your problem looks fairly simple, I suggest the following solution: each process writes its partial result to a local file, and once all of the processing is finished, the main process combines all of the result files together.

from multiprocessing import Pool
from tempfile import NamedTemporaryFile

def worker_function(partial_result_path):
    data = produce_large_binary()  # placeholder for the actual computation
    with open(partial_result_path, 'wb') as partial_result_file:
        partial_result_file.write(data)

if __name__ == '__main__':
    max_processes = 10

    # storing partial results in temporary files that persist until the merge
    partial_result_paths = [NamedTemporaryFile(delete=False).name
                            for _ in range(max_processes)]

    with Pool(max_processes) as pool:
        pool.map(worker_function, partial_result_paths)

    # combine the partial results into the final file, in order
    with open('test.txt', 'wb') as result_file:
        for partial_result_path in partial_result_paths:
            with open(partial_result_path, 'rb') as partial_result_file:
                result_file.write(partial_result_file.read())
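
One caveat about the merge step above: partial_result_file.read() pulls each partial result fully into memory, which can be a problem when the partial results are themselves very large. A minimal sketch of a streaming merge using the standard-library shutil.copyfileobj (the 1 MiB chunk size is just an illustrative choice):

import shutil

# stream each partial result into the final file in fixed-size chunks,
# instead of loading an entire partial result into memory at once
with open('test.txt', 'wb') as result_file:
    for partial_result_path in partial_result_paths:
        with open(partial_result_path, 'rb') as partial_result_file:
            shutil.copyfileobj(partial_result_file, result_file, length=1024 * 1024)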

For python - passing large amounts of data through multiprocessing, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48860553/
