
python - multiprocessing.Process causes: OSError: [Errno 12] Cannot allocate memory even when I run only 1 process


I am trying to process a very large text file (~11 GB) on a remote server (AWS). The processing the file needs is quite complex; with a regular Python program the total runtime is about a month. To reduce the runtime, I am trying to split the work on the file across several processes. Computer specs: 30 GB of RAM (the full specs are linked in the original post).

Code:

import logging
import multiprocessing


def initiate_workers(works, num_workers, output_path):
    """
    :param works: Iterable of lists of strings (the work to be processed, divided into num_workers pieces)
    :param num_workers: Number of workers
    :param output_path: Prefix for each worker's output file path
    :return: A list of Process objects, each ready to process its share.
    """
    res = []
    for i in range(num_workers):
        # process_batch is the processing function
        res.append(multiprocessing.Process(target=process_batch,
                                           args=(output_path + str(i), works[i])))
    return res


def run_workers(workers):
    """
    Run the workers and wait for them all to finish
    :param workers: Iterable of Process objects
    """
    logging.info("Starting multiprocessing..")
    for i in range(len(workers)):
        workers[i].start()
        logging.info("Started worker " + str(i))
    for j in range(len(workers)):
        workers[j].join()
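For context, a driver for these functions presumably looked something like the sketch below (the file name and the placeholder process_batch are assumptions, not from the post); the key point is that the entire ~11 GB file is read into memory in the parent before any process is forked:

def process_batch(output_path, batch):
    # Placeholder for the asker's real (complex) processing function.
    with open(output_path, "w") as out:
        for line in batch:
            out.write(line)


if __name__ == "__main__":
    num_workers = 6
    # Memory-hungry step: the whole file becomes a list of strings
    # held by the parent process.
    with open("big_file.txt") as f:
        lines = f.readlines()
    works = [lines[i::num_workers] for i in range(num_workers)]
    workers = initiate_workers(works, num_workers, "output")
    run_workers(workers)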

I get the following traceback:

Traceback (most recent call last):
  File "w2v_process.py", line 93, in <module>
    run_workers(workers)
  File "w2v_process.py", line 58, in run_workers
    workers[i].start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

It always crashes, whether num_workers=1, 6, or 14.

What am I doing wrong?

Thanks!

EDIT

Found the problem; see the accepted answer below.

Best answer

Found the problem. I had read somewhere on SO that fork (the last line of the traceback) effectively doubles the RAM requirement (on Linux fork() is copy-on-write, but the kernel's overcommit accounting can still refuse to duplicate the address space of a large process). While processing the file I was loading it into memory, filling roughly 18 GB, and since the machine has 30 GB of RAM in total, the allocation really did fail. I split the big file into smaller files (one per worker) and gave each Process object the path to its own file. That way each process reads its data lazily, and everything works fine!
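A minimal sketch of that fix (the file names, the split_file helper, and process_line are illustrative assumptions, not the original code): split the input up front by streaming it line by line, then hand each worker only a file path, so the parent process is still small when os.fork() runs.

import multiprocessing


def split_file(input_path, num_workers):
    """Stream the big file into num_workers smaller files, line by line,
    so the parent never holds the whole file in memory."""
    part_paths = [input_path + ".part" + str(i) for i in range(num_workers)]
    parts = [open(p, "w") for p in part_paths]
    with open(input_path) as f:
        for i, line in enumerate(f):
            parts[i % num_workers].write(line)
    for p in parts:
        p.close()
    return part_paths


def process_batch(output_path, input_path):
    """Worker: iterate over its own chunk lazily, writing results as it goes."""
    with open(input_path) as fin, open(output_path, "w") as fout:
        for line in fin:
            fout.write(process_line(line))


def process_line(line):
    # Placeholder for the real (expensive) per-line processing.
    return line


if __name__ == "__main__":
    num_workers = 6
    part_paths = split_file("big_file.txt", num_workers)
    workers = [multiprocessing.Process(target=process_batch,
                                       args=("output" + str(i), part_paths[i]))
               for i in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Because the parent's memory footprint stays small, os.fork() no longer trips the kernel's memory accounting, no matter how many workers are started.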

Regarding python - multiprocessing.Process causes: OSError: [Errno 12] Cannot allocate memory even when I run only 1 process, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55087575/
