
python - Writing errors in multiprocessing Python


I'm trying to write to some files after editing them, using multiprocessing in Python (2.7). It works like a charm for a small number of files (<20), but when I try with more files (20+) it goes haywire. I'm using Python 2.7.5 on CentOS 6.5 with a 4-core processor.

import sys, os
import multiprocessing

import glob
list_files = glob.glob("Protein/*.txt")

def Some_func(some_file):
    with open(some_file) as some:
        # file_output: path to the shared output file, defined elsewhere
        with open(file_output, 'a') as output:
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                output.write(edited_lines)


pool = multiprocessing.Pool(10)  # desired number of worker processes = 10
pool.map(Some_func, list_files)
pool.close()
pool.join()

The lines written for the different files end up overlapping each other.

File 1
Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1
Lines 4 .. File 1
Lines 5 .. File 1
Lines 6 .. File 1
Lines 7 .. File 1
Lines 8 .. File 1
Lines 9 .. File 1

File 2
Lines 1 .. File 2
Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 5 .. File 2
Lines 6 .. File 2
Lines 7 .. File 2
Lines 8 .. File 2
Lines 9 .. File 2



Output:

Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1 Lines 1 .. File 2
Lines 4 .. File 1
Lines 5 .. File 1Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 6 .. File 1

Best Answer

The problem is that you're trying to write to the file from multiple processes in parallel, and those processes are not synchronized with each other. That means different processes may try to write at the same time, causing the odd interleaving you're seeing.

You can solve this either by using a single writer process, with each worker sending its lines to that one process to be written, or by synchronizing the writes done by each process using a multiprocessing.Lock.

Using a single writer:

import glob
import multiprocessing
from functools import partial
from threading import Thread

list_files = glob.glob("Protein/*.txt")

def Some_func(out_q, some_file):
    with open(some_file) as some:
        for lines in some:
            #Do Something
            #edited_lines = func(lines)

            out_q.put(edited_lines)

def write_lines(q):
    with open(file_output, 'w') as output:  # only this thread writes, so 'w' is safe
        for line in iter(q.get, None):  # this loop ends when the None sentinel is received
            output.write(line)

pool = multiprocessing.Pool(10)  # desired number of worker processes = 10
m = multiprocessing.Manager()
q = m.Queue()
t = Thread(target=write_lines, args=(q,))
t.start()
pool.map(partial(Some_func, q), list_files)
pool.close()
pool.join()
q.put(None)  # shut down the writer thread
t.join()
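With this approach only the writer thread ever touches file_output, so no locking is needed at all. The Manager-backed queue is a picklable proxy object, which is why it can be handed to Pool.map via partial, and the None put on the queue after pool.join() is a sentinel that tells iter(q.get, None) in the writer to stop.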

Using a multiprocessing.Lock:

import glob
import multiprocessing
from functools import partial

list_files = glob.glob("Protein/*.txt")

def Some_func(lock, some_file):
    with open(some_file) as some:
        with open(file_output, 'a') as output:  # append so concurrent workers don't truncate each other
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                with lock:
                    output.write(edited_lines)
                    output.flush()  # flush while holding the lock so buffered data can't interleave


pool = multiprocessing.Pool(10)  # desired number of worker processes = 10
m = multiprocessing.Manager()
lock = m.Lock()
pool.map(partial(Some_func, lock), list_files)
pool.close()
pool.join()
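Note that the lock only guarantees that each write lands in the file as one uninterrupted piece; the order in which lines from different input files appear in file_output is still arbitrary, since it depends on how the workers are scheduled.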

We need to use a Manager to create the shared objects because you're passing them to a Pool, which requires pickling them. Ordinary multiprocessing.Lock/multiprocessing.Queue objects can only be passed to the multiprocessing.Process constructor, and will raise an error when passed to Pool methods such as map.
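As a minimal sketch of that restriction (the task function and its arguments here are made up for illustration): a Manager-backed lock is a picklable proxy and passes through Pool.map fine, while a plain multiprocessing.Lock fails at pickling time.

import multiprocessing
from functools import partial

def task(lock, n):
    # Acquire the lock briefly, then return a result.
    with lock:
        return n * n

if __name__ == "__main__":
    pool = multiprocessing.Pool(2)

    # A Manager-backed lock is a proxy object that pickles fine,
    # so it can be passed through Pool.map:
    m = multiprocessing.Manager()
    print(pool.map(partial(task, m.Lock()), range(4)))  # [0, 1, 4, 9]

    # A plain multiprocessing.Lock raises at pickling time:
    # pool.map(partial(task, multiprocessing.Lock()), range(4))
    # RuntimeError: Lock objects should only be shared between
    # processes through inheritance

    pool.close()
    pool.join()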

Regarding python - Writing errors in multiprocessing Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30601261/
