
Python multiprocessing: Pool.map() doesn't seem to call the function at all

Reposted. Author: 可可西里. Updated: 2023-11-01 10:42:58

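I'm very new to multithreading, so I apologize if this is basic. I have a function that OCRs image files, and I want to run the task in parallel. The function returns nothing; it only saves the text of the OCR'd dataset. The code is as follows: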

start_time = time.time()
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
listfiles = os.listdir(path)

filterfiles = [p for p in listfiles if p[-4:] == '.tif']

pool = Pool(processes=2)

result = pool.map(OCRimage,filterfiles)

pool.close()
pool.join()

print("--- %s seconds ---" % (time.time() - start_time))

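When I run the code, it seems to get stuck at pool.map(). I let it run for 30 minutes, far longer than the trial run took, and it did not produce a single output. I tested my OCRimage function, and execution does not appear to enter it even once (I put print(1) as the first line of OCRimage). I was wondering if anyone could help. Thanks,

Cameron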

Edit (adding the OCRimage function):

The OCRimage function looks like this:

def OCRimage(f):
    #This runs the magick bash script which splits a multi-image tif into multiple single image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])

    #finds the number of pages for each tiff file (this might not be necessary but the all files in directory python command could access files randomly)
    max1 = -1
    for filename in os.listdir(path+'\\temp'):
        if (max1 < int(filename[0:-4])):
            max1 = int(filename[0:-4])
    max1 = max1 + 1

    text = ""
    for each in range(0,max1):
        im = Image.open(path + "\\temp\\"+ str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-"+f[0:-4]+".txt", 'w') as file:
        file.write(text)

    for f in os.listdir(path+'\\temp'):
        os.remove(path + '\\temp\\' + f)

Edit 2: here are all the imports

import time
import subprocess
import os
import pytesseract
from PIL import Image

from multiprocessing import Pool
import multiprocessing
countcpus = multiprocessing.cpu_count()

Edit 3:

Running OCRimage(f) on its own works fine. I just use this instead of the multithreaded code:

path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
for p in os.listdir(path):
    OCRimage(p)

Best answer

Here is a Minimal, Complete, and Verifiable Example (MCVE) which seems to show that the problem must be in your OCRimage function (see the Windows section below for the real problem):

from multiprocessing import Pool

def OCRimage(file_name):
    # print() with a single formatted string works on Python 2 and 3
    print("file_name = %s" % file_name)

filterfiles = ["image%03d.tif" % n for n in range(5)]

pool = Pool(processes=2)
result = pool.map(OCRimage, filterfiles)

pool.close()
pool.join()

Output

file_name = image000.tif
file_name = image001.tif
file_name = image002.tif
file_name = image003.tif
file_name = image004.tif

I suggest making these changes to the beginning of OCRimage:

def OCRimage(file_name):
    print("file_name = %s" % file_name)
    # os.path.join takes the path components as separate arguments, not a list
    src = os.path.join(path, file_name)
    dst = os.path.join(path, 'temp', '%d.tif')
    command_list = ['magick', src, dst]
    # This runs the magick bash script which splits a multi-image tif into
    # multiple single image tiffs
    process = subprocess.Popen(command_list,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, errors = process.communicate()
    if process.returncode != 0:
        print("Image processing failed for %s: %s" % (file_name, errors))
        return
    # The rest of your code goes here

It is important to verify that the subprocess's return code is zero. If it is not zero, then you really need to look at the errors string.
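A minimal sketch of that check, pulled out into a helper so a worker can bail out early. The helper name `run_checked` is hypothetical, and the demo commands use the current Python interpreter (`sys.executable`) as a stand-in for `magick` so the snippet runs without ImageMagick installed; it is not the asker's actual command.

```python
import subprocess
import sys


def run_checked(command_list):
    """Run a command and return (ok, stdout, stderr) as text.

    `run_checked` is a hypothetical helper name, not a library function.
    """
    process = subprocess.Popen(command_list,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    output, errors = process.communicate()
    # A return code of zero is the conventional "success" signal.
    return process.returncode == 0, output, errors


# Stand-in commands (assumption): the Python interpreter instead of magick.
ok, output, errors = run_checked(
    [sys.executable, "-c", "print('split ok')"])
failed_ok, _, failed_errors = run_checked(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('no such file'); sys.exit(2)"])

if not failed_ok:
    print("Image processing failed: %s" % failed_errors)
```

The sketch passes the argument list without shell=True, which is usually enough when the program is an executable on the PATH. Capturing stderr separately keeps the error text available for the failure message.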

Windows

When I ran the MCVE on Windows, I got this exception:

RuntimeError: 
Attempt to start a new process before the current process
has finished its bootstrapping phase.

This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main

When I changed the MCVE to this, it worked:

from multiprocessing import Pool

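def OCRimage(file_name):
    print("file_name = %s" % file_name)

def main():
    filterfiles = ["image%03d.tif" % n for n in range(5)]
    pool = Pool(processes=2)
    result = pool.map(OCRimage, filterfiles)
    pool.close()
    pool.join()

# On Windows, each worker re-imports this module, so the Pool must only be
# created when the file is run as the main script.
if __name__ == '__main__':
    main()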
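The same guard can be applied to the script from the question. On Windows, multiprocessing starts each worker in a fresh interpreter that imports the main module, so any unguarded top-level code, including the Pool creation, runs again in every child. The following is only a sketch under assumptions: the `is_tif` helper, the placeholder body of `OCRimage`, and taking the directory from the command line are illustrative choices, not part of the original code.

```python
import os
import sys
from multiprocessing import Pool, freeze_support


def is_tif(file_name):
    # Same filter as in the question: keep names ending in '.tif'.
    return file_name.endswith('.tif')


def OCRimage(file_path):
    # Placeholder worker (assumption): the real function would call magick
    # and pytesseract here. It stays at module level so that the child
    # processes can find it when they import this module.
    print("file_path = %s" % file_path)


def main(path):
    # Pass full paths to the workers so they do not rely on state that is
    # only set up inside the guarded block.
    filterfiles = [os.path.join(path, p)
                   for p in os.listdir(path) if is_tif(p)]
    pool = Pool(processes=2)
    pool.map(OCRimage, filterfiles)
    pool.close()
    pool.join()


if __name__ == '__main__':
    freeze_support()  # only has an effect in frozen Windows executables
    # Hypothetical usage: python ocr_pool.py <directory with .tif files>
    if len(sys.argv) > 1:
        main(sys.argv[1])
```

Only definitions and imports remain at module level, so re-importing the file in a worker has no side effects; everything that starts processes sits behind the guard.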

Regarding "Python multiprocessing: Pool.map() doesn't seem to call the function at all", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41169146/
