gpt4 book ai didi

Python 多处理创建不正确的 pid

转载 作者:太空宇宙 更新时间:2023-11-04 06:35:31 25 4
gpt4 key购买 nike

我已经研究了一段时间了,但似乎无法弄清楚。我将它缩小到我的代码和操作系统(Linux 上的 Python 2.7.3)不同意应该运行哪些进程的情况。发生这种情况时,我的代码永远挂起,但不会抛出任何异常。有时代码会正常运行几个小时,有时只会运行几分钟,我想不通为什么。这表现如下。感谢您的观看,我真的很困惑(双关语)。

代码输出:

创建离散字符矩阵

running PoolWorker_82 (72 triplets), pid 25777, ppid 24892
running PoolWorker_83 (72 triplets), pid 25778, ppid 24892
running PoolWorker_84 (72 triplets), pid 25779, ppid 24892
running PoolWorker_85 (72 triplets), pid 25780, ppid 24892
running PoolWorker_86 (72 triplets), pid 25781, ppid 24892
running PoolWorker_87 (72 triplets), pid 25782, ppid 24892
running PoolWorker_88 (72 triplets), pid 25783, ppid 24892
running PoolWorker_89 (90 triplets), pid 25784, ppid 24892

ps aux 的输出...

1000     24892  2.0  0.9 559948 151088 pts/0   Sl+  09:14   0:16 p runsimulation.py
1000 25776 0.0 0.8 559932 138320 pts/0 S+ 09:19 0:00 p runsimulation.py
1000 26015 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26021 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26023 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26025 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26027 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26029 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26031 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py
1000 26036 0.0 0.8 559948 138140 pts/0 S+ 09:22 0:00 p runsimulation.py

可以看到父进程24982是有的,但是worker的pid没有。通常,这些会匹配,我可以看到工作人员 CPU 使用率在工作时达到 100%,然后在迭代完成后它们都消失了。当它失败时,我得到 pid 不匹配和使用 0.0% CPU 的进程(第 3 列)。

我的代码的相关部分如下(按调用顺序相反):

R 设置使用 rpy2 调用的函数:

def create_R(dir):
"""
creates the r environment
@param dir: the directory for the output files
"""
r = robjects.r
importr("phangorn")
importr("picante")
importr("MASS")
importr("vegan")
r("options(expressions=500000)")
robjects.globalenv['outfile'] = os.path.abspath(os.path.join(dir, "trees.pdf"))
r('pdf(file=outfile, onefile=T)')
r("par(mfrow=c(2,3))")

r("""
generate_triplet = function(bits) {
triplet = replicate(bits, rTraitDisc(tree, model="ER", k=2,states=0:1))
triplet = t(apply(triplet, 1, as.numeric))
sums = rowSums(triplet)
if (length(which(sums==0)) > 0 && length(which(sums==3)) == 1) {
return(triplet)
}
return(generate_triplet(bits))
}
""")

r("""
get_valid_triplets = function(numsamples, needed, bits) {
tryCatch({
m = generate_triplet(bits)
while (ncol(m) < needed) {
m = cbind(m, generate_triplet(bits))
}
return(m)
}, error = function(e){print(message(e))}, warning = function(e){print(message(e))})
}
""")

worker内部调用的函数:

def __get_valid_triplets(num_samples, num_triplets, bits, q):
r = robjects.r
name = current_process().name.replace("-", "_")
timer = stopwatch.Timer()
log("\trunning %s (%d triplets), pid %d, ppid %d" % (name, num_triplets, current_process().pid, os.getppid()),
log_file)
r('%s = get_valid_triplets(%d, %d, %d)' % (name, num_samples, num_triplets, bits))
q.put((name, r[name]))
timer.stop()
log("\t%s complete (%s)" % (name, str(timer)), log_file)

设置池并使用 apply_async 调度工作程序的函数。工作人员写入托管队列,该队列在池加入后进行处理:

def __generate_candidate_discrete_matrix(num_cols, num_samples, sample_tree, bits, usable_cols):
assert isinstance(sample_tree, dendropy.Tree)
print "Creating discrete character matrix"
r = robjects.r
newick = sample_tree.as_newick_string()
num_samples = len(sample_tree.leaf_nodes())
robjects.globalenv['numcols'] = usable_cols
robjects.globalenv['newick'] = newick + ";"
r("tree = read.tree(text=newick)")
r('m = matrix(nrow=length(tree$tip.label))') #create empty matrix
r('m = m[,-1]') #drop the first NA column
num_procs = mp.cpu_count()
args = []
div, mod = divmod(usable_cols, num_procs)
[args.append(div) for i in range(num_procs)]
args[-1] += mod
for i, elem in enumerate(args):
div, mod = divmod(elem, bits)
args[-1] += mod
args[i] -= mod
manager = Manager()
pool = Pool(processes=num_procs, maxtasksperchild=1)
q = manager.Queue(maxsize=num_procs)
for arg in args:
pool.apply_async(__get_valid_triplets, (num_samples, arg, bits, q))
pool.close()
pool.join()

while not q.empty():
name, data = q.get()
robjects.globalenv[name] = data
r('m = cbind(m, %s)' % name)

r('m = m[,1:%d]' % usable_cols)
r('m = m[order(rownames(m)),]') # consistently order the rows
r('m = t(apply(m, 1, as.numeric))') # convert all factors given by rTraitDisc to numeric
a = r['m']
n = r('rownames(m)')
return a, n

最后,第一个生成候选矩阵的函数被调用,确保它是一个有效的,如果不是,它将再次尝试使用一个新的矩阵。如果有效,它会在 R session 中存储一些东西并返回数据

def create_discrete_matrix(num_cols, num_samples, sample_tree, bits):
"""
Creates a discrete char matrix from a tree
@param num_cols: number of columns to create
@param sample_tree: the tree
@return: a r object of the matrix, and a list of the row names
@rtype: tuple(robjects.Matrix, list)
"""
r = robjects.r
usable_cols = find_usable_length(num_cols, bits)
a, n = __generate_candidate_discrete_matrix(num_cols, num_samples, sample_tree, bits, usable_cols)
assert isinstance(a, robjects.Matrix)
assert a.ncol == usable_cols

paralin_matrix, valid = __create_paralin_matrix(a)
if valid is False:
sample_tree = create_tree(num_samples, type = "S")
return create_discrete_matrix(num_cols, num_samples, sample_tree, bits)
else:
robjects.globalenv['paralin_matrix'] = paralin_matrix
r('rownames(paralin_matrix) = rownames(m)')
r('paralin_dist = as.dist(paralin_matrix, diag=T, upper=T)')
r("paralinear_cluster = hclust(paralin_dist, method='average')")
return sample_tree, a, n

最佳答案

看来这是通过服务器重启 (FML) 解决的。但是,获得了有效信息。将 worker 提交到池中时,请确保在 worker 本身中捕获异常,而不是在调用 pool.apply_async 的方法中捕获异常。

def __get_valid_triplets(num_samples, num_triplets, bits, q):
try:
r = robjects.r
name = current_process().name.replace("-", "_")
timer = stopwatch.Timer()
log("\trunning %s (%d triplets), pid %d, ppid %d" % (name, num_triplets, current_process().pid, os.getppid()),
log_file)
r('%s = get_valid_triplets(%d, %d, %d)' % (name, num_samples, num_triplets, bits))
q.put((name, r[name]))
timer.stop()
log("\t%s complete (%s)" % (name, str(timer)), log_file)
except Exception, e:
q.put("DEATH")
traceback.print_exc()

关于Python 多处理创建不正确的 pid,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11884864/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com