
python - Numba - How to fill a 2D array in parallel

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:02:47

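I have a function that operates on a 2D matrix of float64(x,y). Basic concept: for each combination of rows (number of rows choose 2), count the number of positive values after subtraction (row1 - row2). In a 2D matrix of int64(y,y), store this count at index [row1,row2] if it reaches a certain threshold, and at index [row2,row1] otherwise.

I have implemented it and decorated it with @njit(parallel=False), which works fine. @njit(parallel=True) seems to give no speedup. To speed the whole thing up I looked at @guvectorize, which also works. However, I cannot figure out how to use @guvectorize with parallel set to true in this case either.

I have looked at "numba guvectorize target='parallel' slower than target='cpu'", where the solution was to use @vectorize instead, but I could not transfer that solution to my problem, so I am now asking for help :)

Basic jitted and guvectorized implementation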

import numpy as np
from numba import jit, guvectorize, prange
import timeit

@jit(parallel=False)
def check_pairs_sg(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in range(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@jit(parallel=True)
def check_pairs_multi(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in prange(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='cpu')
def check_pairs_guvec_sg(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='parallel')
def check_pairs_guvec_multi(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

if __name__=="__main__":
    np.random.seed(404)
    a = np.random.random((512,512)).astype(np.float64)
    res = np.full((len(a), len(a)), -1)

Measurements
%timeit check_pairs_sg(a)
%timeit check_pairs_multi(a)
%timeit check_pairs_guvec_sg(a, res)
%timeit check_pairs_guvec_multi(a, res)

Results:

614 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
507 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
622 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
671 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
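
`%timeit` is an IPython magic, so the measurements above only run inside IPython or Jupyter. Outside of it, a similar measurement can be sketched with the standard `timeit` module. The helper name `best_time` and the stand-in workload `work` are hypothetical and not part of the original post. The explicit warm-up call is there because a Numba-jitted function compiles on its first call.

```python
import timeit
import numpy as np

def best_time(func, *args, repeat=5, number=3):
    # Warm-up call: for a Numba-jitted function this triggers compilation,
    # so the compile time is not counted in the measurement below.
    func(*args)
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # Report the fastest run, in seconds per call.
    return min(runs) / number

def work(x):
    # Stand-in workload (hypothetical); replace with e.g. check_pairs_sg.
    return (x[:, 0] > x[:, 1]).sum()

a = np.random.random((512, 512))
print(f"{best_time(work, a) * 1e6:.1f} µs per call")
```

Taking the minimum rather than the mean is a design choice of this sketch. It differs from the mean ± std. dev. that `%timeit` reports.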

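I am trying to wrap my head around how to implement this as @vectorize or as a proper parallel @guvectorize, so that the resulting 2D array is really filled in parallel.

I suppose this is the first step before I try to take this further to the GPU.

Any help is highly appreciated.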

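Best answer

Think of other compiled languages when writing Numba code

For example, think of a more or less exactly equivalent implementation of the lines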

diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
num_pos = len(np.where(diff > 0)[0])

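in C++.

Pseudocode

  • Allocate an array diff and loop over raw_data[i*size_dim_1+r1] (loop index i)
  • Allocate a boolean array, loop over the whole array diff and check whether diff[i]>0
  • Loop over the boolean array, get the indices where b_arr==True and save them to a vector via vector::push_back()
  • Check the size of the vector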
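
The hidden work in those two lines can also be made explicit in Python. The following is a minimal sketch, not the answerer's code: `count_pos_temporaries` and `count_pos_fused` are hypothetical names, and the small random input is for illustration only. The first function spells out each temporary array that the NumPy expression allocates. The second computes the same count in a single pass without allocating anything.

```python
import numpy as np

def count_pos_temporaries(raw_data, r1, r2):
    # Temporary 1: element-wise difference of two (strided) columns.
    diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
    # Temporary 2: boolean mask with one entry per element of diff.
    mask = diff > 0
    # Temporary 3: index array holding the positions where mask is True.
    idx = np.where(mask)[0]
    # Only the length of the index array is actually used.
    return len(idx)

def count_pos_fused(raw_data, r1, r2):
    # Single pass, no intermediate arrays: compare and count directly.
    num_pos = 0
    for i in range(raw_data.shape[0]):
        if raw_data[i, r1] > raw_data[i, r2]:
            num_pos += 1
    return num_pos

rng = np.random.default_rng(0)
data = rng.random((8, 4))
same = count_pos_temporaries(data, 0, 1) == count_pos_fused(data, 0, 1)
print(same)
```

In plain Python the fused loop is slow. It only pays off once it is compiled, which is what Numba does with it.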

The main problems in your code are:

  • Creating temporary arrays for simple operations
  • Non-contiguous memory access
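
The second point can be checked directly on the array layout. This is a small illustrative sketch, assuming a C-ordered float64 input like the one in the question; the variable names are hypothetical. A column slice is a view whose elements lie a full row apart in memory, while a row of a transposed copy holds the same values next to each other.

```python
import numpy as np

a = np.random.random((512, 512))  # C-ordered float64, as in the question

col = a[:, 3]                      # a view that jumps a whole row per element
at = np.ascontiguousarray(a.T)     # real transposed copy, not just a view
row = at[3]                        # same values, now adjacent in memory

print(col.strides, col.flags["C_CONTIGUOUS"])
print(row.strides, row.flags["C_CONTIGUOUS"])
print(np.array_equal(col, row))
```

The strides are reported in bytes: 512 * 8 for the column view versus 8 for the row of the copy.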

Optimizing the code

Removing temporary arrays and simplification

import numpy as np
import numba as nb

@nb.njit(parallel=False)
def check_pairs_simp(raw_data):
    # 2D array to be filled: one row and one column per compared column of
    # raw_data, so that non-square inputs are indexed correctly as well
    result = np.full((raw_data.shape[1],raw_data.shape[1]), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[1]):
        for r2 in range(r1+1, raw_data.shape[1]):
            # Count directly instead of building diff and index arrays
            num_pos=0
            for i in range(raw_data.shape[0]):
                if (raw_data[i,r1]>raw_data[i,r2]):
                    num_pos+=1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

Removing temporary arrays and simplification + contiguous memory access

@nb.njit(parallel=False)
def check_pairs_simp_rev(raw_data_in):
    #Create a transposed array not just a view
    raw_data=np.ascontiguousarray(raw_data_in.T)

    # 2D array to be filled: after the transpose, shape[0] is the number of
    # compared rows, so the result is (shape[0], shape[0])
    result = np.full((raw_data.shape[0],raw_data.shape[0]), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            # The inner loop now walks along contiguous memory
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

Removing temporary arrays and simplification + contiguous memory access + parallelization

@nb.njit(parallel=True,fastmath=True)
def check_pairs_simp_rev_p(raw_data_in):
    #Create a transposed array not just a view
    raw_data=np.ascontiguousarray(raw_data_in.T)

    # 2D array to be filled: after the transpose, shape[0] is the number of
    # compared rows, so the result is (shape[0], shape[0])
    result = np.full((raw_data.shape[0],raw_data.shape[0]), -1)

    # Iterate over all possible gene combinations.
    # The outer loop is parallel; every (r1, r2) pair writes to its own
    # element of result, so the threads never write to the same location.
    for r1 in nb.prange(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

Timings

%timeit check_pairs_sg(a)
488 ms ± 8.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit check_pairs_simp(a)
186 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit check_pairs_simp_rev(a)
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit check_pairs_simp_rev_p(a)
5.43 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
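
The timings show speed but not whether a rewritten loop still produces the same matrix. The following is a minimal sketch of such a check on a small non-square input. The names `reference_pairs` and `loop_pairs` are hypothetical. The import fallback is an assumption of this sketch: if Numba is not installed, the same loops simply run as plain Python. The threshold of 5 is the arbitrary one used in the post.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the loops as plain Python.
    def njit(func):
        return func

def reference_pairs(raw_data):
    # Straightforward NumPy version that mirrors the logic of the question.
    m = raw_data.shape[1]
    result = np.full((m, m), -1, np.int64)
    for r1 in range(m):
        for r2 in range(r1 + 1, m):
            num_pos = int(np.count_nonzero(raw_data[:, r1] > raw_data[:, r2]))
            if num_pos >= 5:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos
    return result

@njit
def loop_pairs(raw_data_in):
    # Transposed contiguous copy plus explicit counting loop.
    raw_data = np.ascontiguousarray(raw_data_in.T)
    m = raw_data.shape[0]
    result = np.full((m, m), -1, np.int64)
    for r1 in range(m):
        for r2 in range(r1 + 1, m):
            num_pos = 0
            for i in range(raw_data.shape[1]):
                if raw_data[r1, i] > raw_data[r2, i]:
                    num_pos += 1
            if num_pos >= 5:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos
    return result

rng = np.random.default_rng(404)
data = rng.random((12, 7))  # 12 samples, 7 columns to compare pairwise
ok = np.array_equal(reference_pairs(data), loop_pairs(data))
print(ok)
```

A non-square input is used on purpose, because a square one would hide any mix-up between the two dimensions.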

Regarding python - Numba - How to fill a 2D array in parallel, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55398477/
