python - Using numpy.vstack in numba

Reposted. Author: 太空狗. Updated: 2023-10-30 00:37:45

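I have been trying to optimize some code that calculates a statistical error metric from array data. The metric is called the Continuous Ranked Probability Score (CRPS).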
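For reference, the per-start-date score that the loop in the code below accumulates (the line `crps_numba[i] = ...`) can be written as follows. The notation is read off the code rather than the paper: a_{i,k} and b_{i,k} are the entries of the full `a` and `b` vectors (including the two outlier bins) for start date i, there are m ensemble members and n start dates, and p_k = k/m. The second line is the eqn. 28 variant the code computes from the column means of `a_mat` and `b_mat`.

$$\mathrm{CRPS}_i = \sum_{k=0}^{m}\left[a_{i,k}\,p_k^2 + b_{i,k}\,(1-p_k)^2\right], \qquad p_k = \frac{k}{m}$$

$$\overline{\mathrm{CRPS}} = \sum_{k=0}^{m}\left[\bar a_k\,p_k^2 + \bar b_k\,(1-p_k)^2\right], \qquad \bar a_k = \frac{1}{n}\sum_{i=0}^{n-1} a_{i,k}, \quad \bar b_k = \frac{1}{n}\sum_{i=0}^{n-1} b_{i,k}$$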

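I have been using Numba to try to speed up the double for loop this calculation requires, but I keep running into problems with the numpy.vstack function. As far as I understand the documentation here, the vstack() function should be supported, yet when I run the following code I get an error.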

import numpy as np
from numba import njit, prange

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):
    """Calculate the continuous ranked probability score (CRPS) as per equations 25-27 in
    Hersbach (2000)

    Parameters
    ----------
    obs: 1D ndarray
        Array of observations for each start date
    fcst_ens: 2D ndarray
        Array of ensemble forecast of dimension n x M, where n = number of start dates and
        M = number of ensemble members.

    remove_neg: bool
        If True, when a negative value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    remove_zero: bool
        If True, when a zero value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    Returns
    -------
    dict
        Dictionary containing a number of *experimental* outputs including:
        - ["crps"] 1D ndarray of crps values per n start dates.
        - ["crpsMean1"] arithmetic mean of crps values.
        - ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).

    Notes
    -----
    **NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the
    corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.

    References
    ----------
    - Hersbach, H. (2000) Decomposition of the Continuous Ranked Probability Score
      for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.
    """
    # Treating the Data
    obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)

    # Set parameters
    n = fcst_ens.shape[0]  # number of forecast start dates
    m = fcst_ens.shape[1]  # number of ensemble members

    # Create vector of pi's
    p = np.linspace(0, m, m + 1)
    pi = p / m

    crps_numba = np.zeros(n)

    @njit
    def calculate_crps():
        # Loop fcst start times
        for i in prange(n):

            # Initialise vectors for storing output
            a = np.zeros(m - 1)
            b = np.zeros(m - 1)

            # Verifying analysis (or obs)
            xa = obs[i]

            # Ensemble fcst CDF
            x = np.sort(fcst_ens[i, :])

            # Deal with 0 < i < m [So, will loop 50 times for m = 51]
            for j in prange(m - 1):

                # Rule 1
                if xa > x[j + 1]:
                    a[j] = x[j + 1] - x[j]
                    b[j] = 0

                # Rule 2
                if x[j] < xa < x[j + 1]:
                    a[j] = xa - x[j]
                    b[j] = x[j + 1] - xa

                # Rule 3
                if xa < x[j]:
                    a[j] = 0
                    b[j] = x[j + 1] - x[j]

            # Deal with outliers for i = 0, and i = m,
            # else a & b are 0 for non-outliers
            if xa < x[0]:
                a1 = 0
                b1 = x[0] - xa
            else:
                a1 = 0
                b1 = 0

            # Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)
            if xa > x[m - 1]:
                am = xa - x[m - 1]
                bm = 0
            else:
                am = 0
                bm = 0

            # Combine full a & b vectors including outliers
            a = np.concatenate((np.array([a1]), a, np.array([am])))
            # a = np.insert(a, 0, a1)
            # a = np.append(a, am)
            b = np.concatenate((np.array([b1]), b, np.array([bm])))
            # b = np.insert(b, 0, b1)
            # b = np.append(b, bm)

            # Populate a_mat and b_mat
            if i == 0:
                a_mat = a
                b_mat = b
            else:
                a_mat = np.vstack((a_mat, a))
                b_mat = np.vstack((b_mat, b))

            # Calc crps for individual start times
            crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()

        return crps_numba, a_mat, b_mat

    crps, a_mat, b_mat = calculate_crps()
    print(crps)
    # Calc mean crps as simple mean across crps[i]
    crps_mean_method1 = np.mean(crps)

    # Calc mean crps across all start times from eqn. 28 in Hersbach (2000)
    abar = np.mean(a_mat, 0)
    bbar = np.mean(b_mat, 0)
    crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()

    # Output array as a dictionary
    output = {'crps': crps, 'crpsMean1': crps_mean_method1,
              'crpsMean2': crps_mean_method2}

    return output

The error I get is:

Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path

File "test.py", line 86:
def calculate_crps():
<source elided>
if i == 0:
a_mat = a
^

[1] During: typing of assignment at *path

File "test.py", line 89:
def calculate_crps():
<source elided>
else:
a_mat = np.vstack((a_mat, a))
^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

I just want to know where I am going wrong. It seems as though the vstack function should work, but maybe I am missing something.

Best answer

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

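TL;DR: The problem isn't vstack. The problem is that your code paths try to assign arrays of different types to the same variable, which raises the unification exception.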

The problem is here:

            # Populate a_mat and b_mat
            if i == 0:
                a_mat = a
                b_mat = b
            else:
                a_mat = np.vstack((a_mat, a))
                b_mat = np.vstack((b_mat, b))

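In the first code path you assign a 1D C-contiguous float64 array to a_mat and b_mat, while in the else branch you assign a 2D C-contiguous float64 array. These types are incompatible, so numba throws an error. It can be tricky that numba code doesn't behave like Python code here: in Python it doesn't matter what type a value has when you assign it to a variable. However, numba's exception messages have become much better in recent versions, so once you know what the exception is hinting at, you can usually find the problem quickly.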

Longer explanation

The problem is that numba implicitly infers the types of variables. For example:

from numba import njit

@njit
def func(arr):
    a = arr
    return a

I didn't give the function an explicit signature here, so I need to run it once to compile it:

>>> import numpy as np
>>> func(np.zeros(5))
array([0., 0., 0., 0., 0.])

Then you can inspect the types:

>>> func.inspect_types()
func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-4-02470248b065>
# --- LINE 3 ---
# label 0

@njit

# --- LINE 4 ---

def func(arr):

# --- LINE 5 ---
# arr = arg(0, name=arr) :: array(float64, 1d, C)
# a = arr :: array(float64, 1d, C)
# del arr

a = arr

# --- LINE 6 ---
# $0.3 = cast(value=a) :: array(float64, 1d, C)
# del a
# return $0.3

return a

As you can see, the variable a is typed as array(float64, 1d, C) for an input of type array(float64, 1d, C).

Now, let's use np.vstack instead:

from numba import njit
import numpy as np

@njit
def func(arr):
    a = np.vstack((arr, arr))
    return a

And the obligatory first call to compile it:

>>> func(np.zeros(5))
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])

And inspect the types again:

>>> func.inspect_types()

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-11-f0214d5181c6>
# --- LINE 4 ---
# label 0

@njit

# --- LINE 5 ---

def func(arr):

# --- LINE 6 ---
# arr = arg(0, name=arr) :: array(float64, 1d, C)
# $0.1 = global(np: <module 'numpy'>) :: Module(<module 'numpy'>)
# $0.2 = getattr(value=$0.1, attr=vstack) :: Function(<function vstack at 0x000001DB7082A400>)
# del $0.1
# $0.5 = build_tuple(items=[Var(arr, <ipython-input-11-f0214d5181c6> (6)), Var(arr, <ipython-input-11-f0214d5181c6> (6))]) :: tuple(array(float64, 1d, C) x 2)
# del arr
# $0.6 = call $0.2($0.5, func=$0.2, args=[Var($0.5, <ipython-input-11-f0214d5181c6> (6))], kws=(), vararg=None) :: (tuple(array(float64, 1d, C) x 2),) -> array(float64, 2d, C)
# del $0.5
# del $0.2
# a = $0.6 :: array(float64, 2d, C)
# del $0.6

a = np.vstack((arr, arr))

# --- LINE 7 ---
# $0.8 = cast(value=a) :: array(float64, 2d, C)
# del a
# return $0.8

return a

This time a is typed as array(float64, 2d, C) for an input of type array(float64, 1d, C).

You may have asked yourself why I'm talking about this. Let's see what happens if you try to assign to a conditionally:

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = arr
    return a

If you run the code now:

>>> func(np.zeros(5), True)
TypingError: Failed at nopython (nopython frontend)
Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'a', defined at <ipython-input-16-f4bd9a4f377a> (7)

File "<ipython-input-16-f4bd9a4f377a>", line 7:
def func(arr, condition):
<source elided>
if condition:
a = np.vstack((arr, arr))
^

[1] During: typing of assignment at <ipython-input-16-f4bd9a4f377a> (9)

File "<ipython-input-16-f4bd9a4f377a>", line 9:
def func(arr, condition):
<source elided>
else:
a = arr
^

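That's exactly the problem you ran into. It happens because, for a fixed set of input types, a variable in numba must have one and only one type. Since the dtype, the rank (number of dimensions) and the contiguity are all part of an array's type, you cannot assign arrays with different numbers of dimensions to the same variable.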

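Note that you could expand the dimensions to make it work and then squeeze the unnecessary dimension out of the result again (probably not very elegant, but it should solve the problem with minimal changes):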

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = np.expand_dims(arr, 0)
    return a

>>> func(np.zeros(5), False)
array([[0., 0., 0., 0., 0.]]) # <-- 2d array!
>>> func(np.zeros(5), True)
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])

A similar question about python - Using numpy.vstack in numba can be found on Stack Overflow: https://stackoverflow.com/questions/51754268/
