gpt4 book ai didi

python - 基于字典替换 NumPy 数组中的值并避免新值和键之间的重叠

转载 作者:太空宇宙 更新时间:2023-11-03 13:09:28 25 4
gpt4 key购买 nike

我想根据 python 中的以下字典替换 2D numpy 数组中的值:

code    region
334 0
4 22
8 31
12 16
16 17
24 27
28 18
32 21
36 1

我想在 numpy 二维数组中找到与 code 匹配的单元格,并替换为 region 列中的相应值。问题在于,这将导致 code = 12 替换为 region = 16 并且在下一行中,所有值为 16 的单元格(包括刚刚分配的单元格)值 16) 将替换为值 17。我该如何防止这种情况发生?

最佳答案

这是一个基于 np.searchsorted 的向量化追溯数组中每个键的位置,然后替换并请原谅这里几乎性别歧视的函数名称(虽然没办法)-

def replace_with_dict(ar, dic):
# Extract out keys and values
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))

# Get argsort indices
sidx = k.argsort()

# Drop the magic bomb with searchsorted to get the corresponding
# places for a in keys (using sorter since a is not necessarily sorted).
# Then trace it back to original order with indexing into sidx
# Finally index into values for desired output.
return v[sidx[np.searchsorted(k,ar,sorter=sidx)]]

sample 运行-

In [82]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), 20)
...:

In [83]: a
Out[83]:
array([ 28, 16, 32, 32, 334, 32, 28, 4, 8, 334, 12, 36, 36,
24, 12, 334, 334, 36, 24, 28])

In [84]: replace_with_dict(a, dic)
Out[84]:
array([18, 17, 21, 21, 0, 21, 18, 22, 31, 0, 16, 1, 1, 27, 16, 0, 0,
1, 27, 18])

改进

对于大数组,更快的方法是对值和键数组进行排序,然后使用不带 sortersearchsorted,就像这样 -

def replace_with_dict2(ar, dic):
# Extract out keys and values
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))

# Get argsort indices
sidx = k.argsort()

ks = k[sidx]
vs = v[sidx]
return vs[np.searchsorted(ks,ar)]

运行时测试-

In [91]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), 20000)

In [92]: out1 = replace_with_dict(a, dic)
...: out2 = replace_with_dict2(a, dic)
...: print np.allclose(out1, out2)
True

In [93]: %timeit replace_with_dict(a, dic)
1000 loops, best of 3: 453 µs per loop

In [95]: %timeit replace_with_dict2(a, dic)
1000 loops, best of 3: 341 µs per loop

所有数组元素都不在字典中的一般情况

如果不能保证输入数组中的所有元素都在字典中,我们需要做更多的工作,如下所列 -

def replace_with_dict2_generic(ar, dic, assume_all_present=True):
# Extract out keys and values
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))

# Get argsort indices
sidx = k.argsort()

ks = k[sidx]
vs = v[sidx]
idx = np.searchsorted(ks,ar)

if assume_all_present==0:
idx[idx==len(vs)] = 0
mask = ks[idx] == ar
return np.where(mask, vs[idx], ar)
else:
return vs[idx]

sample 运行-

In [163]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), (20))
...: a[-1] = 400

In [165]: a
Out[165]:
array([ 28, 16, 32, 32, 334, 32, 28, 4, 8, 334, 12, 36, 36,
24, 12, 334, 334, 36, 24, 400])

In [166]: replace_with_dict2_generic(a, dic, assume_all_present=False)
Out[166]:
array([ 18, 17, 21, 21, 0, 21, 18, 22, 31, 0, 16, 1, 1,
27, 16, 0, 0, 1, 27, 400])

关于python - 基于字典替换 NumPy 数组中的值并避免新值和键之间的重叠,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47171356/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com