python-3.x - Fastest way to count occurrences of a character in a numpy.chararray

Reposted. Author: 行者123. Last updated: 2023-12-04 22:25:12
Fellow Python enthusiasts,

What is the fastest way to count the occurrences of a character in a numpy.chararray?

I am currently doing the following:

In [59]: for i in range(10):
    ...:     m = input("Enter A or B: ")
    ...:     rr[0][i] = m
    ...:
Enter A or B: B
Enter A or B: B
Enter A or B: B
Enter A or B: A
Enter A or B: B
Enter A or B: A
Enter A or B: A
Enter A or B: A
Enter A or B: B
Enter A or B: A

In [60]: rr
Out[60]:
chararray([['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A']],
          dtype='<U1')

In [61]: %timeit a = rr.count('A')
12.5 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [62]: %timeit d = len(a[a.nonzero()])
3.03 µs ± 54.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I believe there must be a better way to do this quickly and elegantly.

Best answer

It's better to stick to regular NumPy arrays rather than chararrays. From the NumPy documentation:

Note:

The chararray class exists for backwards compatibility with Numarray, it is not recommended for new development. Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, string_ or unicode_, and use the free functions in the numpy.char module for fast vectorized string operations.
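
As a minimal sketch of what the note recommends, the same data can be held in a plain ndarray with a unicode dtype and counted with the free function np.char.count. The names arr, per_element and total_a are illustrative and do not come from the question.

```python
import numpy as np

# Sample data from the question, stored as a regular ndarray of
# single-character unicode strings instead of a chararray.
arr = np.array(list("BBBABAAABA"))

# np.char.count reports, for each element, how many times the
# substring occurs in it; summing gives the total over the array.
per_element = np.char.count(arr, "A")
total_a = int(per_element.sum())

print(total_a)
```

This works for multi-character elements as well, which is the main reason to reach for np.char.count over a plain equality test.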



Working with regular arrays, here are two approaches.

Approach #1

We can use np.count_nonzero to count the True values that result from comparing the array against the search element 'A':
np.count_nonzero(rr=='A')
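
A self-contained version of Approach #1, as a sketch: it rebuilds the question's sample data as a regular array (an assumption, since the question uses a chararray) and splits the comparison and the count into two steps so the intermediate boolean mask is visible.

```python
import numpy as np

# The question's sample data as a regular 1-D unicode array.
rr = np.array(list("BBBABAAABA"))

# Element-wise comparison yields a boolean mask of the same shape.
mask = rr == "A"

# count_nonzero counts the True entries of the mask.
count_a = np.count_nonzero(mask)

print(count_a)
```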

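Approach #2

Since the array holds only single-character elements, we can optimize considerably by viewing it with the uint8 dtype and then comparing and counting. The counting is faster because we are then working with numeric data. The implementation would be: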
np.count_nonzero(rr.view(np.uint8)==ord('A'))

On Python 2.x, this would be:
np.count_nonzero(np.array(rr.view(np.uint8))==ord('A'))
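
One caveat, which is an addition here and not part of the original answer: on Python 3 each '<U1' element occupies four bytes (UCS-4), so a uint8 view has four entries per character. For ASCII-only data the extra bytes are zero and the count is unaffected, but a non-ASCII code point may contain the byte 0x41 and be counted as 'A'. The sketch below views the data as uint32 instead, so each entry is one whole code point. The array named tricky is a made-up example chosen to trigger the miscount.

```python
import numpy as np

rr = np.array(list("BBBABAAABA"))

# One uint32 per character: each entry is the character's code point.
codes = rr.view(np.uint32)
count_a = np.count_nonzero(codes == ord("A"))

# Hypothetical non-ASCII input: U+4100 contains the byte 0x41,
# so a byte-wise comparison also matches it.
tricky = np.array(["\u4100", "A"])
byte_hits = np.count_nonzero(tricky.view(np.uint8) == ord("A"))
code_hits = np.count_nonzero(tricky.view(np.uint32) == ord("A"))

print(count_a, byte_hits, code_hits)
```

For data known to contain only 'A' and 'B', as in the question, both views give the same count.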

Timings

Timings on the original sample data and on the same data scaled up by 10,000x:
# Original sample data
In [10]: rr
Out[10]: array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'], dtype='<U1')

# @Nils Werner's soln
In [14]: %timeit np.sum(rr == 'A')
100000 loops, best of 3: 3.86 µs per loop

# Approach #1 from this post
In [13]: %timeit np.count_nonzero(rr=='A')
1000000 loops, best of 3: 1.04 µs per loop

# Approach #2 from this post
In [40]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
1000000 loops, best of 3: 1.86 µs per loop

# Original sample data scaled by 10,000x
In [16]: rr = np.repeat(rr,10000)

# @Nils Werner's soln
In [18]: %timeit np.sum(rr == 'A')
1000 loops, best of 3: 734 µs per loop

# Approach #1 from this post
In [17]: %timeit np.count_nonzero(rr=='A')
1000 loops, best of 3: 659 µs per loop

# Approach #2 from this post
In [24]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
10000 loops, best of 3: 40.2 µs per loop
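
To rerun this comparison outside IPython, the standard-library timeit module can be used. This is a rough sketch: the candidates dictionary and the loop counts are arbitrary choices, and the numbers it prints depend on the machine and the NumPy version, so they will not match the figures above.

```python
import timeit

import numpy as np

# The question's sample data, scaled by 10,000x as in the timings above.
rr = np.repeat(np.array(list("BBBABAAABA")), 10000)

candidates = {
    "np.sum(rr == 'A')": lambda: np.sum(rr == "A"),
    "np.count_nonzero(rr == 'A')": lambda: np.count_nonzero(rr == "A"),
    "uint8 view": lambda: np.count_nonzero(rr.view(np.uint8) == ord("A")),
}

results = {}
for label, fn in candidates.items():
    # Record the count first so the approaches can be checked for agreement.
    results[label] = int(fn())
    # Best of 3 repeats, 20 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{label}: {seconds * 1e6:.1f} µs per loop")
```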

Source: this question and answer, "python-3.x - Fastest way to count occurrences of a character in a numpy.chararray", come from Stack Overflow: https://stackoverflow.com/questions/52145157/