
python - Numpy string partition: Perform Multiple Splits

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:27:01

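I have an array of strings, each containing one or more words. I want to split/partition the array on a separator (whitespace in my case), with as many splits as there are separators in the element that contains the most separators. numpy.char.partition, however, performs only a single split, no matter how often the separator occurs: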

I have:

>>> a = np.array(['word', 'two words', 'and three words'])
>>> np.char.partition(a, ' ')
array([['word', '', ''],
       ['two', ' ', 'words'],
       ['and', ' ', 'three words']], dtype='<U11')

I want:

array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U8')

Best answer

Approach #1

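Those partition functions don't seem to partition on every occurrence of the separator. To solve our case, we can use np.char.split to get the split strings and then use masking and array assignment, like so -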

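import numpy as np

def partitions(a, sep):
    # Split each string on sep (gives an object array of lists)
    s = np.char.split(a, sep)

    # Get concatenated split strings as one flat array
    cs = np.concatenate(s)

    # Get params
    N = len(a)
    l = np.array(list(map(len, s)))  # number of pieces per string
    el = 2*l - 1                     # pieces plus separators per row
    ncols = el.max()

    # Use a dtype wide enough for both the pieces and sep, so that a
    # separator longer than every piece is not silently truncated
    out = np.zeros((N, ncols), dtype=np.result_type(cs.dtype, np.array(sep).dtype))

    # Set up valid mask that starts at first col and runs to the end of each row
    mask = el[:, None] > np.arange(ncols)

    # Assign separator into all valid positions
    out[mask] = sep

    # Keep True only at positions where words are to be assigned
    mask[:, 1::2] = False

    # Assign words (row-major mask order matches the order of cs)
    out[mask] = cs
    return out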

Sample run -

In [32]: a = np.array(['word', 'two words', 'and three words'])

In [33]: partitions(a, sep=' ')
Out[33]:
array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U5')

In [44]: partitions(a, sep='ord')
Out[44]:
array([['w', 'ord', ''],
       ['two w', 'ord', 's'],
       ['and three w', 'ord', 's']], dtype='<U11')
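
To make the masking step easier to follow, here is a minimal sketch that builds only the masks for the sample array, without filling the output. The names valid, word_slots and sep_slots are illustrative and not part of the answer; the logic is assumed to mirror what Approach #1 does internally.

```python
import numpy as np

a = np.array(['word', 'two words', 'and three words'])
sep = ' '

# Pieces per string after splitting on sep: 1, 2 and 3 for the sample
pieces = np.char.split(a, sep)
n_pieces = np.array([len(p) for p in pieces])

# Each row needs n pieces plus (n - 1) separators
row_len = 2 * n_pieces - 1

# True for every column a row actually uses
valid = row_len[:, None] > np.arange(row_len.max())

# Words sit in the even columns of the used part, separators in the odd ones
word_slots = valid.copy()
word_slots[:, 1::2] = False
sep_slots = valid & ~word_slots

print(valid.astype(int))
print(word_slots.astype(int))
print(sep_slots.astype(int))
```

The number of True entries in word_slots equals the total number of split pieces, which is why the flat concatenated array can be assigned through the mask in a single step.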

Approach #2

Here's another one that uses a loop, to save memory -

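import numpy as np

def partitions_loopy(a, sep):
    # Get params
    N = len(a)
    l = np.char.count(a, sep) + 1  # number of pieces per string
    ncols = 2*l.max() - 1
    out = np.zeros((N, ncols), dtype=a.dtype)
    for i, (a_i, L) in enumerate(zip(a, l)):
        ss = a_i.split(sep)
        out[i, 1:2*L-1:2] = sep  # separators go into the odd columns
        out[i, :2*L:2] = ss      # words go into the even columns
    return out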
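
One quick sanity check for either approach is that joining each output row should give back the original string, since the padding columns are empty strings. The sketch below restates the loop version compactly under the hypothetical name partition_all so that it runs on its own; it is an illustration, not a replacement for the functions above.

```python
import numpy as np

def partition_all(a, sep):
    # Compact restatement of the loop-based approach (illustrative name)
    counts = np.char.count(a, sep) + 1
    out = np.zeros((len(a), 2 * counts.max() - 1), dtype=a.dtype)
    for i, (s, n) in enumerate(zip(a, counts)):
        out[i, 1:2 * n - 1:2] = sep      # separators in odd columns
        out[i, :2 * n:2] = s.split(sep)  # words in even columns
    return out

a = np.array(['word', 'two words', 'and three words'])
parts = partition_all(a, ' ')

# Joining the columns of each row should reproduce the input strings
rejoined = np.array([''.join(row) for row in parts])
print(parts)
print((rejoined == a).all())
```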

Regarding python - Numpy string partition: Perform Multiple Splits, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57159366/
