
python - Can I use the regular expression re.sub() with a numpy array or a list of strings?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:00:25

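I have a numpy array of entries with dtype=string_. I want to use the regular expression re module to replace all superfluous whitespace: extra spaces, \t tabs, and \n newlines.

For a single string, I would use re.sub() as follows: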

import re

proust = 'If a little dreaming is dangerous, \t the cure for it is not to dream less but to dream more. \t\t'

newstring = re.sub(r"\s+", " ", proust)

which returns

'If a little dreaming is dangerous, the cure for it is not to dream less but to dream more. '

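To do this for every entry of the numpy array, I assume I need some kind of for loop.

Something like for i in numpy_arr:, but I am not sure what should follow it in order to apply re.sub() to each element of the numpy array.

What is the most sensible way to approach this?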


Edit:

My original numpy array or list is a very long list/array of entries, one sentence per entry, like the one above. Here is a sample of five entries:

original_list = [ 'to be or     \n\n not to be     that is the question', 
' to be or not to be that is the question\t ',
'to be or not to be that is the question',
'to be or not to be that is the question\t ',
'to be or not to be that is \t the question']

Best answer

This is not exactly your re.sub, but the effect is the same, if not better:

In [109]: oarray
Out[109]:
array(['to be or \n\n not to be that is the question',
' to be or not to be that is the question\t ',
'to be or not to be that is the question',
'to be or not to be that is the question\t ',
'to be or not to be that is \t the question'],
dtype='<U55')
In [110]: np.char.join(' ', np.char.split(oarray))
Out[110]:
array(['to be or not to be that is the question',
'to be or not to be that is the question',
'to be or not to be that is the question',
'to be or not to be that is the question',
'to be or not to be that is the question'],
dtype='<U39')

It works in this case because split() recognizes the same set of whitespace characters as '\s+'.
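For comparison, the re.sub() approach from the question can also be applied entry by entry with a list comprehension. The following is only a minimal sketch on a plain Python list; the names whitespace_run, with_regex, and with_split are illustrative and not from the original post. Note that re.sub() alone leaves a single space where an entry started or ended with whitespace, so the sketch adds strip() to match the split/join result.

```python
import re

# Sample entries modeled on the question's original_list.
original_list = [
    'to be or     \n\n not to be     that is the question',
    ' to be or not to be that is the question\t ',
    'to be or not to be that is \t the question',
]

# Compile once so the same pattern is reused for every entry.
whitespace_run = re.compile(r"\s+")

# re.sub-style cleanup: every run of whitespace becomes one space.
# strip() drops the single space left over at either end.
with_regex = [whitespace_run.sub(" ", s).strip() for s in original_list]

# Plain-string counterpart of the np.char.split / np.char.join idea.
with_split = [" ".join(s.split()) for s in original_list]

print(with_regex)
print(with_split == with_regex)
```

If a numpy array is needed afterwards, the resulting list can be passed to np.array().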

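np.char.replace will replace selected characters, but it would have to be applied several times: once to remove '\n', again to remove '\t', and so on. There is also np.char.translate.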
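A rough sketch of what that would look like, assuming a small two-entry array made up for illustration (the sample strings and the names step1, step2, table, and translated are not from the original answer). Both variants swap each '\n' or '\t' for a space one character at a time, so runs of whitespace are not collapsed the way split/join collapses them.

```python
import numpy as np

# Illustrative sample data, not the array from the answer above.
oarray = np.array(['to be or \n\n not to be', 'that is \t the question\t '])

# np.char.replace handles one substring per call, so it is chained.
step1 = np.char.replace(oarray, '\n', ' ')
step2 = np.char.replace(step1, '\t', ' ')

# np.char.translate maps several characters in a single pass.
# The table comes from the built-in str.maketrans.
table = str.maketrans({'\n': ' ', '\t': ' '})
translated = np.char.translate(oarray, table)

# Both give the same strings, but repeated spaces are still present.
print(step2)
print(translated)
```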

Regarding "python - Can I use the regular expression re.sub() with a numpy array or a list of strings?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33093333/
