python - Regex problem in Python 3

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:36:06

I have a fairly simple regular expression, but for some reason it isn't capturing all instances.

My dataframe looks like this (all 74 rows are included, since I don't know where the problem lies):

Name
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A122_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M
P0824AK03.VAK03_TK02_QE_A100_M

If I run

In [57]: len(df['Name'])

I get

Out [57]: 74

I created a regular expression as follows:

p = re.compile('_[A-z][0-9][0-9][0-9]_')

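I want to create a column whose values are the code segments that look something like "_A122_" or "_A100_". I want to use a regex because I later want to apply this code to a much larger set, in which the segment does not always appear in the same position.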

When I use the following command, the result is a list of the form I'm looking for:

In [55]: p.findall(str(df['Name']))
Out[55]:
['_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A122_',
'_A100_',
'_A100_',
'_A100_',
'_A122_']

The problem is that this list is "too short". Using len(p.findall(str(df['Name']))), I get 60 as the result. I can't see which 14 rows it is missing!

I'm not used to regular expressions, so this may be a very obvious mistake, but I would really appreciate any help.

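(I suppose I could write a for loop and build the new column cell by cell, but I would really rather avoid that, since I will later apply this code to a much larger dataset and don't want it to take a million years to run.)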

Best answer

IIUC, you can use .str.extract() to extract the substring that matches your RegEx:

In [55]: df.Name.str.extract(r'(_[a-zA-Z]\d{3}_)', expand=False)
Out[55]:
0 _A122_
1 _A122_
2 _A122_
3 _A122_
4 _A122_
5 _A122_
6 _A122_
7 _A122_
8 _A122_
9 _A122_
...
64 _A100_
65 _A100_
66 _A100_
67 _A100_
68 _A100_
69 _A100_
70 _A100_
71 _A100_
72 _A100_
73 _A100_
Name: Name, dtype: object
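
To get from the extracted Series to the new column the question asks for, the result can be assigned straight back to the dataframe. The following is a minimal sketch under assumptions: the three sample names and the column name "Code" are made up for illustration and are not taken from the question's data.

```python
import re

import pandas as pd

# Hypothetical sample data; the last name deliberately has no code segment.
df = pd.DataFrame({
    "Name": [
        "P0824AK03.VAK03_TK02_QE_A122_M",
        "P0824AK03.VAK03_TK02_QE_A100_M",
        "P0824AK03.VAK03_TK02_QE_M",
    ]
})

# One capture group plus expand=False returns a Series aligned with df,
# so it can be stored as a new column ("Code" is an assumed name).
df["Code"] = df["Name"].str.extract(r"(_[a-zA-Z]\d{3}_)", expand=False)
print(df)

# Side note on the character class: [A-z] is wider than [a-zA-Z]; it also
# covers the ASCII characters between "Z" and "a", including "_" and "^".
underscore_in_wide_range = re.fullmatch(r"[A-z]", "_") is not None
underscore_in_letters = re.fullmatch(r"[a-zA-Z]", "_") is not None
print(underscore_in_wide_range, underscore_in_letters)
```

Rows without a match get a missing value (NaN) instead of being dropped, so the new column always has the same length as the dataframe. The operation is vectorized over the column, so no explicit for loop is needed.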

PS: you shouldn't use str(df['Name']), because the string representation of a Pandas DF is truncated:

In [58]: pd.options.display.max_rows = 4

In [59]: df
Out[59]:
Name
0 P0824AK03.VAK03_TK02_QE_A122_M
1 P0824AK03.VAK03_TK02_QE_A122_M
.. ...
72 P0824AK03.VAK03_TK02_QE_A100_M
73 P0824AK03.VAK03_TK02_QE_A100_M

[74 rows x 1 columns]

In [60]: str(df['Name'])
Out[60]: '0 P0824AK03.VAK03_TK02_QE_A122_M\n1 P0824AK03.VAK03_TK02_QE_A122_M\n ... \n72 P0824AK03.VAK03_TK02_QE_A100_M\n73 P0824AK03.VAK03_TK02_QE_A100_M\nName: Name, dtype: object'
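
The effect on the match count can be reproduced with a rough sketch like the one below. It assumes a stand-in column of 74 identical names (not the question's actual data) and uses the same small display limit as above to make the truncation obvious. pandas' default display.max_rows is 60, which would be consistent with the 60 matches reported in the question, although the exact number of printed rows depends on the pandas version and display settings.

```python
import re

import pandas as pd

# Hypothetical stand-in for the question's column: 74 identical names.
names = pd.Series(["P0824AK03.VAK03_TK02_QE_A100_M"] * 74, name="Name")
PATTERN = r"_[a-zA-Z]\d{3}_"

# str(names) is the printed representation, so with a small display limit
# the regex only sees the few rows that are actually shown.
with pd.option_context("display.max_rows", 4):
    from_repr = re.findall(PATTERN, str(names))

# Matching element by element does not depend on any display setting.
per_row = names.str.extract(f"({PATTERN})", expand=False)

print(len(from_repr))               # fewer than 74
print(int(per_row.notna().sum()))   # one match per row
```

The count taken from the string representation changes with the display options, while the per-row extraction always covers every element of the Series.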

Regarding python - Regex problem in Python 3, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43978586/
