
python - Split a column in Pandas

Reprinted. Author: 行者123. Updated: 2023-12-01 22:57:39

So I have a pandas column that looks like this:

import pandas as pd

full_name = pd.Series([
'Reservoir 1 Compartment 1',
'Reservoir 1 Common Inlet',
'Reservoir 2 Compartment 1',
'Vyrnwy Line 2 Balancing Tank 1',
'Reservoir 1'
])

I want to split it into two columns. The expected output should look like this:

[['Reservoir 1', 'Compartment 1'],
['Reservoir 1', 'Common Inlet'],
['Reservoir 2', 'Compartment 1'],
['Vyrnwy Line 2', 'Balancing Tank 1'],
['Reservoir 1', None]]

I tried this:

res_compartment_split = pd.concat([full_name.str.split(r'\s\s*?(?=[A-Z])', expand=True)])

But I got this output:

[['Reservoir 1', 'Compartment 1', None, None],
['Reservoir 1', 'Common', 'Inlet', None],
['Reservoir 2', 'Compartment 1', None, None],
['Vyrnwy', 'Line 2', 'Balancing', 'Tank 1'],
['Reservoir 1', None, None, None]]

Thanks for your help.

Best Answer

Try the following:

import pandas as pd

full_name = pd.Series([
'Reservoir 1 Compartment 1',
'Reservoir 1 Common Inlet',
'Reservoir 2 Compartment 1',
'Vyrnwy Line 2 Balancing Tank 1',
'Reservoir 1'
])

res = full_name.str.split(r'(?<=\d)\s+(?=[A-Z])', expand=True)

Output:

>>> res
               0                 1
0    Reservoir 1     Compartment 1
1    Reservoir 1      Common Inlet
2    Reservoir 2     Compartment 1
3  Vyrnwy Line 2  Balancing Tank 1
4    Reservoir 1              None
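If you need the exact list-of-lists shape from the question, the DataFrame can be converted back to plain Python lists. This is a minimal sketch, assuming missing second parts should become None. Depending on the pandas version and string dtype, the padding may show up as None or NaN, so the sketch normalizes both with pd.isna. The column names reservoir and compartment are hypothetical, and n=1 is an added choice (not part of the answer) that limits each row to a single split.

```python
import pandas as pd

full_name = pd.Series([
    'Reservoir 1 Compartment 1',
    'Reservoir 1 Common Inlet',
    'Reservoir 2 Compartment 1',
    'Vyrnwy Line 2 Balancing Tank 1',
    'Reservoir 1'
])

# Split at most once, at whitespace that follows a digit and precedes an uppercase letter
res = full_name.str.split(r'(?<=\d)\s+(?=[A-Z])', n=1, expand=True)

# Hypothetical column names, only for readability
res.columns = ['reservoir', 'compartment']

# Convert to plain Python lists, turning any missing value (None, NaN or pd.NA) into None
rows = [
    [None if pd.isna(value) else value for value in row]
    for row in res.to_numpy(dtype=object).tolist()
]
print(rows)
```

With n=1, the result still has only one column if no row contains a split point at all; in that case, assigning two column names would raise a ValueError.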

Explanation of the regex pattern:

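  • (?<=\d) - positive lookbehind: makes sure there is a digit right before the separator, without consuming it
  • \s+ - the separator: matches one or more whitespace characters
  • (?=[A-Z]) - positive lookahead: makes sure the separator is immediately followed by an uppercase letter (A to Z), without consuming it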
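The three parts listed above can be annotated inline with Python's re.VERBOSE flag. This is a minimal sketch using the standard re module rather than pandas; the names SEPARATOR and names are only for illustration. Since pandas' str.split treats a multi-character pattern as a regular expression, it should split these strings the same way.

```python
import re

# The answer's pattern, written in verbose mode so each part carries its own comment
SEPARATOR = re.compile(r"""
    (?<=\d)    # positive lookbehind: the previous character must be a digit (not consumed)
    \s+        # the separator itself: one or more whitespace characters (consumed)
    (?=[A-Z])  # positive lookahead: the next character must be an uppercase letter (not consumed)
""", re.VERBOSE)

names = [
    'Reservoir 1 Compartment 1',
    'Reservoir 1 Common Inlet',
    'Reservoir 2 Compartment 1',
    'Vyrnwy Line 2 Balancing Tank 1',
    'Reservoir 1',
]

# Split every name and print the pieces
parts = [SEPARATOR.split(name) for name in names]
for piece in parts:
    print(piece)
```

The space inside "Common Inlet" survives because it follows the letter "n" rather than a digit, and "Reservoir 1" comes back whole because nothing follows its digit.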

See it in action on regex101.com.

Also, you can see why your pattern doesn't work here: https://regex101.com/r/nSmEEs/1
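The difference from the question's pattern can also be reproduced locally with re.split. This is a minimal sketch: the question's pattern only checks what comes after the whitespace, so every space in front of a capitalized word becomes a split point. The lazy \s*? does not reduce the number of split points; it only affects how much whitespace each match consumes. The answer's lookbehind restricts splits to whitespace that follows a digit.

```python
import re

# Pattern from the question: whitespace followed by an uppercase letter
question_pattern = r'\s\s*?(?=[A-Z])'
# Pattern from the answer: additionally requires a digit right before the whitespace
answer_pattern = r'(?<=\d)\s+(?=[A-Z])'

name = 'Vyrnwy Line 2 Balancing Tank 1'

question_parts = re.split(question_pattern, name)
answer_parts = re.split(answer_pattern, name)

print(question_parts)  # ['Vyrnwy', 'Line 2', 'Balancing', 'Tank 1']
print(answer_parts)    # ['Vyrnwy Line 2', 'Balancing Tank 1']
```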

A similar question about python - Split a column in Pandas can be found on Stack Overflow: https://stackoverflow.com/questions/72670379/
