python - pandas: How to repeat a label for each substring in a column

Reposted. Author: 行者123. Updated: 2023-12-05 03:17:39

I have a pandas DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'text': ['set an alarm for [time : two hours from now]','wake me up at [time : nine am] on [date : friday]','check email from [person : john]']})
print(df)

Original DataFrame

                                                text
0 set an alarm for [time : two hours from now]
1 wake me up at [time : nine am] on [date : friday]
2 check email from [person : john]

If a bracketed list contains more than one word, I want to repeat the brackets and the label (date, time, or person) for every word in it.

Desired output:

                                                new_text                                
0 set an alarm for [time : two] [time : hours] [time : from] [time : now]
1 wake me up at [time : nine] [time : am] on [date : friday]
2 check email from [person : john]

So far I have tried separating the bracketed lists from the original column, but I don't know how to proceed.

df['separated_list'] = df.text.str.split(r"\s(?![^[]*])|[|]").apply(lambda x: [y for y in x if '[' in y])

Best answer

You can use a regex with a custom function as the replacement:

df['new_text'] = df.text.str.replace(
    r"\[([^\[\]]*?)\s*:\s*([^\[\]]*)\]",
    lambda m: ' '.join([f'[{m.group(1)} : {x}]'
                        for x in m.group(2).split()]),  # new chunk for each word
    regex=True)

Output:

                                                text                                                                 new_text
0 set an alarm for [time : two hours from now] set an alarm for [time : two] [time : hours] [time : from] [time : now]
1 wake me up at [time : nine am] on [date : friday] wake me up at [time : nine] [time : am] on [date : friday]
2 check email from [person : john] check email from [person : john]
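
To see what the pattern captures, the same replacement can be sketched with the standard-library re module on a plain string. The names CHUNK, expand_chunks, and repl are made up for this illustration and are not part of the original answer; the pattern itself is copied from it. Group 1 is the label before the colon and group 2 is the text after it.

```python
import re

# Same pattern as in the answer above.
# Group 1: the label (e.g. "time"); group 2: the words after the colon.
CHUNK = re.compile(r"\[([^\[\]]*?)\s*:\s*([^\[\]]*)\]")

def expand_chunks(text):
    """Emit one [label : word] chunk per whitespace-separated word (hypothetical helper)."""
    def repl(m):
        label, value = m.group(1), m.group(2)
        # Build a new bracketed chunk for each word in the value.
        return ' '.join(f'[{label} : {word}]' for word in value.split())
    return CHUNK.sub(repl, text)

print(expand_chunks('wake me up at [time : nine am] on [date : friday]'))
```

A named function like this could also be applied to the column with df['text'].map(expand_chunks), which may be easier to test than the inline lambda. Note that, with this pattern, a chunk whose value is empty (for example "[time : ]") is replaced by an empty string.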

For "python - pandas: How to repeat a label for each substring in a column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74042649/
