
python - Concatenating values from earlier rows in a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:12:00

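I have a slightly odd Pandas groupby problem.

I have a source DataFrame with three columns: Customer, Date, and Item. I want to add a new column, Item History, holding an array of all the items that customer bought in earlier rows (earlier as defined by Date). For example, given this source DataFrame: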

Customer  Date        Item
Bert      01/01/2019  Bread
Bert      15/01/2019  Cheese
Bert      20/01/2019  Apples
Bert      22/01/2019  Pears
Ernie     01/01/2019  Buzz Lightyear
Ernie     15/01/2019  Shellfish
Ernie     20/01/2019  A pet dog
Ernie     22/01/2019  Yoghurt
Steven    01/01/2019  A golden toilet
Steven    15/01/2019  Dominoes

I want to create this history feature:

Customer  Date        Item             Item History
Bert      01/01/2019  Bread            NaN
Bert      15/01/2019  Cheese           [Bread]
Bert      20/01/2019  Apples           [Bread, Cheese]
Bert      22/01/2019  Pears            [Bread, Cheese, Apples]
Ernie     01/01/2019  Buzz Lightyear   NaN
Ernie     15/01/2019  Shellfish        [Buzz Lightyear]
Ernie     20/01/2019  A pet dog        [Buzz Lightyear, Shellfish]
Ernie     22/01/2019  Yoghurt          [Buzz Lightyear, Shellfish, A pet dog]
Steven    01/01/2019  A golden toilet  NaN
Steven    15/01/2019  Dominoes         [A golden toilet]

I can do the following to get the history per date:

df.groupby(['Customer', 'Date']).agg(lambda x: tuple(x)).applymap(list).reset_index()

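This way, if a customer buys several items on one day, they are all listed in a single array, and a customer who buys only one item gets that item in an array of its own. But I can't work out how to concatenate these with the items from the preceding rows.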
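To make the per-day grouping concrete, here is a minimal sketch on a small hypothetical frame (the same-day "Milk" purchase is invented for illustration and is not part of the question's data). It uses `agg(list)` directly rather than the tuple-then-list round trip, which should give the same per-day lists for the Item column.

```python
import pandas as pd

# Hypothetical sample: Bert buys two items on the same day.
df = pd.DataFrame({
    "Customer": ["Bert", "Bert", "Bert"],
    "Date": ["01/01/2019", "01/01/2019", "15/01/2019"],
    "Item": ["Bread", "Milk", "Cheese"],
})

# One row per (Customer, Date); that day's items are collected into a list.
per_day = df.groupby(["Customer", "Date"])["Item"].agg(list).reset_index()
print(per_day)
```

Each row of `per_day` only knows about its own day, which is why a further step is needed to pull in the items from earlier rows.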

Best answer

Use a custom lambda function with GroupBy.transform, then replace the empty lists with NaN as a final step:

# For each row position i, collect the group's items at positions 0..i-1
f = lambda x: [x.iloc[:i].tolist() for i in range(len(x))]
df['Item History'] = df.groupby('Customer')['Item'].transform(f)
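To see what the function passed to transform produces for a single customer, here is a small sketch that applies the same prefix logic to one standalone Series. The name `prefixes` is a hypothetical stand-in for the lambda, and `.iloc` is used so that the slice is explicitly positional.

```python
import pandas as pd

def prefixes(x):
    # For row position i, return the values at positions 0..i-1 as a list.
    return [x.iloc[:i].tolist() for i in range(len(x))]

# One customer's items, already in date order.
items = pd.Series(["Bread", "Cheese", "Apples"])
result = prefixes(items)
print(result)  # [[], ['Bread'], ['Bread', 'Cheese']]
```

The first element is always an empty list, which is the value later replaced with NaN.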

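Another solution, using a list comprehension (this relies on df already being sorted by Customer, because groupby yields the groups in sorted key order and the resulting list is assigned by position):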

df['Item History'] = [x['Item'].iloc[:i].tolist()
                      for j, x in df.groupby('Customer')
                      for i in range(len(x))]

# Replace the empty lists (each customer's first row) with NaN; requires `import numpy as np`
df.loc[~df['Item History'].astype(bool), 'Item History'] = np.nan

print(df)
   Customer  Date        Item             Item History
0  Bert      01/01/2019  Bread            NaN
1  Bert      15/01/2019  Cheese           [Bread]
2  Bert      20/01/2019  Apples           [Bread, Cheese]
3  Bert      22/01/2019  Pears            [Bread, Cheese, Apples]
4  Ernie     01/01/2019  Buzz Lightyear   NaN
5  Ernie     15/01/2019  Shellfish        [Buzz Lightyear]
6  Ernie     20/01/2019  A pet dog        [Buzz Lightyear, Shellfish]
7  Ernie     22/01/2019  Yoghurt          [Buzz Lightyear, Shellfish, A pet dog]
8  Steven    01/01/2019  A golden toilet  NaN
9  Steven    15/01/2019  Dominoes         [A golden toilet]
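Both solutions assume the rows are already in date order within each customer. If that is not guaranteed, one possible approach is to parse the dates and sort first. The sketch below is an assumption-laden variant, not the answer's own code: it assumes the dates are day-first strings, uses a shuffled subset of the question's rows, and builds the history with `cumcount` plus a per-customer item list instead of transform.

```python
import numpy as np
import pandas as pd

# A shuffled subset of the question's data (row order deliberately scrambled).
df = pd.DataFrame({
    "Customer": ["Bert", "Bert", "Steven", "Bert", "Steven", "Bert"],
    "Date": ["22/01/2019", "01/01/2019", "15/01/2019",
             "20/01/2019", "01/01/2019", "15/01/2019"],
    "Item": ["Pears", "Bread", "Dominoes",
             "Apples", "A golden toilet", "Cheese"],
})

# Assumption: dates are day-first strings; parse them so sorting is chronological.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(["Customer", "Date"], kind="stable").reset_index(drop=True)

# Position of each row within its customer's (now date-ordered) rows.
pos = df.groupby("Customer").cumcount()
# All items per customer, in date order.
items_by_customer = df.groupby("Customer")["Item"].agg(list)

# History = the customer's items before this row's position; NaN for the first row.
df["Item History"] = [
    items_by_customer[customer][:p] if p else np.nan
    for customer, p in zip(df["Customer"], pos)
]
print(df)
```

Because each row looks up its own customer's list, this variant does not depend on the frame being sorted by Customer, only on date order within each customer.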

Regarding python - Concatenating values from earlier rows in a Pandas DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57974133/
