
python-3.x - Pivot table based on groupby in Pandas

Reposted. Author: 行者123. Updated: 2023-12-04 02:58:28

I have a dataframe like this:

customer_id | date     | category
1 | 2017-2-1 | toys
2 | 2017-2-1 | food
1 | 2017-2-1 | drinks
3 | 2017-2-2 | computer
2 | 2017-2-1 | toys
1 | 2017-3-1 | food

>>> import pandas as pd
>>> dt = dict(customer_id=[1,2,1,3,2,1],
...           date='2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1'.split(),
...           category=["toys", "food", "drinks", "computer", "toys", "food"])
>>> df = pd.DataFrame(dt)
To create new columns and one-hot encode them, I know I can use df.pivot_table(index=['customer_id'], columns=['category']).
>>> df['Indicator'] = 1
>>> df.pivot_table(index=['customer_id'], columns=['category'],
...                values='Indicator').fillna(0).astype(int)
category     computer  drinks  food  toys
customer_id
1                   0       1     1     1
2                   0       0     1     1
3                   1       0     0     0
>>>
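I also want to group by date so that each row only contains information from the same date, as in the desired output below, where id 1 has two rows because there are two unique dates in the date column.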
customer_id | toys | food | drinks | computer 
1 | 1 | 0 | 1 | 0
1 | 0 | 1 | 0 | 0
2 | 1 | 1 | 0 | 0
3 | 0 | 0 | 0 | 1

Best answer

You may be looking for crosstab:

>>> pd.crosstab([df.customer_id, df.date], df.category)
category              computer  drinks  food  toys
customer_id date
1           2017-2-1         0       1     0     1
            2017-3-1         0       0     1     0
2           2017-2-1         0       0     1     1
3           2017-2-2         1       0     0     0
>>>
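One thing to keep in mind: crosstab counts occurrences, so it only yields 0/1 indicators when each (customer_id, date, category) combination appears at most once, as it does in the sample data. The sketch below assumes a hypothetical duplicate purchase (not part of the original data) and caps the counts at 1 with clip; the names `dup`, `counts` and `indicators` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": "2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1".split(),
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})

# Hypothetical duplicate row: customer 1 buys toys twice on 2017-2-1.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

# crosstab counts rows per (customer_id, date) and category,
# so the duplicated purchase shows up as 2 rather than 1.
counts = pd.crosstab([dup["customer_id"], dup["date"]], dup["category"])

# Cap every cell at 1 to get strict 0/1 indicators.
indicators = counts.clip(upper=1)

print(counts)
print(indicators)
```

If the data is known to have no repeated combinations, the clip step is unnecessary and the plain crosstab call is enough.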
>>> pd.crosstab([df.customer_id, df.date],
...             df.category).reset_index(level=1)
category         date  computer  drinks  food  toys
customer_id
1            2017-2-1         0       1     0     1
1            2017-3-1         0       0     1     0
2            2017-2-1         0       0     1     1
3            2017-2-2         1       0     0     0
>>>
>>> pd.crosstab([df.customer_id, df.date],
...             df.category).reset_index(level=1, drop=True)
category     computer  drinks  food  toys
customer_id
1                   0       1     0     1
1                   0       0     1     0
2                   0       0     1     1
3                   1       0     0     0
>>>
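Since the question asks about groupby, here is a minimal sketch of what should be an equivalent route on this data: group by all three columns, count with size, and unstack the category level. This is an alternative to the crosstab answer, not part of it; the variable names `by_group`, `result` and `desired` are made up for illustration, and the final column selection only reorders columns to match the layout in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": "2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1".split(),
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})

# Count rows per (customer_id, date, category), then pivot the
# category level into columns, filling missing combinations with 0.
by_group = (
    df.groupby(["customer_id", "date", "category"])
      .size()
      .unstack("category", fill_value=0)
)

# Drop the date level so only customer_id remains in the index.
result = by_group.reset_index(level="date", drop=True)

# Reorder the columns to match the layout requested in the question.
desired = result[["toys", "food", "drinks", "computer"]]
print(desired)
```

As with crosstab, the cells are counts, so repeated purchases of the same category on the same date would produce values above 1.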

Regarding python-3.x - Pivot table based on groupby in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51541995/
