gpt4 book ai didi

python - 鉴于要分发的值是非数字的,如何将数据帧从长更改为宽?

转载 作者:行者123 更新时间:2023-12-04 07:48:16 25 4
gpt4 key购买 nike

这个问题在这里已经有了答案:





Combination of Pivot and Transpose in Python

(2 个回答)


去年关闭。




您好,从 R 迁移到 python,我想弄清楚如何将 Long 格式更改为宽格式,因为要分发的值是字符串或非数字

import pandas as pd
import numpy as np
示例虚拟数据
dummy_data = {'process_id':  [1,1,1,2,3,4,5,5,6],

'purchase_date': ['2020-01-01',
'2020-01-01',
'2020-01-01',
'2020-03-01',
'2020-03-02',
'2020-03-04',
'2020-07-04',
'2020-07-04',
'2020-01-20'],
'consumption_date': ['2021-02-01',
'2021-02-01',
'2021-02-01',
'2021-03-01',
'2021-04-02',
'2021-05-04',
'2021-09-04',
'2021-09-04',
'2021-08-04'],
'discount_code': ['10',
'12',
'11',
'13',
'90',
'81',
'82',
'11',
'10']
}

df = pd.DataFrame (dummy_data, columns = ['process_id','purchase_date','consumption_date','discount_code'])
虚拟数据输出示例:
   process_id purchase_date consumption_date discount_code

0 1 2020-01-01 2021-02-01 10
1 1 2020-01-01 2021-02-01 12
2 1 2020-01-01 2021-02-01 11
3 2 2020-03-01 2021-03-01 13
4 3 2020-03-02 2021-04-02 90
5 4 2020-03-04 2021-05-04 81
6 5 2020-07-04 2021-09-04 82
7 5 2020-07-04 2021-09-04 11
8 6 2020-01-20 2021-08-04 10
所以每个 process_id 可能有 N 个 discount_code
我想要实现的目标:如果一个process_id有多个discount_code,我想创建N个discount_code_columns
示例预期数据
expected_data = {'process_id':  [1,2,3,4,5,6],

'purchase_date': ['2020-01-01',
'2020-03-01',
'2020-03-02',
'2020-03-04',
'2020-07-04',
'2020-01-20'],
'consumption_date': ['2021-02-01',
'2021-03-01',
'2021-04-02',
'2021-05-04',
'2021-09-04',
'2021-08-04'],
'discount_code_1': ['10', '13', '90', '81', ' 82', '10'],
'discount_code_2': ['12', np.nan, np.nan, np.nan, '11', np.nan,],
'discount_code_3': ['11', np.nan, np.nan, np.nan, np.nan, np.nan],
}
df2 = pd.DataFrame (expected_data, columns = ['process_id','purchase_date','consumption_date','discount_code_1','discount_code_2','discount_code_3'])
示例预期数据输出
   process_id purchase_date consumption_date discount_code_1 discount_code_2 discount_code_3
0 1 2020-01-01 2021-02-01 10 12 11
1 2 2020-03-01 2021-03-01 13 NaN NaN
2 3 2020-03-02 2021-04-02 90 NaN NaN
3 4 2020-03-04 2021-05-04 81 NaN NaN
4 5 2020-07-04 2021-09-04 82 11 NaN
5 6 2020-01-20 2021-08-04 10 NaN NaN
所以我尝试的是用户 pandas pivot
df1 = df.pivot_table(index=['process_id','purchase_date','consumption_date'], 
columns='process_id', values=['discount_code'], aggfunc='first')
df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()
df1.columns=df1.columns.tolist()


process_id purchase_date consumption_date 1 2 3 4 5 6
0 1 2020-01-01 2021-02-01 10 NaN NaN NaN NaN NaN
1 2 2020-03-01 2021-03-01 NaN 13 NaN NaN NaN NaN
2 3 2020-03-02 2021-04-02 NaN NaN 90 NaN NaN NaN
3 4 2020-03-04 2021-05-04 NaN NaN NaN 81 NaN NaN
4 5 2020-07-04 2021-09-04 NaN NaN NaN NaN 82 NaN
5 6 2020-01-20 2021-08-04 NaN NaN NaN NaN NaN 10
但这似乎并不正确。感谢任何帮助!

最佳答案

让我们试试unstack在创建 MultiIndex 后 reshape 唯一标识行

c = ['process_id', 'purchase_date', 'consumption_date']
df1 = df.set_index([*c, df.groupby(c).cumcount().add(1).astype(str)]).unstack()
df1.columns = df1.columns.map('_'.join)
>>> df1.reset_index()

process_id purchase_date consumption_date discount_code_1 discount_code_2 discount_code_3
0 1 2020-01-01 2021-02-01 10 12 11
1 2 2020-03-01 2021-03-01 13 NaN NaN
2 3 2020-03-02 2021-04-02 90 NaN NaN
3 4 2020-03-04 2021-05-04 81 NaN NaN
4 5 2020-07-04 2021-09-04 82 11 NaN
5 6 2020-01-20 2021-08-04 10 NaN NaN

关于python - 鉴于要分发的值是非数字的,如何将数据帧从长更改为宽?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67106233/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com