
python - Create columns from rows based on the values of a column in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:03:57

I have a dataframe that looks like this:

PERIOD_START_TIME    ID  temp_ID  value1  value2
06.28.2017 22:00:00  88  1        4       2
06.28.2017 22:00:00  88  2        0       7
06.28.2017 22:00:00  89  2        0       9
06.28.2017 22:00:00  89  1        5       4
06.28.2017 22:00:00  90  1        12      13
06.28.2017 22:00:00  90  2        18      4

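Now I need to drop half of the rows and get twice as many value columns in return. In other words, each value column is duplicated once per temp_ID, and the temp_ID becomes part of the column name. Simply put, temp_ID is converted from rows to columns.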

Desired output

PERIOD_START_TIME    ID  value1_tpID1  value1_tpID2  value2_tpID1  value2_tpID2
06.28.2017 22:00:00  88  4             0             2             7
06.28.2017 22:00:00  89  5             0             4             9
06.28.2017 22:00:00  90  12            18            13            4

<class 'pandas.core.frame.DataFrame'>
Int64Index: 189604 entries, 0 to 10595
Data columns (total 12 columns):
PERIOD_START_TIME 189604 non-null object
ID 189604 non-null int64
temp_ID 189604 non-null int64
dtypes: float64(4), int64(6), object(2)
memory usage: 18.8+ MB

Best answer

You can use set_index with unstack:

# convert temp_ID to str if necessary, so that '_'.join works on the column tuples
df['temp_ID'] = df['temp_ID'].astype(str)
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4 0 2 7
1 06.28.2017 22:00:00 89 5 0 4 9
2 06.28.2017 22:00:00 90 12 18 13 4
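
The output above uses names like value1_1, while the question asked for names like value1_tpID1. Below is a minimal, self-contained sketch of the same set_index + unstack approach that builds the sample frame from the question and formats the names with the tpID prefix. The variable name `wide` and the f-string naming scheme are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 6,
    "ID": [88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 2, 2, 1, 1, 2],
    "value1": [4, 0, 0, 5, 12, 18],
    "value2": [2, 7, 9, 4, 13, 4],
})

# Move temp_ID from the row index into a second column level.
wide = df.set_index(["PERIOD_START_TIME", "ID", "temp_ID"]).unstack()

# Each column label is now a (value_name, temp_ID) tuple; flatten it.
# The f-string converts temp_ID to text, so no astype(str) is needed here.
wide.columns = [f"{name}_tpID{tid}" for name, tid in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the f-string handles the conversion, this variant works whether temp_ID is stored as an integer or as a string.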

Or:

df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4 0 2 7
1 06.28.2017 22:00:00 89 5 0 4 9
2 06.28.2017 22:00:00 90 12 18 13 4

If there are duplicate (PERIOD_START_TIME, ID, temp_ID) triples, you need pivot_table with an aggregate function such as mean, sum, ...:

print (df)
PERIOD_START_TIME ID temp_ID value1 value2
0 06.28.2017 22:00:00 88 1 4 2 < same PERIOD_START_TIME ID temp_ID
1 06.28.2017 22:00:00 88 1 5 3 < same PERIOD_START_TIME ID temp_ID
2 06.28.2017 22:00:00 88 2 0 7
3 06.28.2017 22:00:00 89 2 0 9
4 06.28.2017 22:00:00 89 1 5 4
5 06.28.2017 22:00:00 90 1 12 13
6 06.28.2017 22:00:00 90 2 18 4

df = df.pivot_table(index=['PERIOD_START_TIME','ID'],
                    columns='temp_ID',
                    values=['value1','value2'],
                    aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0
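
To see why duplicates call for an aggregating method, here is a small sketch under the assumption that the frame holds only the three ID 88 rows from the example above. It checks for duplicate key triples with duplicated, shows that plain unstack refuses to reshape them, and then falls back to pivot_table. The names `dup`, `keys`, `has_duplicates` and `unstack_failed` are made up for illustration.

```python
import pandas as pd

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]

# Two rows share the same (PERIOD_START_TIME, ID, temp_ID) triple.
dup = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 3,
    "ID": [88, 88, 88],
    "temp_ID": [1, 1, 2],
    "value1": [4, 5, 0],
    "value2": [2, 3, 7],
})

# True if any key triple occurs more than once.
has_duplicates = bool(dup.duplicated(subset=keys).any())

# unstack cannot decide which of the duplicate rows to keep, so it raises.
try:
    dup.set_index(keys).unstack()
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print("unstack failed:", err)

# pivot_table resolves the conflict by aggregating the duplicates.
wide = dup.pivot_table(index=keys[:2], columns="temp_ID",
                       values=["value1", "value2"], aggfunc="mean")
print(wide)
```

Running the duplicated check first is a cheap way to decide between the two approaches before reshaping a large frame.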

Alternative solution:

df = df.groupby(['PERIOD_START_TIME','ID','temp_ID']).mean().unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0
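
The info() output in the question lists two object columns, and to my knowledge recent pandas versions raise an error when mean is applied to non-numeric columns instead of silently dropping them. A cautious variant of the groupby approach therefore selects the value columns explicitly before aggregating. In the sketch below, the `note` column is a hypothetical stand-in for such a non-numeric column, and `value_cols` and `wide` are illustrative names.

```python
import pandas as pd

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]
value_cols = ["value1", "value2"]

df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 5,
    "ID": [88, 88, 88, 89, 89],
    "temp_ID": [1, 1, 2, 1, 2],
    "value1": [4, 5, 0, 5, 0],
    "value2": [2, 3, 7, 4, 9],
    "note": ["a", "b", "c", "d", "e"],  # hypothetical non-numeric column
})

# Aggregate only the selected value columns, then pivot temp_ID into columns.
wide = df.groupby(keys)[value_cols].mean().unstack()
wide.columns = [f"{name}_{tid}" for name, tid in wide.columns]
wide = wide.reset_index()
print(wide)
```

Selecting the columns up front also keeps unrelated columns out of the result, which matters on a 12-column frame like the one in the question.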

Regarding "python - Create columns from rows based on the values of a column in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44822514/
