
python - Reshape data using dates as column values

Reposted. Author: 行者123. Updated: 2023-12-02 02:53:05

I'm trying to reshape data with pandas, but I've been struggling to get it into the right format. Roughly, the data looks like this*:

import numpy as np
import pandas as pd

df = pd.DataFrame({'PRODUCT': ['1', '2'],
                   'DESIGN_START': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17')],
                   'DESIGN_COMPLETE': [pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04')],
                   'PRODUCTION_START': [pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15')],
                   'PRODUCTION_COMPLETE': [np.nan, pd.Timestamp('2020-04-28')]})
print(df)

  PRODUCT DESIGN_START DESIGN_COMPLETE PRODUCTION_START PRODUCTION_COMPLETE
0       1   2020-01-05      2020-01-22       2020-02-07                 NaT
1       2   2020-01-17      2020-03-04       2020-03-15          2020-04-28

I'd like to reshape the data so that it looks like this:

reshaped_df = pd.DataFrame({'DATE': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17'),
                                     pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04'),
                                     pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15'),
                                     np.nan, pd.Timestamp('2020-04-28')],
                            'STAGE': ['design', 'design', 'design', 'design',
                                      'production', 'production', 'production', 'production'],
                            'STATUS': ['started', 'started', 'completed', 'completed',
                                       'started', 'started', 'completed', 'completed']})

print(reshaped_df)

        DATE      STAGE    STATUS
0 2020-01-05     design   started
1 2020-01-17     design   started
2 2020-01-22     design completed
3 2020-03-04     design completed
4 2020-02-07 production   started
5 2020-03-15 production   started
6        NaT production completed
7 2020-04-28 production completed

How can I do this? Is there a better format to reshape it into?

Ultimately I want to run some grouped aggregations on the data, such as counting how many times each step occurred, for example:

reshaped_df.groupby(['STAGE','STATUS'])['DATE'].count()

STAGE       STATUS
design      completed    2
            started      2
production  completed    1
            started      2
Name: DATE, dtype: int64

Thanks

* The data actually contains many start/stop date columns for different stages of the manufacturing process.

Best answer

Melt it!!!

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'DESIGN_START': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17')],
    'DESIGN_COMPLETE': [pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04')],
    'PRODUCTION_START': [pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15')],
    'PRODUCTION_COMPLETE': [np.nan, pd.Timestamp('2020-04-28')]
})

# Unpivot: every non-id column becomes one (variable, value) row per product
df = df.melt(id_vars=['PRODUCT'])
# Split e.g. 'DESIGN_START' at the first underscore into 'DESIGN' and 'START'
df_split = df['variable'].str.split('_', n=1, expand=True)
df['STAGE'] = df_split[0]
df['STATUS'] = df_split[1]
df.drop(columns=['variable'], inplace=True)
df = df.rename(columns={'value': 'DATE'})

print(df)

Output:

  PRODUCT       DATE      STAGE    STATUS
0       1 2020-01-05     DESIGN     START
1       2 2020-01-17     DESIGN     START
2       1 2020-01-22     DESIGN  COMPLETE
3       2 2020-03-04     DESIGN  COMPLETE
4       1 2020-02-07 PRODUCTION     START
5       2 2020-03-15 PRODUCTION     START
6       1        NaT PRODUCTION  COMPLETE
7       2 2020-04-28 PRODUCTION  COMPLETE

Mwahahahaha!!! Feel the power of the melt!!!
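The melted frame keeps the upper-case labels taken from the column names (DESIGN, START, COMPLETE), while the question asks for design / started / completed and a grouped count. One possible follow-up is sketched below. The names `long_df`, `parts` and `status_labels` are made up for illustration, and the label mapping is an assumption about the wording the asker wants, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'DESIGN_START': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17')],
    'DESIGN_COMPLETE': [pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04')],
    'PRODUCTION_START': [pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15')],
    'PRODUCTION_COMPLETE': [pd.NaT, pd.Timestamp('2020-04-28')],
})

# Melt as in the answer, naming the value column DATE directly
long_df = df.melt(id_vars=['PRODUCT'], value_name='DATE')

# Split 'DESIGN_START' into 'DESIGN' and 'START' at the first underscore
parts = long_df['variable'].str.split('_', n=1, expand=True)
long_df['STAGE'] = parts[0].str.lower()

# Hypothetical mapping from column suffixes to the labels used in the question
status_labels = {'START': 'started', 'COMPLETE': 'completed'}
long_df['STATUS'] = parts[1].map(status_labels)
long_df = long_df.drop(columns=['variable'])

# count() skips NaT, so a step that has not happened yet is not counted
counts = long_df.groupby(['STAGE', 'STATUS'])['DATE'].count()
print(counts)
```

Because `count()` ignores missing dates, the unfinished production run of product 1 drops out of the tally, which matches the counts shown in the question. A suffix that is missing from `status_labels` would become NaN in STATUS, so the mapping has to cover every suffix that occurs in the real column names.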

Melt is basically an unpivot.
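To make the "unpivot" point concrete, here is a minimal round trip on a cut-down version of the data: melt the wide frame, then pivot it back. The names `wide`, `long_df`, `restored` and the `EVENT` column are illustrative, and the sketch assumes each product appears only once per column, since `pivot` raises an error on duplicate index/column pairs.

```python
import pandas as pd

wide = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'DESIGN_START': pd.to_datetime(['2020-01-05', '2020-01-17']),
    'DESIGN_COMPLETE': pd.to_datetime(['2020-01-22', '2020-03-04']),
})

# Wide -> long: one row per (product, event) pair
long_df = wide.melt(id_vars=['PRODUCT'], var_name='EVENT', value_name='DATE')

# Long -> wide: pivot undoes the melt
restored = (
    long_df.pivot(index='PRODUCT', columns='EVENT', values='DATE')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'EVENT' label on the columns
)

# pivot sorts the new columns alphabetically, so restore the original order
restored = restored[wide.columns]
print(restored)
```

The long form is usually the easier one for groupby-style summaries, while the wide form is handy for comparing dates within one product side by side.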

Regarding "python - Reshape data using dates as column values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61603407/
