python - Get change dates based on column values in a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:25:13

I have the following DataFrame:

fid        date       stage
test_fid   4/22/2019  a1
test_fid   4/23/2019  a1
test_fid   4/24/2019  a2
test_fid   4/25/2019  a2
test_fid   4/26/2019  a2
test_fid   4/27/2019  a3
test_fid   4/28/2019  a3
test_fid   4/29/2019  a3
test_fid1  4/30/2019  a1
test_fid1  5/1/2019   a1
test_fid1  5/2/2019   a1
test_fid1  5/3/2019   a1
test_fid1  5/4/2019   a2
test_fid1  5/5/2019   a2
test_fid1  5/6/2019   a2
test_fid1  5/7/2019   a2
test_fid1  5/8/2019   a3
test_fid1  5/9/2019   a3
test_fid1  5/10/2019  a3

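I want to determine the dates on which each value in the stage column starts and ends. For example, stage a1 of test_fid runs from 4/22/2019 to 4/23/2019. The result should look like this: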

fid        stage  start_date  end_date
test_fid   a1     4/22/2019   4/23/2019
test_fid   a2     4/24/2019   4/26/2019
test_fid   a3     4/27/2019   4/29/2019
test_fid1  a1     4/30/2019   5/3/2019
test_fid1  a2     5/4/2019    5/7/2019
test_fid1  a3     5/8/2019    5/10/2019

I tried this:

df['stage_change'] = df['stage'].diff()
df_filtered = df[df['stage_change'] != 0]

Best answer

You probably forgot to parse the date column into datetime objects. You can do that, as @pythonic suggests, like this:

df['date'] = pd.to_datetime(df['date'])
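
To see why the parsing step matters, here is a minimal sketch using only the last three dates from the question (the a3 rows of test_fid1). The variable names `raw` and `parsed` are illustrative, and the explicit `format` argument is an assumption about how the dates are written (month/day/year).

```python
import pandas as pd

# The three a3 dates of test_fid1, still stored as plain strings.
raw = pd.Series(["5/8/2019", "5/9/2019", "5/10/2019"])

# Strings compare character by character, so "5/10/2019" sorts before "5/8/2019".
print(raw.min(), raw.max())  # → 5/10/2019 5/9/2019

# Once parsed, min and max follow calendar order.
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
print(parsed.min().date(), parsed.max().date())  # → 2019-05-08 2019-05-10
```

Without the conversion, any min/max over this column would report 5/10/2019 as the start and 5/9/2019 as the end of that stage.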

Probably the most reliable approach is to compute the minimum and maximum of date for each group, for example:

>>> # Named aggregation (pandas 0.25+); the dict-renaming form raises SpecificationError in pandas 1.0 and later.
>>> df.groupby(['fid', 'stage'])['date'].agg(start_date='min', end_date='max')
                start_date   end_date
fid       stage
test_fid  a1    2019-04-22 2019-04-23
          a2    2019-04-24 2019-04-26
          a3    2019-04-27 2019-04-29
test_fid1 a1    2019-04-30 2019-05-03
          a2    2019-05-04 2019-05-07
          a3    2019-05-08 2019-05-10
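
To try the grouping end to end, the following sketch rebuilds the question's table from text, parses the dates, and aggregates per (fid, stage). The names `DATA` and `result` are illustrative, and it assumes a pandas version that supports keyword-style named aggregation (0.25 or later).

```python
import io

import pandas as pd

# The question's table, copied as whitespace-separated text.
DATA = """\
fid date stage
test_fid 4/22/2019 a1
test_fid 4/23/2019 a1
test_fid 4/24/2019 a2
test_fid 4/25/2019 a2
test_fid 4/26/2019 a2
test_fid 4/27/2019 a3
test_fid 4/28/2019 a3
test_fid 4/29/2019 a3
test_fid1 4/30/2019 a1
test_fid1 5/1/2019 a1
test_fid1 5/2/2019 a1
test_fid1 5/3/2019 a1
test_fid1 5/4/2019 a2
test_fid1 5/5/2019 a2
test_fid1 5/6/2019 a2
test_fid1 5/7/2019 a2
test_fid1 5/8/2019 a3
test_fid1 5/9/2019 a3
test_fid1 5/10/2019 a3
"""

df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# Parse month/day/year strings into datetimes before comparing them.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Earliest and latest date within each (fid, stage) group.
result = df.groupby(["fid", "stage"])["date"].agg(start_date="min", end_date="max")
print(result)
```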

Or, if you don't want fid and stage as the index, you can reset the index:

>>> df.groupby(['fid', 'stage'])['date'].agg(start_date='min', end_date='max').reset_index()
         fid stage start_date   end_date
0   test_fid    a1 2019-04-22 2019-04-23
1   test_fid    a2 2019-04-24 2019-04-26
2   test_fid    a3 2019-04-27 2019-04-29
3  test_fid1    a1 2019-04-30 2019-05-03
4  test_fid1    a2 2019-05-04 2019-05-07
5  test_fid1    a3 2019-05-08 2019-05-10
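
One caveat: grouping by fid and stage assumes each stage occurs only once per fid, which holds for the sample data. If a stage could recur (say a1, then a2, then a1 again), its rows would be merged into a single range. The sketch below shows one common way to handle that, which is not part of the original answer: label each consecutive run with shift and cumsum, then aggregate per run. The data, `changed`, `run_id`, and `runs` are all made up for illustration.

```python
import pandas as pd

# Hypothetical data in which stage a1 occurs twice for the same fid.
df = pd.DataFrame({
    "fid": ["x"] * 5,
    "date": pd.to_datetime(
        ["4/22/2019", "4/23/2019", "4/24/2019", "4/25/2019", "4/26/2019"],
        format="%m/%d/%Y",
    ),
    "stage": ["a1", "a1", "a2", "a1", "a1"],
})

# Run detection relies on row order, so sort by fid and date first.
df = df.sort_values(["fid", "date"]).reset_index(drop=True)

# A new run starts whenever fid or stage differs from the previous row.
changed = (df["fid"] != df["fid"].shift()) | (df["stage"] != df["stage"].shift())
df["run_id"] = changed.cumsum()

# Aggregate per run instead of per (fid, stage).
runs = (
    df.groupby(["fid", "run_id", "stage"], sort=False)["date"]
    .agg(start_date="min", end_date="max")
    .reset_index()
    .drop(columns="run_id")
)
print(runs)
```

Here the two separate a1 runs are reported as two rows rather than one range spanning the a2 period.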

Regarding "python - Get change dates based on column values in a Pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57548443/
