
python - Pandas: resample rows using start and end times

Reposted. Author: 行者123. Updated: 2023-12-05 05:49:48

I have a DataFrame of the following form:

In [5]: df = pd.DataFrame({
...: 'start_time': ['2022-01-01 01:15', '2022-01-01 13:00'],
...: 'end_time': ['2022-01-01 03:45', '2022-01-01 15:00'],
...: 'values': [1000, 750]})

In [6]: df
Out[6]:
         start_time          end_time  values
0  2022-01-01 01:15  2022-01-01 03:45    1000
1  2022-01-01 13:00  2022-01-01 15:00     750

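I want to convert this into 24 hourly values, splitting each value proportionally across the hours covered by its start_time/end_time range. For the example above, this should produce: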

In [10]: result
Out[10]:
value
2022-01-01 00:00:00 0
2022-01-01 01:00:00 300
2022-01-01 02:00:00 400
2022-01-01 03:00:00 300
2022-01-01 04:00:00 0
2022-01-01 05:00:00 0
2022-01-01 06:00:00 0
2022-01-01 07:00:00 0
2022-01-01 08:00:00 0
2022-01-01 09:00:00 0
2022-01-01 10:00:00 0
2022-01-01 11:00:00 0
2022-01-01 12:00:00 0
2022-01-01 13:00:00 375
2022-01-01 14:00:00 375
2022-01-01 15:00:00 0
2022-01-01 16:00:00 0
2022-01-01 17:00:00 0
2022-01-01 18:00:00 0
2022-01-01 19:00:00 0
2022-01-01 20:00:00 0
2022-01-01 21:00:00 0
2022-01-01 22:00:00 0
2022-01-01 23:00:00 0

The start_time/end_time ranges do not overlap. Any suggestions on how to achieve this?
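The expected numbers follow from one rule: each hour receives value × (minutes of the range inside that hour) / (total minutes of the range). For the first row, 01:15 to 03:45 is 150 minutes, so 01:00 gets 1000 × 45/150 = 300, 02:00 gets 1000 × 60/150 = 400 and 03:00 gets 1000 × 45/150 = 300. A minimal pure-Python sketch of that rule, assuming every range has end > start; the helper `hourly_split` is a hypothetical name, not part of pandas or the question:

```python
from datetime import datetime, timedelta


def hourly_split(start, end, value):
    """Split `value` over the hours touched by [start, end), proportional to overlap.

    Assumes end > start; a zero-length range would divide by zero.
    """
    total = (end - start).total_seconds()
    result = {}
    # Begin at the top of the hour that contains `start`
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        nxt = hour + timedelta(hours=1)
        # Length of the intersection of [start, end) and [hour, nxt)
        overlap = (min(end, nxt) - max(start, hour)).total_seconds()
        result[hour] = value * overlap / total
        hour = nxt
    return result


fmt = "%Y-%m-%d %H:%M"
row0 = hourly_split(datetime.strptime("2022-01-01 01:15", fmt),
                    datetime.strptime("2022-01-01 03:45", fmt), 1000)
row1 = hourly_split(datetime.strptime("2022-01-01 13:00", fmt),
                    datetime.strptime("2022-01-01 15:00", fmt), 750)
for hour, share in {**row0, **row1}.items():
    print(hour, share)
```

For the question's two rows this yields 300.0, 400.0 and 300.0 for hours 01 to 03 and 375.0 for hours 13 and 14, matching the expected output; hours outside the ranges are simply absent and would be filled with 0.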

Best answer

Use:

import pandas as pd

#get differences between start and end in minutes
df['diff'] = pd.to_datetime(df['end_time']).sub(pd.to_datetime(df['start_time'])).dt.total_seconds().div(60)

#create DataFrame with repeat values by minutes
#inclusive='left' drops the end minute (pandas >= 1.4; it replaces closed='left', removed in pandas 2.0)
s = pd.concat([pd.Series(r.Index, pd.date_range(r.start_time, r.end_time, freq='min', inclusive='left')) for r in df.itertuples()])
s = pd.Series(s.index, s.to_numpy(), name='new')
#one row per minute, with a helper column counting that minute
df = df.join(s).assign(minutes=1)

#resample to hours, summing the minutes that fall into each hour
df = df.resample('h', on='new').agg({'values':'first', 'diff':'first', 'minutes':'sum'})
#multiply values by ratio (empty hours give NaN, replaced by 0)
df['value'] = df['values'].mul(df['minutes'].div(df['diff'])).fillna(0)

#add missing rows so the whole day (00:00 to 23:00) is covered
r = pd.date_range(df.index.min().normalize(), df.index.max().normalize() + pd.Timedelta('23h'), freq='h')
df = df[['value']].reindex(r, fill_value=0)

print(df)
value
2022-01-01 00:00:00 0.0
2022-01-01 01:00:00 300.0
2022-01-01 02:00:00 400.0
2022-01-01 03:00:00 300.0
2022-01-01 04:00:00 0.0
2022-01-01 05:00:00 0.0
2022-01-01 06:00:00 0.0
2022-01-01 07:00:00 0.0
2022-01-01 08:00:00 0.0
2022-01-01 09:00:00 0.0
2022-01-01 10:00:00 0.0
2022-01-01 11:00:00 0.0
2022-01-01 12:00:00 0.0
2022-01-01 13:00:00 375.0
2022-01-01 14:00:00 375.0
2022-01-01 15:00:00 0.0
2022-01-01 16:00:00 0.0
2022-01-01 17:00:00 0.0
2022-01-01 18:00:00 0.0
2022-01-01 19:00:00 0.0
2022-01-01 20:00:00 0.0
2022-01-01 21:00:00 0.0
2022-01-01 22:00:00 0.0
2022-01-01 23:00:00 0.0
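The answer expands every interval to one row per minute, which assumes minute-aligned boundaries and grows with the total duration. A different technique, sketched here under the assumption that end_time > start_time for every row, computes each interval's overlap with every hourly bin directly using NumPy broadcasting. It is not the answer's method, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'start_time': ['2022-01-01 01:15', '2022-01-01 13:00'],
    'end_time': ['2022-01-01 03:45', '2022-01-01 15:00'],
    'values': [1000, 750]})

start = pd.to_datetime(df['start_time'])
end = pd.to_datetime(df['end_time'])

# Hourly grid spanning whole days: midnight of the first start
# up to 23:00 on the day of the last end
hours = pd.date_range(start.min().normalize(),
                      end.max().normalize() + pd.Timedelta('23h'), freq='h')

# Bin edges as a column (hours x 1) against interval edges as a row (n_rows),
# so the element-wise operations below broadcast to an hours x n_rows matrix
bin_start = hours.to_numpy()[:, None]
bin_end = bin_start + np.timedelta64(1, 'h')
s, e = start.to_numpy(), end.to_numpy()

# Overlap of every interval with every hour; negative values mean no overlap
overlap = np.minimum(e, bin_end) - np.maximum(s, bin_start)
overlap = np.maximum(overlap, np.timedelta64(0, 'ns'))

# Share of each interval's duration in each hour, weighted by its value
fraction = overlap / (e - s)
result = pd.DataFrame({'value': fraction @ df['values'].to_numpy()}, index=hours)
print(result)
```

This trades the per-minute expansion for an hours × rows matrix: it handles boundaries that are not on whole minutes, but memory grows with the number of hours times the number of intervals. Because overlaps are summed, overlapping ranges would be added together, which the question rules out anyway.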

Regarding python - Pandas: resample rows using start and end times, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70620304/