
python - Pandas: complex condition on datetime

Reposted. Author: 太空宇宙. Updated: 2023-11-03 10:54:31

I have a DataFrame with a datetime-typed column and a float-typed column.

                 date   value
0 2010-01-01 01:23:00    21.2
1 2010-01-02 01:33:00    63.4
2 2010-01-03 06:02:00    80.6
3 2010-01-04 06:05:00    50.1
4 2010-01-05 06:20:00   346.5
5 2010-01-06 07:44:00   111.8
6 2010-01-07 08:00:00   113.1
7 2010-01-08 08:22:00    10.6
8 2010-01-09 09:00:00   287.2
9 2010-01-10 09:14:00  1652.6

I want to create a new column that records the mean of the values in the one hour before each row's timestamp.

[UPDATE] Example:

If the current row is `4 2010-01-05 06:20:00 346.5`, I need to calculate (50.1 + 80.6) / 2, that is, the mean of the values in the range 2010-01-05 05:20:00 to 2010-01-05 06:20:00. The row then becomes:

                 date  value  before_1hr_mean
4 2010-01-05 06:20:00  346.5            65.35

I currently solve this with iterrows(), as in the code below. But this approach is really slow, and iterrows() is generally not recommended in pandas.

[UPDATE]

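import numpy as np
import pandas as pd

df['before_1hr_mean'] = np.nan
for index, row in df.iterrows():
    # rows in the hour before this row's timestamp, current row excluded
    in_window = (df['date'] < row['date']) & (df['date'] >= row['date'] - pd.Timedelta(hours=1))
    df.loc[index, 'before_1hr_mean'] = df.loc[in_window, 'value'].mean()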

Is there a better way to handle this situation?

Best answer

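I took the liberty of changing your data so that it all falls on the same day. That is the only way I could make sense of your question.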

# '1h' is the same one-hour window as the older 'H' alias, which recent pandas versions deprecate
df.join(
    df.set_index('date').value.rolling('1h').mean().rename('before_1hr_mean'),
    on='date'
)

                 date   value  before_1hr_mean
0 2010-01-01 01:23:00    21.2        21.200000
1 2010-01-01 01:33:00    63.4        42.300000
2 2010-01-01 06:02:00    80.6        80.600000
3 2010-01-01 06:05:00    50.1        65.350000
4 2010-01-01 06:20:00   346.5       159.066667
5 2010-01-01 07:44:00   111.8       111.800000
6 2010-01-01 08:00:00   113.1       112.450000
7 2010-01-01 08:22:00    10.6        78.500000
8 2010-01-01 09:00:00   287.2       148.900000
9 2010-01-01 09:14:00  1652.6       650.133333
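
For anyone who wants to reproduce the table above, here is a minimal self-contained sketch. The way the DataFrame is built is an assumption (the answer does not show how the same-day data was created), and the window is written as '1h'.

```python
import pandas as pd

# Same-day data as in the table above; this construction is assumed, not from the answer
times = ['01:23', '01:33', '06:02', '06:05', '06:20',
         '07:44', '08:00', '08:22', '09:00', '09:14']
values = [21.2, 63.4, 80.6, 50.1, 346.5, 111.8, 113.1, 10.6, 287.2, 1652.6]
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01 ' + t for t in times]),
    'value': values,
})

# Time-based rolling window over the trailing hour; the current row is included
rolling_mean = (
    df.set_index('date')['value']
      .rolling('1h')
      .mean()
      .rename('before_1hr_mean')
)
result = df.join(rolling_mean, on='date')
print(result)
```

The rolling call needs a sorted datetime index, which is why 'date' is moved into the index first and the result is joined back on the 'date' column.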

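If you want to exclude the current row, you have to track the sum and count over the rolling hour, then return the mean after adjusting for the current value.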

s = df.set_index('date')
# rolling one-hour sum and count, with the columns renamed to 'Sum' and 'Count'
sagg = s.rolling('1h').agg(['sum', 'count']).value.rename(columns=str.title)
agged = df.join(sagg, on='date')
agged

                 date   value     Sum  Count
0 2010-01-01 01:23:00    21.2    21.2    1.0
1 2010-01-01 01:33:00    63.4    84.6    2.0
2 2010-01-01 06:02:00    80.6    80.6    1.0
3 2010-01-01 06:05:00    50.1   130.7    2.0
4 2010-01-01 06:20:00   346.5   477.2    3.0
5 2010-01-01 07:44:00   111.8   111.8    1.0
6 2010-01-01 08:00:00   113.1   224.9    2.0
7 2010-01-01 08:22:00    10.6   235.5    3.0
8 2010-01-01 09:00:00   287.2   297.8    2.0
9 2010-01-01 09:14:00  1652.6  1950.4    3.0

Then do a little math and assign a new column:

df.assign(before_1hr_mean=agged.eval('(Sum - value) / (Count - 1)'))

                 date   value  before_1hr_mean
0 2010-01-01 01:23:00    21.2              NaN
1 2010-01-01 01:33:00    63.4            21.20
2 2010-01-01 06:02:00    80.6              NaN
3 2010-01-01 06:05:00    50.1            80.60
4 2010-01-01 06:20:00   346.5            65.35
5 2010-01-01 07:44:00   111.8              NaN
6 2010-01-01 08:00:00   113.1           111.80
7 2010-01-01 08:22:00    10.6           112.45
8 2010-01-01 09:00:00   287.2            10.60
9 2010-01-01 09:14:00  1652.6           148.90

Note that you get null values when there is no data in the preceding hour to compute from.
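
The loop in the question uses the half-open window [t - 1h, t), while a time-based rolling window defaults to (t - 1h, t]. One possible alternative, which is not part of the answer above, is to pass closed='left' to rolling so that the current row is excluded and a row exactly one hour earlier is included. A sketch, assuming the same-day data from the answer (rebuilt here so the block stands alone; the construction itself is an assumption):

```python
import pandas as pd

# Same-day data as in the answer; this construction is assumed
times = ['01:23', '01:33', '06:02', '06:05', '06:20',
         '07:44', '08:00', '08:22', '09:00', '09:14']
values = [21.2, 63.4, 80.6, 50.1, 346.5, 111.8, 113.1, 10.6, 287.2, 1652.6]
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01 ' + t for t in times]),
    'value': values,
})

# closed='left' turns the window into [t - 1h, t): the current row is left out
left_mean = (
    df.set_index('date')['value']
      .rolling('1h', closed='left')
      .mean()
      .rename('before_1hr_mean')
)
result = df.join(left_mean, on='date')
print(result)
```

With this window the 09:00 row averages the 08:00 and 08:22 values (61.85) rather than returning 10.6, because the row at exactly 08:00 now falls inside the window. Rows with no earlier data in the hour still come out as NaN.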

Regarding python - Pandas: complex condition on datetime, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43648053/
