python - Getting the first and last element with pd.Grouper()

Reposted. Author: 行者123. Updated: 2023-12-01 01:55:12

I have a time series that I am resampling into 5-second windows, like this:

INDEX                   size           price
2018-05-07 21:53:13.731 0.365127 9391.800000
2018-05-07 21:53:16.201 0.666127 9391.800000
2018-05-07 21:53:18.038 0.143104 9391.800000
2018-05-07 21:53:18.243 0.025643 9391.800000
2018-05-07 21:53:18.265 0.640484 9391.800000
2018-05-07 21:53:18.906 -0.100000 9391.793421
2018-05-07 21:53:19.829 0.559516 9391.800000
2018-05-07 21:53:19.846 0.100000 9391.800000
2018-05-07 21:53:19.870 0.006560 9391.800000
2018-05-07 21:53:20.734 0.666076 9391.800000
2018-05-07 21:53:20.775 0.666076 9391.800000
2018-05-07 21:53:28.607 0.100000 9391.800000
2018-05-07 21:53:28.610 0.041991 9391.800000
2018-05-07 21:53:29.283 -0.053518 9391.793421
2018-05-07 21:53:47.322 -0.046302 9391.793421
2018-05-07 21:53:49.182 0.100000 9391.800000
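If you want to run the snippets, you can rebuild the sample above as a DataFrame with a DatetimeIndex named INDEX. The values are transcribed from the table. The per-bin row count at the end is a quick sanity check of how pd.Grouper(freq='5s') splits this data, and it includes empty bins.

```python
import pandas as pd

# Rows transcribed from the sample table: (timestamp, size, price)
rows = [
    ("2018-05-07 21:53:13.731", 0.365127, 9391.800000),
    ("2018-05-07 21:53:16.201", 0.666127, 9391.800000),
    ("2018-05-07 21:53:18.038", 0.143104, 9391.800000),
    ("2018-05-07 21:53:18.243", 0.025643, 9391.800000),
    ("2018-05-07 21:53:18.265", 0.640484, 9391.800000),
    ("2018-05-07 21:53:18.906", -0.100000, 9391.793421),
    ("2018-05-07 21:53:19.829", 0.559516, 9391.800000),
    ("2018-05-07 21:53:19.846", 0.100000, 9391.800000),
    ("2018-05-07 21:53:19.870", 0.006560, 9391.800000),
    ("2018-05-07 21:53:20.734", 0.666076, 9391.800000),
    ("2018-05-07 21:53:20.775", 0.666076, 9391.800000),
    ("2018-05-07 21:53:28.607", 0.100000, 9391.800000),
    ("2018-05-07 21:53:28.610", 0.041991, 9391.800000),
    ("2018-05-07 21:53:29.283", -0.053518, 9391.793421),
    ("2018-05-07 21:53:47.322", -0.046302, 9391.793421),
    ("2018-05-07 21:53:49.182", 0.100000, 9391.800000),
]
tick = pd.DataFrame(rows, columns=["INDEX", "size", "price"])
tick["INDEX"] = pd.to_datetime(tick["INDEX"])
tick = tick.set_index("INDEX")

# Number of rows in each 5-second bin; bins with no trades get a count of 0
counts = tick.groupby(pd.Grouper(freq="5s")).size()
print(counts)
```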

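import numpy as np
import pandas as pd

def tick_features(x):
    # Total traded volume (absolute size) and number of trades per window
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    return pd.Series([volume, num_trades], index=['volume', 'num_trades'])


# '5s' = 5-second bins (newer pandas deprecates the uppercase '5S' alias)
tick = tick.groupby(pd.Grouper(freq='5s')).apply(tick_features)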

How can I get the first and last element of each 5-second window using pd.Grouper().apply()?

I can do something similar with .resample().agg({'price': 'first'}), but for other reasons I would like to use pd.Grouper() if possible.
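On a DatetimeIndex, grouping with pd.Grouper(freq=...) should produce the same time bins as .resample(), so per-bin first and last values are available either way. Below is a minimal sketch on a hypothetical four-row sample shaped like the question's data. It checks that both routes give the same first and last price per 5-second bin.

```python
import pandas as pd

# Hypothetical four-row sample shaped like the question's data
idx = pd.to_datetime(["2018-05-07 21:53:13.731", "2018-05-07 21:53:16.201",
                      "2018-05-07 21:53:18.906", "2018-05-07 21:53:20.734"])
tick = pd.DataFrame({"price": [9391.8, 9391.8, 9391.793421, 9391.8]}, index=idx)

# First and last price per 5-second bin, computed both ways
via_resample = tick.resample("5s")["price"].agg(["first", "last"])
via_grouper = tick.groupby(pd.Grouper(freq="5s"))["price"].agg(["first", "last"])

# check_freq=False: only the labels and values are compared, not the index freq attribute
pd.testing.assert_frame_equal(via_resample, via_grouper, check_freq=False)
print(via_grouper)
```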

Best answer

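I suggest calling agg on the grouped 'size' column (a SeriesGroupBy) with a list of (name, function) tuples that includes the functions first and last: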

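import pandas as pd

# (output column name, aggregation) pairs applied to the 'size' column
tick_features = [('volume', lambda x: x.abs().sum()),
                 ('num_trades', 'count'),
                 ('first_trade', 'first'),
                 ('last_trade', 'last')]
# '5s' = 5-second bins (newer pandas deprecates the uppercase '5S' alias)
tick = tick.groupby(pd.Grouper(freq='5s'))['size'].agg(tick_features)
print(tick)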
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127           1     0.365127    0.365127
2018-05-07 21:53:15  2.241434           8     0.666127    0.006560
2018-05-07 21:53:20  1.332152           2     0.666076    0.666076
2018-05-07 21:53:25  0.195509           3     0.100000   -0.053518
2018-05-07 21:53:30  0.000000           0          NaN         NaN
2018-05-07 21:53:35  0.000000           0          NaN         NaN
2018-05-07 21:53:40  0.000000           0          NaN         NaN
2018-05-07 21:53:45  0.146302           2    -0.046302    0.100000
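On pandas 0.25 or newer, you can also request the same four columns with named aggregation, passing keyword arguments to agg instead of a list of tuples. This is an alternative to the answer's syntax, not the answer itself. The sketch below runs it on a hypothetical subset of the question's rows (the trades in the 21:53:25 and 21:53:45 bins), and the expected values are the ones in the table above.

```python
import pandas as pd

# Hypothetical subset of the sample: the trades from the 21:53:25 and 21:53:45 bins
idx = pd.to_datetime(["2018-05-07 21:53:28.607", "2018-05-07 21:53:28.610",
                      "2018-05-07 21:53:29.283", "2018-05-07 21:53:47.322",
                      "2018-05-07 21:53:49.182"])
tick = pd.DataFrame({"size": [0.1, 0.041991, -0.053518, -0.046302, 0.1]}, index=idx)

# Keyword = output column name, value = aggregation applied to the 'size' column
result = tick.groupby(pd.Grouper(freq="5s"))["size"].agg(
    volume=lambda s: s.abs().sum(),
    num_trades="count",
    first_trade="first",
    last_trade="last",
)
print(result)
```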

An apply-based solution is also possible, but it needs an if-else statement to handle empty windows:

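import numpy as np
import pandas as pd

def tick_features(x):
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    # An empty 5s window has no rows, so iloc[0] / iloc[-1] would raise IndexError
    if not x.empty:
        f = x['size'].iloc[0]
        l = x['size'].iloc[-1]
    else:
        f = np.nan
        l = np.nan
    return pd.Series([volume, num_trades, f, l],
                     index=['volume', 'num_trades', 'first_trade', 'last_trade'])


# '5s' = 5-second bins (newer pandas deprecates the uppercase '5S' alias)
tick = tick.groupby(pd.Grouper(freq='5s')).apply(tick_features)
print(tick)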
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127         1.0     0.365127    0.365127
2018-05-07 21:53:15  2.241434         8.0     0.666127    0.006560
2018-05-07 21:53:20  1.332152         2.0     0.666076    0.666076
2018-05-07 21:53:25  0.195509         3.0     0.100000   -0.053518
2018-05-07 21:53:30  0.000000         0.0          NaN         NaN
2018-05-07 21:53:35  0.000000         0.0          NaN         NaN
2018-05-07 21:53:40  0.000000         0.0          NaN         NaN
2018-05-07 21:53:45  0.146302         2.0    -0.046302    0.100000
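The output above suggests that the empty windows (21:53:30 to 21:53:40) reach tick_features as zero-row frames. Positional access such as iloc[0] on an empty Series raises IndexError, which is what the if-else avoids. The sketch below checks this and calls the guarded function directly on an empty slice, which stands in for an empty window. The two-row sample is hypothetical. The sketch also shows why num_trades prints as 1.0, 8.0 and so on: the returned Series mixes the integer count with floats, so pandas stores it as float64.

```python
import numpy as np
import pandas as pd

def tick_features(x):
    volume = np.abs(x["size"]).sum()
    num_trades = x["size"].count()
    if not x.empty:
        f = x["size"].iloc[0]
        l = x["size"].iloc[-1]
    else:
        f = np.nan
        l = np.nan
    return pd.Series([volume, num_trades, f, l],
                     index=["volume", "num_trades", "first_trade", "last_trade"])

# Positional access on an empty Series fails; the guard above avoids this
try:
    pd.Series([], dtype=float).iloc[0]
    raised = False
except IndexError:
    raised = True

# Hypothetical two-row sample; an empty slice of it stands in for an empty 5s window
tick = pd.DataFrame({"size": [0.1, -0.053518]},
                    index=pd.to_datetime(["2018-05-07 21:53:28.607",
                                          "2018-05-07 21:53:29.283"]))
empty_row = tick_features(tick.iloc[0:0])
full_row = tick_features(tick)
print(empty_row)
print(full_row)
```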

A similar question, python - Getting the first and last element with pd.Grouper(), can be found on Stack Overflow: https://stackoverflow.com/questions/50286010/
