Python GroupBy on time intervals

Reposted. Author: 行者123. Updated: 2023-11-28 21:40:51

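I would like to ask for help with an operation on a pandas DataFrame.

My initial DataFrame is as follows:

(screenshot of the initial DataFrame)

I want to reshape it into 30-second intervals and compute the mean of each group.

I used the following: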

 df_a['avgValue']= df_a['value'].groupby([df_a['id_A'],df_a['course'], pd.TimeGrouper(freq='30S')]).transform(np.mean)

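I get the following:

(screenshot of the resulting DataFrame)

However, ts_A is not grouped into 30-second intervals. For example, the first and second rows should be merged into one, giving the expected result: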

id_A               ts_A course weight avgValue
id1 2017-04-27 01:35:30 cotton 3.5 150.000
id1 2017-04-27 01:36:00 cotton 3.5 416.000
...

My question is: how should I modify the code above to get the expected result?

Thank you very much. Regards, Carlo

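Best answer

I think you need GroupBy.mean (assuming ts_A is set as the DatetimeIndex) and should omit the transform function. transform returns one value for every original row, so rows are never merged, whereas mean returns one row per group: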

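# If ts_A is not already the DatetimeIndex:
# df_a['ts_A'] = pd.to_datetime(df_a['ts_A'])
# df_a = df_a.set_index('ts_A')

# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it.
# '30s' is the lowercase seconds alias (recent pandas versions deprecate '30S').
df = df_a['value'].groupby([df_a['id_A'],
                            df_a['course'],
                            df_a['weight'],
                            pd.Grouper(freq='30s')]).mean().reset_index()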

Or:

# Group by the three columns plus 30-second bins of the DatetimeIndex
df = df_a.groupby(['id_A', 'course', 'weight',
                   pd.Grouper(freq='30s')])['value'].mean().reset_index()
print(df)
  id_A       course  weight                ts_A       value
0  id1       cotton     3.5 2017-04-27 01:35:30  150.000000
1  id1       cotton     3.5 2017-04-27 01:36:00  416.666667
2  id1       cotton     3.5 2017-04-27 01:36:30  700.000000
3  id1       cotton     3.5 2017-04-27 01:37:00  950.000000
4  id2  cotton blue     5.0 2017-04-27 02:35:30  150.000000
5  id2  cotton blue     5.0 2017-04-27 02:36:00  450.000000
6  id2  cotton blue     5.0 2017-04-27 02:36:30  520.666667
7  id2  cotton blue     5.0 2017-04-27 02:37:00  610.000000
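
The difference between transform and mean is easier to see on a tiny frame. The following is only a minimal sketch: the three-row `demo` frame and the names `per_row` and `per_bin` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical three-row frame, indexed by timestamp like df_a.
ts = pd.to_datetime([
    "2017-04-27 01:35:40",
    "2017-04-27 01:35:50",
    "2017-04-27 01:36:00",
])
demo = pd.DataFrame({"id_A": ["id1"] * 3, "value": [100, 200, 350]}, index=ts)
demo.index.name = "ts_A"

# transform broadcasts each group's mean back onto every original row,
# so the result still has three entries.
per_row = demo.groupby(["id_A", pd.Grouper(freq="30s")])["value"].transform("mean")

# mean aggregates instead, giving one row per (id_A, 30-second bin).
per_bin = demo.groupby(["id_A", pd.Grouper(freq="30s")])["value"].mean().reset_index()

print(per_row.tolist())
print(per_bin)
```

Here `per_row` keeps the length of the input while `per_bin` has one row per bin, which is why the asker's version never merged the first two rows.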

A solution using resample:

df = df_a.groupby(['id_A', 'course', 'weight'])['value'].resample('30s').mean().reset_index()
print(df)
  id_A       course  weight                ts_A       value
0  id1       cotton     3.5 2017-04-27 01:35:30  150.000000
1  id1       cotton     3.5 2017-04-27 01:36:00  416.666667
2  id1       cotton     3.5 2017-04-27 01:36:30  700.000000
3  id1       cotton     3.5 2017-04-27 01:37:00  950.000000
4  id2  cotton blue     5.0 2017-04-27 02:35:30  150.000000
5  id2  cotton blue     5.0 2017-04-27 02:36:00  450.000000
6  id2  cotton blue     5.0 2017-04-27 02:36:30  520.666667
7  id2  cotton blue     5.0 2017-04-27 02:37:00  610.000000
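
On the sample data the Grouper and resample variants give the same rows, but they can differ when a group has a gap in time. The sketch below uses a made-up `gappy` frame (not from the question) with no readings between 01:36:00 and 01:37:00. As far as I can tell, grouping with several keys returns only the bins that actually contain data, while resample also emits the empty windows as NaN.

```python
import pandas as pd

# Hypothetical data with a gap: nothing falls in the 01:36:00 or 01:36:30 bins.
ts = pd.to_datetime([
    "2017-04-27 01:35:40",
    "2017-04-27 01:35:50",
    "2017-04-27 01:37:10",
])
gappy = pd.DataFrame({"id_A": ["id1"] * 3, "value": [100, 200, 1000]}, index=ts)
gappy.index.name = "ts_A"

# Multi-key groupby with a time Grouper: only observed bins appear.
by_grouper = (
    gappy.groupby(["id_A", pd.Grouper(freq="30s")])["value"].mean().reset_index()
)

# groupby followed by resample: every 30-second window between the first
# and last timestamp of the group appears, empty ones with a NaN mean.
by_resample = gappy.groupby("id_A")["value"].resample("30s").mean().reset_index()

print(by_grouper)
print(by_resample)

# Dropping the NaN rows brings the two results back in line.
filled_only = by_resample.dropna(subset=["value"]).reset_index(drop=True)
```

Which behaviour is preferable depends on whether the empty windows matter downstream; `dropna` or `fillna` on the resample result covers both cases.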

Setup:

import pandas as pd
d = {'weight': {0: 3.5, 1: 3.5, 2: 3.5, 3: 3.5, 4: 3.5, 5: 3.5, 6: 3.5, 7: 3.5, 8: 3.5, 9: 3.5, 10: 5.0, 11: 5.0, 12: 5.0, 13: 5.0, 14: 5.0, 15: 5.0, 16: 5.0, 17: 5.0, 18: 5.0, 19: 5.0}, 'value': {0: 100, 1: 200, 2: 350, 3: 400, 4: 500, 5: 600, 6: 700, 7: 800, 8: 900, 9: 1000, 10: 100, 11: 200, 12: 450, 13: 300, 14: 600, 15: 500, 16: 522, 17: 540, 18: 320, 19: 900}, 'ts_A': {0: '2017-04-27 01:35:40', 1: '2017-04-27 01:35:50', 2: '2017-04-27 01:36:00', 3: '2017-04-27 01:36:10', 4: '2017-04-27 01:36:20', 5: '2017-04-27 01:36:30', 6: '2017-04-27 01:36:40', 7: '2017-04-27 01:36:50', 8: '2017-04-27 01:37:00', 9: '2017-04-27 01:37:10', 10: '2017-04-27 02:35:40', 11: '2017-04-27 02:35:50', 12: '2017-04-27 02:36:00', 13: '2017-04-27 02:36:10', 14: '2017-04-27 02:36:20', 15: '2017-04-27 02:36:30', 16: '2017-04-27 02:36:40', 17: '2017-04-27 02:36:50', 18: '2017-04-27 02:37:00', 19: '2017-04-27 02:37:10'}, 'course': {0: 'cotton', 1: 'cotton', 2: 'cotton', 3: 'cotton', 4: 'cotton', 5: 'cotton', 6: 'cotton', 7: 'cotton', 8: 'cotton', 9: 'cotton', 10: 'cotton blue', 11: 'cotton blue', 12: 'cotton blue', 13: 'cotton blue', 14: 'cotton blue', 15: 'cotton blue', 16: 'cotton blue', 17: 'cotton blue', 18: 'cotton blue', 19: 'cotton blue'}, 'id_A': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1', 6: 'id1', 7: 'id1', 8: 'id1', 9: 'id1', 10: 'id2', 11: 'id2', 12: 'id2', 13: 'id2', 14: 'id2', 15: 'id2', 16: 'id2', 17: 'id2', 18: 'id2', 19: 'id2'}}
df_a = pd.DataFrame(d)
df_a['ts_A'] = pd.to_datetime(df_a['ts_A'])
df_a = df_a.set_index('ts_A')
print (df_a)
                          course id_A  value  weight
ts_A                                                
2017-04-27 01:35:40       cotton  id1    100     3.5
2017-04-27 01:35:50       cotton  id1    200     3.5
2017-04-27 01:36:00       cotton  id1    350     3.5
2017-04-27 01:36:10       cotton  id1    400     3.5
2017-04-27 01:36:20       cotton  id1    500     3.5
2017-04-27 01:36:30       cotton  id1    600     3.5
2017-04-27 01:36:40       cotton  id1    700     3.5
2017-04-27 01:36:50       cotton  id1    800     3.5
2017-04-27 01:37:00       cotton  id1    900     3.5
2017-04-27 01:37:10       cotton  id1   1000     3.5
2017-04-27 02:35:40  cotton blue  id2    100     5.0
2017-04-27 02:35:50  cotton blue  id2    200     5.0
2017-04-27 02:36:00  cotton blue  id2    450     5.0
2017-04-27 02:36:10  cotton blue  id2    300     5.0
2017-04-27 02:36:20  cotton blue  id2    600     5.0
2017-04-27 02:36:30  cotton blue  id2    500     5.0
2017-04-27 02:36:40  cotton blue  id2    522     5.0
2017-04-27 02:36:50  cotton blue  id2    540     5.0
2017-04-27 02:37:00  cotton blue  id2    320     5.0
2017-04-27 02:37:10  cotton blue  id2    900     5.0
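
To see why 01:35:40 and 01:35:50 end up in a bin labelled 01:35:30, it may help to compute the bin starts by hand. This is a rough sketch on a made-up four-row frame (`demo`, `bin_start`, `by_floor` and `by_grouper` are illustrative names): with the default settings, a 30-second bin is labelled by its left edge, which matches rounding each timestamp down with `floor`.

```python
import pandas as pd

# Hypothetical subset of timestamps in the style of the setup data.
ts = pd.to_datetime([
    "2017-04-27 01:35:40",
    "2017-04-27 01:35:50",
    "2017-04-27 01:36:00",
    "2017-04-27 01:36:10",
])
demo = pd.DataFrame({"value": [100, 200, 350, 400]}, index=ts)
demo.index.name = "ts_A"

# Round every timestamp down to the start of its 30-second window.
bin_start = demo.index.floor("30s")

# Grouping by the floored timestamps...
by_floor = demo.groupby(bin_start)["value"].mean()

# ...gives the same bins as letting pd.Grouper do the binning.
by_grouper = demo.groupby(pd.Grouper(freq="30s"))["value"].mean()

print(by_floor)
print(by_grouper)
```

The explicit `floor` version is only meant to show the binning; `pd.Grouper` or `resample` is the more direct way to express it.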

For more on Python GroupBy time intervals, see the similar question on Stack Overflow: https://stackoverflow.com/questions/45215151/
