gpt4 book ai didi

python - Pandas 按大于 N 分钟的时间增量进行分组

转载 作者:太空宇宙 更新时间:2023-11-03 15:49:01 24 4
gpt4 key购买 nike

我有一个数据框,其中包含时间标签、卫星 ID 和站点 ID 的列。我的目标是将数据集分解为单独的“轨道”,其中每个“轨道”都是卫星和站点 ID 的唯一组合。我可以使用标准 pandas groupby 功能并指定 by=['site', 'sat'] 轻松完成此操作。但进一步需要注意的是,如果一组内存在超过N分钟的时间间隙,那么该时间间隙之后的数据应该成为新的“轨迹”。

我的问题是计算(站点、卫星)组内连续行之间的时间增量、确定时间增量何时大于 N 分钟并创建新组/轨道的最佳方法是什么?

我想我可以使用 diff() 方法计算行之间的时间增量。理想情况下,有一种方法可以向我的 groupby 调用添加第三个键,该键封装了我正在使用的时间限制。

这里是一些示例代码,用于生成测试数据集并进行初始站点、sat 分组。

import pandas as pd
import numpy as np

# Create first sample set.

N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345

# Create second sample set over the same time span but with a different sat

N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456


# Create a third sample set with a new site over the same time span but the
# same sat as the second set

N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456

# Create a fourth sample set with the same sat and site as the third set but
# more than 20 minutes after the third set.

N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456

# Build a data frame for each sample set

A = pd.DataFrame(index=A_times, data={'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(index=B_times, data={'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(index=C_times, data={'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(index=D_times, data={'data': D_data, 'site' : D_site, 'sat' : D_sat})

# mash them into one larger test data frame

test = pd.concat([A, B, C, D])

print(test)

my_groups = test.groupby(by = ['site', 'sat'])

for key, g in my_groups:
print(key)
print(g)

这个的输出是

测试=

                     data    sat site
2016-01-01 00:00:00 0 12345 X
2016-01-01 00:00:01 1 12345 X
2016-01-01 00:00:02 2 12345 X
2016-01-01 00:00:03 3 12345 X
2016-01-01 00:00:04 4 12345 X
2016-01-01 00:00:05 5 12345 X
2016-01-01 00:00:06 6 12345 X
2016-01-01 00:00:07 7 12345 X
2016-01-01 00:00:08 8 12345 X
2016-01-01 00:00:09 9 12345 X
2016-01-01 00:00:00 0 3456 X
2016-01-01 00:00:01 1 3456 X
2016-01-01 00:00:02 2 3456 X
2016-01-01 00:00:03 3 3456 X
2016-01-01 00:00:04 4 3456 X
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y
2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y

各个组是

('X', 3456)
data sat site
2016-01-01 00:00:00 0 3456 X
2016-01-01 00:00:01 1 3456 X
2016-01-01 00:00:02 2 3456 X
2016-01-01 00:00:03 3 3456 X
2016-01-01 00:00:04 4 3456 X
('X', 12345)
data sat site
2016-01-01 00:00:00 0 12345 X
2016-01-01 00:00:01 1 12345 X
2016-01-01 00:00:02 2 12345 X
2016-01-01 00:00:03 3 12345 X
2016-01-01 00:00:04 4 12345 X
2016-01-01 00:00:05 5 12345 X
2016-01-01 00:00:06 6 12345 X
2016-01-01 00:00:07 7 12345 X
2016-01-01 00:00:08 8 12345 X
2016-01-01 00:00:09 9 12345 X
('Y', 3456)
data sat site
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y
2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y

期望的行为是,由于数据中有 20 分钟的间隙,上面的第三组实际上应该分为两组,例如

('Y', 3456)
data sat site
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y

New group here

2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y

如有任何建议,我们将不胜感激。谢谢!

最佳答案

在摆弄 unutbu 的解决方案并将其与 this post 的建议结合起来之后,我能够解决这个问题。下面显示了更完整的示例测试集和解决方案。

# In[222]:

import pandas as pd
import numpy as np


# In[223]:

# Create 1st sample set.

N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345

# Create 2nd sample set over the same time span but with a different sat

N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456


# Create a 3rd sample set with a new site over the same time span but the
# same sat as the 2nd set

N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456

# Create a 4th sample set with the same sat and site as the 3rd set but
# more than 20 minutes after the third set.

N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456

# Create a 5th sample set with the same sat and site as the 3rd set but
# more than 20 minutes after the third set.

N = 60
E_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='60s')
E_data = np.arange(0, N)
E_site = ['Z'] * N
E_sat = 3456

# Create a 6th sample set with the same sat and site as the 4th set but
# more than 20 minutes after the fourth set.

N = 5
F_times = pd.date_range('2016-01-02T00:00:00', periods=N, freq='60s')
F_data = np.arange(0, N)
F_site = ['Y'] * N
F_sat = 3456


# In[224]:

# Build a data frame for each sample set

A = pd.DataFrame(data={'time': A_times, 'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(data={'time': B_times, 'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(data={'time': C_times, 'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(data={'time': D_times, 'data': D_data, 'site' : D_site, 'sat' : D_sat})
E = pd.DataFrame(data={'time': E_times, 'data': E_data, 'site' : E_site, 'sat' : E_sat})
F = pd.DataFrame(data={'time': F_times, 'data': F_data, 'site' : F_site, 'sat' : F_sat})

# mash them into one larger test data frame

test = pd.concat([A, B, C, D, E, F])


# In[225]:

print(test)


# In[226]:

test.sort_values(['time'], inplace=True)


# In[227]:

# This approach doesn't quite work. Note that the group (Y, 3456, 0) really
# has 2 tracks in it because the overlapping track from (Z, 3456, 0) is screwing up
# the delta-t calculation and hiding the fact that within the Y, 3456 group there
# was a large time gap.

test1 = test.copy()
test1['delta_t'] = test1['time'].diff()
test1['track'] = (test1['delta_t'] > pd.Timedelta(minutes=20)).cumsum()

my_groups = test1.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
print(key)
print(g)

其输出如下所示:

('X', 3456, 0)
data sat site time delta_t track
0 0 3456 X 2016-01-01 00:00:00 0 days 0
1 1 3456 X 2016-01-01 00:00:01 0 days 0
2 2 3456 X 2016-01-01 00:00:02 0 days 0
3 3 3456 X 2016-01-01 00:00:03 0 days 0
4 4 3456 X 2016-01-01 00:00:04 0 days 0
('X', 12345, 0)
data sat site time delta_t track
0 0 12345 X 2016-01-01 00:00:00 NaT 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 0
('Y', 3456, 0)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 00:01:00 00:00:00 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 0
0 0 3456 Y 2016-01-01 01:00:00 00:01:00 0
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 0
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 0
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 0
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 0
('Y', 3456, 1)
data sat site time delta_t track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 1
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 1
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 1
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 1
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 1
('Z', 3456, 0)
data sat site time delta_t track
0 0 3456 Z 2016-01-01 00:00:00 00:00:00 0
1 1 3456 Z 2016-01-01 00:01:00 00:00:51 0
2 2 3456 Z 2016-01-01 00:02:00 00:00:51 0
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 0
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 0
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 0
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 0
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 0
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 0
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 0
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 0
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 0
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 0
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 0
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 0
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 0
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 0
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 0
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 0
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 0
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 0
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 0
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 0
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 0
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 0
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 0
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 0
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 0
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 0
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 0
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 0
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 0
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 0
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 0
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 0
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 0
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 0
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 0
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 0
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 0
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 0
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 0
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 0
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 0
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 0
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 0
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 0
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 0
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 0
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 0
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 0
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 0
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 0
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 0
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 0
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 0
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 0
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 0
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 0
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 0

请注意,('Y', 3456, 0) 组中实际上有两个轨道。所以这不是一个完整的解决方案。继续我尝试过这个

# In[228]:

# This method works. The difference is that when I calculated the
# delta_t I did it on the results of the groupby.

# There's a undesireable effect that the track counter doesn't reset with each new
# (site, sat) pair. It appears to keep counting up.

test2 = test.copy()
test2.sort_values(['site', 'sat', 'time'], inplace=True)
test2['delta_t'] = test2.groupby(['site', 'sat'])['time'].diff()
test2['track'] = (test2['delta_t'] > pd.Timedelta(minutes=20)).cumsum()

my_groups = test2.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
print(key)
print(g)

有输出

('X', 3456, 0)
data sat site time delta_t track
0 0 3456 X 2016-01-01 00:00:00 NaT 0
1 1 3456 X 2016-01-01 00:00:01 00:00:01 0
2 2 3456 X 2016-01-01 00:00:02 00:00:01 0
3 3 3456 X 2016-01-01 00:00:03 00:00:01 0
4 4 3456 X 2016-01-01 00:00:04 00:00:01 0
('X', 12345, 0)
data sat site time delta_t track
0 0 12345 X 2016-01-01 00:00:00 NaT 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 0
('Y', 3456, 0)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 00:01:00 NaT 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 0
('Y', 3456, 1)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 01:00:00 00:58:51 1
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 1
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 1
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 1
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 1
('Y', 3456, 2)
data sat site time delta_t track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 2
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 2
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 2
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 2
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 2
('Z', 3456, 2)
data sat site time delta_t track
0 0 3456 Z 2016-01-01 00:00:00 NaT 2
1 1 3456 Z 2016-01-01 00:01:00 00:01:00 2
2 2 3456 Z 2016-01-01 00:02:00 00:01:00 2
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 2
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 2
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 2
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 2
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 2
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 2
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 2
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 2
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 2
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 2
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 2
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 2
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 2
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 2
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 2
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 2
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 2
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 2
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 2
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 2
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 2
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 2
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 2
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 2
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 2
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 2
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 2
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 2
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 2
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 2
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 2
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 2
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 2
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 2
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 2
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 2
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 2
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 2
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 2
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 2
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 2
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 2
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 2
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 2
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 2
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 2
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 2
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 2
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 2
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 2
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 2
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 2
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 2
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 2
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 2
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 2
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 2

轨道分割是正确的,但如果轨道计数器在每个新的(站点、卫星)对时重置为 0,那就太好了。

# In[229]:

# This method works. The difference is that when I calculate the
# "track" counter I'm doing the cumulative sum on the results of
# the groupby. It also resets the track counter with each new
# site, sat group.

test3 = test.copy()

test3.sort_values(['site', 'sat', 'time'], inplace=True)
test3['delta_t'] = test3.groupby(['site', 'sat'])['time'].diff()

# calculate an intermediate flag column. If you try to eliminate this
# and put the boolean test directly into the 'track' calculation pandas
# will throw a recursion error.

test3['new_track'] = test3['delta_t'] > pd.Timedelta(minutes=20)

# The to_numeric call is used to convert from a float to an integer.

test3['track'] = pd.to_numeric(test3.groupby(['site', 'sat'])['new_track'].cumsum(), downcast='integer')

my_groups = test3.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
print(key)
print(g)

如下输出

('X', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 X 2016-01-01 00:00:00 NaT False 0
1 1 3456 X 2016-01-01 00:00:01 00:00:01 False 0
2 2 3456 X 2016-01-01 00:00:02 00:00:01 False 0
3 3 3456 X 2016-01-01 00:00:03 00:00:01 False 0
4 4 3456 X 2016-01-01 00:00:04 00:00:01 False 0
('X', 12345, 0)
data sat site time delta_t new_track track
0 0 12345 X 2016-01-01 00:00:00 NaT False 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 False 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 False 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 False 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 False 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 False 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 False 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 False 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 False 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 False 0
('Y', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-01 00:01:00 NaT False 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 False 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 False 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 False 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 False 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 False 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 False 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 False 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 False 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 False 0
('Y', 3456, 1)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-01 01:00:00 00:58:51 True 1
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 False 1
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 False 1
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 False 1
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 False 1
('Y', 3456, 2)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 True 2
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 False 2
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 False 2
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 False 2
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 False 2
('Z', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 Z 2016-01-01 00:00:00 NaT False 0
1 1 3456 Z 2016-01-01 00:01:00 00:01:00 False 0
2 2 3456 Z 2016-01-01 00:02:00 00:01:00 False 0
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 False 0
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 False 0
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 False 0
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 False 0
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 False 0
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 False 0
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 False 0
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 False 0
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 False 0
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 False 0
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 False 0
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 False 0
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 False 0
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 False 0
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 False 0
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 False 0
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 False 0
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 False 0
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 False 0
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 False 0
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 False 0
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 False 0
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 False 0
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 False 0
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 False 0
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 False 0
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 False 0
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 False 0
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 False 0
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 False 0
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 False 0
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 False 0
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 False 0
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 False 0
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 False 0
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 False 0
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 False 0
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 False 0
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 False 0
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 False 0
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 False 0
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 False 0
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 False 0
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 False 0
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 False 0
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 False 0
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 False 0
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 False 0
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 False 0
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 False 0
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 False 0
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 False 0
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 False 0
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 False 0
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 False 0
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 False 0
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 False 0

所以我想我现在有一个很好的解决方案。奇怪的是,如果您尝试将 new_track 测试合并到 track 计算中,pandas 会抛出递归错误。

关于python - Pandas 按大于 N 分钟的时间增量进行分组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41486948/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com