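I have a data frame with columns for a time stamp, a satellite ID, and a site ID. My goal is to break the data set into individual "tracks", where each track is a unique combination of satellite and site ID. I can do this easily with the standard pandas groupby functionality by specifying by=['site', 'sat']. The further caveat, however, is that if there is a time gap of more than N minutes within a group, the data after that gap should become a new "track".
My question: what is the best way to compute the time delta between consecutive rows within a (site, sat) group, determine where the delta exceeds N minutes, and start a new group/track there?
I figure I can compute the time delta between rows with the diff() method. Ideally there would be a way to add a third key to my groupby call that encapsulates the time limit I am working with.
Here is some sample code that generates a test data set and does the initial site, sat grouping.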
import pandas as pd
import numpy as np
# Create first sample set.
N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345
# Create second sample set over the same time span but with a different sat
N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456
# Create a third sample set with a new site over the same time span but the
# same sat as the second set
N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456
# Create a fourth sample set with the same sat and site as the third set but
# more than 20 minutes after the third set.
N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456
# Build a data frame for each sample set
A = pd.DataFrame(index=A_times, data={'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(index=B_times, data={'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(index=C_times, data={'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(index=D_times, data={'data': D_data, 'site' : D_site, 'sat' : D_sat})
# mash them into one larger test data frame
test = pd.concat([A, B, C, D])
print(test)
my_groups = test.groupby(by = ['site', 'sat'])
for key, g in my_groups:
    print(key)
    print(g)
The output of this is
test =
data sat site
2016-01-01 00:00:00 0 12345 X
2016-01-01 00:00:01 1 12345 X
2016-01-01 00:00:02 2 12345 X
2016-01-01 00:00:03 3 12345 X
2016-01-01 00:00:04 4 12345 X
2016-01-01 00:00:05 5 12345 X
2016-01-01 00:00:06 6 12345 X
2016-01-01 00:00:07 7 12345 X
2016-01-01 00:00:08 8 12345 X
2016-01-01 00:00:09 9 12345 X
2016-01-01 00:00:00 0 3456 X
2016-01-01 00:00:01 1 3456 X
2016-01-01 00:00:02 2 3456 X
2016-01-01 00:00:03 3 3456 X
2016-01-01 00:00:04 4 3456 X
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y
2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y
The individual groups are
('X', 3456)
data sat site
2016-01-01 00:00:00 0 3456 X
2016-01-01 00:00:01 1 3456 X
2016-01-01 00:00:02 2 3456 X
2016-01-01 00:00:03 3 3456 X
2016-01-01 00:00:04 4 3456 X
('X', 12345)
data sat site
2016-01-01 00:00:00 0 12345 X
2016-01-01 00:00:01 1 12345 X
2016-01-01 00:00:02 2 12345 X
2016-01-01 00:00:03 3 12345 X
2016-01-01 00:00:04 4 12345 X
2016-01-01 00:00:05 5 12345 X
2016-01-01 00:00:06 6 12345 X
2016-01-01 00:00:07 7 12345 X
2016-01-01 00:00:08 8 12345 X
2016-01-01 00:00:09 9 12345 X
('Y', 3456)
data sat site
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y
2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y
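The desired behavior is that, because the data contain a gap of more than 20 minutes, the third group above should really be split into two groups, e.g.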
('Y', 3456)
data sat site
2016-01-01 00:01:00 0 3456 Y
2016-01-01 00:01:01 1 3456 Y
2016-01-01 00:01:02 2 3456 Y
2016-01-01 00:01:03 3 3456 Y
2016-01-01 00:01:04 4 3456 Y
2016-01-01 00:01:05 5 3456 Y
2016-01-01 00:01:06 6 3456 Y
2016-01-01 00:01:07 7 3456 Y
2016-01-01 00:01:08 8 3456 Y
2016-01-01 00:01:09 9 3456 Y
New group here
2016-01-01 01:00:00 0 3456 Y
2016-01-01 01:00:01 1 3456 Y
2016-01-01 01:00:02 2 3456 Y
2016-01-01 01:00:03 3 3456 Y
2016-01-01 01:00:04 4 3456 Y
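Any suggestions would be appreciated. Thanks!
Best answer
After fiddling with unutbu's solution and combining it with the suggestions from this post, I was able to solve this. A more complete sample test set and the solution are shown below.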
# In[222]:
import pandas as pd
import numpy as np
# In[223]:
# Create 1st sample set.
N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345
# Create 2nd sample set over the same time span but with a different sat
N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456
# Create a 3rd sample set with a new site over the same time span but the
# same sat as the 2nd set
N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456
# Create a 4th sample set with the same sat and site as the 3rd set but
# more than 20 minutes after the third set.
N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456
# Create a 5th sample set with the same sat but a new site (Z), sampled once
# a minute for an hour so that it fills the time gap between the 3rd and 4th sets.
N = 60
E_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='60s')
E_data = np.arange(0, N)
E_site = ['Z'] * N
E_sat = 3456
# Create a 6th sample set with the same sat and site as the 4th set but
# more than 20 minutes after the fourth set.
N = 5
F_times = pd.date_range('2016-01-02T00:00:00', periods=N, freq='60s')
F_data = np.arange(0, N)
F_site = ['Y'] * N
F_sat = 3456
# In[224]:
# Build a data frame for each sample set
A = pd.DataFrame(data={'time': A_times, 'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(data={'time': B_times, 'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(data={'time': C_times, 'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(data={'time': D_times, 'data': D_data, 'site' : D_site, 'sat' : D_sat})
E = pd.DataFrame(data={'time': E_times, 'data': E_data, 'site' : E_site, 'sat' : E_sat})
F = pd.DataFrame(data={'time': F_times, 'data': F_data, 'site' : F_site, 'sat' : F_sat})
# mash them into one larger test data frame
test = pd.concat([A, B, C, D, E, F])
# In[225]:
print(test)
# In[226]:
test.sort_values(['time'], inplace=True)
# In[227]:
# This approach doesn't quite work. Note that the group (Y, 3456, 0) really
# has 2 tracks in it because the overlapping track from (Z, 3456, 0) is screwing up
# the delta-t calculation and hiding the fact that within the Y, 3456 group there
# was a large time gap.
test1 = test.copy()
test1['delta_t'] = test1['time'].diff()
test1['track'] = (test1['delta_t'] > pd.Timedelta(minutes=20)).cumsum()
my_groups = test1.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)
The output looks like this:
('X', 3456, 0)
data sat site time delta_t track
0 0 3456 X 2016-01-01 00:00:00 0 days 0
1 1 3456 X 2016-01-01 00:00:01 0 days 0
2 2 3456 X 2016-01-01 00:00:02 0 days 0
3 3 3456 X 2016-01-01 00:00:03 0 days 0
4 4 3456 X 2016-01-01 00:00:04 0 days 0
('X', 12345, 0)
data sat site time delta_t track
0 0 12345 X 2016-01-01 00:00:00 NaT 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 0
('Y', 3456, 0)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 00:01:00 00:00:00 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 0
0 0 3456 Y 2016-01-01 01:00:00 00:01:00 0
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 0
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 0
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 0
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 0
('Y', 3456, 1)
data sat site time delta_t track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 1
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 1
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 1
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 1
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 1
('Z', 3456, 0)
data sat site time delta_t track
0 0 3456 Z 2016-01-01 00:00:00 00:00:00 0
1 1 3456 Z 2016-01-01 00:01:00 00:00:51 0
2 2 3456 Z 2016-01-01 00:02:00 00:00:51 0
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 0
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 0
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 0
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 0
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 0
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 0
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 0
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 0
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 0
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 0
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 0
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 0
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 0
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 0
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 0
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 0
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 0
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 0
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 0
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 0
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 0
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 0
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 0
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 0
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 0
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 0
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 0
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 0
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 0
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 0
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 0
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 0
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 0
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 0
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 0
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 0
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 0
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 0
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 0
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 0
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 0
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 0
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 0
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 0
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 0
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 0
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 0
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 0
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 0
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 0
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 0
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 0
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 0
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 0
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 0
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 0
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 0
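The way an unrelated group can hide a gap is easier to see on a much smaller frame. The sketch below is an illustration only: the frame `small` and its five rows are made up for this purpose and are not part of the test set above. It compares a `diff()` taken over the whole time-sorted frame with a `diff()` taken inside each site group.

```python
import pandas as pd

# Hypothetical miniature data set: site Y has a one-hour gap, while site Z
# reports every 15 minutes in the middle of that gap.
small = pd.DataFrame({
    'time': pd.to_datetime(['2016-01-01 00:00:00',
                            '2016-01-01 00:15:00',
                            '2016-01-01 00:30:00',
                            '2016-01-01 00:45:00',
                            '2016-01-01 01:00:00']),
    'site': ['Y', 'Z', 'Z', 'Z', 'Y'],
}).sort_values('time')

limit = pd.Timedelta(minutes=20)

# Differencing the whole frame compares the last Y row with the Z row before it.
global_dt = small['time'].diff()
# Differencing inside each site compares a row only with its own site's rows.
grouped_dt = small.groupby('site')['time'].diff()

print((global_dt > limit).sum())   # number of gaps found without grouping
print((grouped_dt > limit).sum())  # number of gaps found with grouping
```

Without grouping, every row follows its predecessor by at most 15 minutes, so the one-hour gap in site Y never exceeds the limit; with grouping, it shows up as a single large delta.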
Note that the ('Y', 3456, 0) group actually contains two tracks, so this is not a complete solution. Moving on, I tried this:
# In[228]:
# This method works. The difference is that when I calculated the
# delta_t I did it on the results of the groupby.
# There's an undesirable effect that the track counter doesn't reset with each new
# (site, sat) pair. It appears to keep counting up.
test2 = test.copy()
test2.sort_values(['site', 'sat', 'time'], inplace=True)
test2['delta_t'] = test2.groupby(['site', 'sat'])['time'].diff()
test2['track'] = (test2['delta_t'] > pd.Timedelta(minutes=20)).cumsum()
my_groups = test2.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)
This produces the output
('X', 3456, 0)
data sat site time delta_t track
0 0 3456 X 2016-01-01 00:00:00 NaT 0
1 1 3456 X 2016-01-01 00:00:01 00:00:01 0
2 2 3456 X 2016-01-01 00:00:02 00:00:01 0
3 3 3456 X 2016-01-01 00:00:03 00:00:01 0
4 4 3456 X 2016-01-01 00:00:04 00:00:01 0
('X', 12345, 0)
data sat site time delta_t track
0 0 12345 X 2016-01-01 00:00:00 NaT 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 0
('Y', 3456, 0)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 00:01:00 NaT 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 0
('Y', 3456, 1)
data sat site time delta_t track
0 0 3456 Y 2016-01-01 01:00:00 00:58:51 1
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 1
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 1
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 1
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 1
('Y', 3456, 2)
data sat site time delta_t track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 2
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 2
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 2
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 2
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 2
('Z', 3456, 2)
data sat site time delta_t track
0 0 3456 Z 2016-01-01 00:00:00 NaT 2
1 1 3456 Z 2016-01-01 00:01:00 00:01:00 2
2 2 3456 Z 2016-01-01 00:02:00 00:01:00 2
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 2
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 2
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 2
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 2
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 2
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 2
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 2
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 2
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 2
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 2
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 2
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 2
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 2
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 2
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 2
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 2
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 2
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 2
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 2
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 2
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 2
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 2
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 2
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 2
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 2
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 2
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 2
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 2
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 2
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 2
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 2
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 2
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 2
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 2
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 2
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 2
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 2
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 2
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 2
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 2
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 2
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 2
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 2
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 2
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 2
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 2
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 2
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 2
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 2
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 2
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 2
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 2
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 2
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 2
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 2
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 2
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 2
The track splitting is correct, but it would be nice if the track counter reset to 0 for each new (site, sat) pair.
# In[229]:
# This method works. The difference is that when I calculate the
# "track" counter I'm doing the cumulative sum on the results of
# the groupby. It also resets the track counter with each new
# site, sat group.
test3 = test.copy()
test3.sort_values(['site', 'sat', 'time'], inplace=True)
test3['delta_t'] = test3.groupby(['site', 'sat'])['time'].diff()
# calculate an intermediate flag column. If you try to eliminate this
# and put the boolean test directly into the 'track' calculation pandas
# will throw a recursion error.
test3['new_track'] = test3['delta_t'] > pd.Timedelta(minutes=20)
# The to_numeric call is used to convert from a float to an integer.
test3['track'] = pd.to_numeric(test3.groupby(['site', 'sat'])['new_track'].cumsum(), downcast='integer')
my_groups = test3.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)
The output is as follows
('X', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 X 2016-01-01 00:00:00 NaT False 0
1 1 3456 X 2016-01-01 00:00:01 00:00:01 False 0
2 2 3456 X 2016-01-01 00:00:02 00:00:01 False 0
3 3 3456 X 2016-01-01 00:00:03 00:00:01 False 0
4 4 3456 X 2016-01-01 00:00:04 00:00:01 False 0
('X', 12345, 0)
data sat site time delta_t new_track track
0 0 12345 X 2016-01-01 00:00:00 NaT False 0
1 1 12345 X 2016-01-01 00:00:01 00:00:01 False 0
2 2 12345 X 2016-01-01 00:00:02 00:00:01 False 0
3 3 12345 X 2016-01-01 00:00:03 00:00:01 False 0
4 4 12345 X 2016-01-01 00:00:04 00:00:01 False 0
5 5 12345 X 2016-01-01 00:00:05 00:00:01 False 0
6 6 12345 X 2016-01-01 00:00:06 00:00:01 False 0
7 7 12345 X 2016-01-01 00:00:07 00:00:01 False 0
8 8 12345 X 2016-01-01 00:00:08 00:00:01 False 0
9 9 12345 X 2016-01-01 00:00:09 00:00:01 False 0
('Y', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-01 00:01:00 NaT False 0
1 1 3456 Y 2016-01-01 00:01:01 00:00:01 False 0
2 2 3456 Y 2016-01-01 00:01:02 00:00:01 False 0
3 3 3456 Y 2016-01-01 00:01:03 00:00:01 False 0
4 4 3456 Y 2016-01-01 00:01:04 00:00:01 False 0
5 5 3456 Y 2016-01-01 00:01:05 00:00:01 False 0
6 6 3456 Y 2016-01-01 00:01:06 00:00:01 False 0
7 7 3456 Y 2016-01-01 00:01:07 00:00:01 False 0
8 8 3456 Y 2016-01-01 00:01:08 00:00:01 False 0
9 9 3456 Y 2016-01-01 00:01:09 00:00:01 False 0
('Y', 3456, 1)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-01 01:00:00 00:58:51 True 1
1 1 3456 Y 2016-01-01 01:00:01 00:00:01 False 1
2 2 3456 Y 2016-01-01 01:00:02 00:00:01 False 1
3 3 3456 Y 2016-01-01 01:00:03 00:00:01 False 1
4 4 3456 Y 2016-01-01 01:00:04 00:00:01 False 1
('Y', 3456, 2)
data sat site time delta_t new_track track
0 0 3456 Y 2016-01-02 00:00:00 22:59:56 True 2
1 1 3456 Y 2016-01-02 00:01:00 00:01:00 False 2
2 2 3456 Y 2016-01-02 00:02:00 00:01:00 False 2
3 3 3456 Y 2016-01-02 00:03:00 00:01:00 False 2
4 4 3456 Y 2016-01-02 00:04:00 00:01:00 False 2
('Z', 3456, 0)
data sat site time delta_t new_track track
0 0 3456 Z 2016-01-01 00:00:00 NaT False 0
1 1 3456 Z 2016-01-01 00:01:00 00:01:00 False 0
2 2 3456 Z 2016-01-01 00:02:00 00:01:00 False 0
3 3 3456 Z 2016-01-01 00:03:00 00:01:00 False 0
4 4 3456 Z 2016-01-01 00:04:00 00:01:00 False 0
5 5 3456 Z 2016-01-01 00:05:00 00:01:00 False 0
6 6 3456 Z 2016-01-01 00:06:00 00:01:00 False 0
7 7 3456 Z 2016-01-01 00:07:00 00:01:00 False 0
8 8 3456 Z 2016-01-01 00:08:00 00:01:00 False 0
9 9 3456 Z 2016-01-01 00:09:00 00:01:00 False 0
10 10 3456 Z 2016-01-01 00:10:00 00:01:00 False 0
11 11 3456 Z 2016-01-01 00:11:00 00:01:00 False 0
12 12 3456 Z 2016-01-01 00:12:00 00:01:00 False 0
13 13 3456 Z 2016-01-01 00:13:00 00:01:00 False 0
14 14 3456 Z 2016-01-01 00:14:00 00:01:00 False 0
15 15 3456 Z 2016-01-01 00:15:00 00:01:00 False 0
16 16 3456 Z 2016-01-01 00:16:00 00:01:00 False 0
17 17 3456 Z 2016-01-01 00:17:00 00:01:00 False 0
18 18 3456 Z 2016-01-01 00:18:00 00:01:00 False 0
19 19 3456 Z 2016-01-01 00:19:00 00:01:00 False 0
20 20 3456 Z 2016-01-01 00:20:00 00:01:00 False 0
21 21 3456 Z 2016-01-01 00:21:00 00:01:00 False 0
22 22 3456 Z 2016-01-01 00:22:00 00:01:00 False 0
23 23 3456 Z 2016-01-01 00:23:00 00:01:00 False 0
24 24 3456 Z 2016-01-01 00:24:00 00:01:00 False 0
25 25 3456 Z 2016-01-01 00:25:00 00:01:00 False 0
26 26 3456 Z 2016-01-01 00:26:00 00:01:00 False 0
27 27 3456 Z 2016-01-01 00:27:00 00:01:00 False 0
28 28 3456 Z 2016-01-01 00:28:00 00:01:00 False 0
29 29 3456 Z 2016-01-01 00:29:00 00:01:00 False 0
30 30 3456 Z 2016-01-01 00:30:00 00:01:00 False 0
31 31 3456 Z 2016-01-01 00:31:00 00:01:00 False 0
32 32 3456 Z 2016-01-01 00:32:00 00:01:00 False 0
33 33 3456 Z 2016-01-01 00:33:00 00:01:00 False 0
34 34 3456 Z 2016-01-01 00:34:00 00:01:00 False 0
35 35 3456 Z 2016-01-01 00:35:00 00:01:00 False 0
36 36 3456 Z 2016-01-01 00:36:00 00:01:00 False 0
37 37 3456 Z 2016-01-01 00:37:00 00:01:00 False 0
38 38 3456 Z 2016-01-01 00:38:00 00:01:00 False 0
39 39 3456 Z 2016-01-01 00:39:00 00:01:00 False 0
40 40 3456 Z 2016-01-01 00:40:00 00:01:00 False 0
41 41 3456 Z 2016-01-01 00:41:00 00:01:00 False 0
42 42 3456 Z 2016-01-01 00:42:00 00:01:00 False 0
43 43 3456 Z 2016-01-01 00:43:00 00:01:00 False 0
44 44 3456 Z 2016-01-01 00:44:00 00:01:00 False 0
45 45 3456 Z 2016-01-01 00:45:00 00:01:00 False 0
46 46 3456 Z 2016-01-01 00:46:00 00:01:00 False 0
47 47 3456 Z 2016-01-01 00:47:00 00:01:00 False 0
48 48 3456 Z 2016-01-01 00:48:00 00:01:00 False 0
49 49 3456 Z 2016-01-01 00:49:00 00:01:00 False 0
50 50 3456 Z 2016-01-01 00:50:00 00:01:00 False 0
51 51 3456 Z 2016-01-01 00:51:00 00:01:00 False 0
52 52 3456 Z 2016-01-01 00:52:00 00:01:00 False 0
53 53 3456 Z 2016-01-01 00:53:00 00:01:00 False 0
54 54 3456 Z 2016-01-01 00:54:00 00:01:00 False 0
55 55 3456 Z 2016-01-01 00:55:00 00:01:00 False 0
56 56 3456 Z 2016-01-01 00:56:00 00:01:00 False 0
57 57 3456 Z 2016-01-01 00:57:00 00:01:00 False 0
58 58 3456 Z 2016-01-01 00:58:00 00:01:00 False 0
59 59 3456 Z 2016-01-01 00:59:00 00:01:00 False 0
So I think I now have a good solution. Strangely, if you try to fold the new_track test directly into the track calculation, pandas throws a recursion error.
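For reuse, the steps of the last cell can be wrapped in a small helper. The following is a minimal sketch rather than part of the original answer: the function name `assign_tracks` and its parameters `keys`, `time_col`, and `max_gap` are hypothetical. It casts the flag to an integer before the per-group cumulative sum, so the counter comes out as an integer without a `to_numeric` call. The recursion error mentioned above is likely specific to the pandas version in use at the time; the sketch keeps the flag as a separate column in any case.

```python
import pandas as pd


def assign_tracks(df, keys=('site', 'sat'), time_col='time',
                  max_gap=pd.Timedelta(minutes=20)):
    """Return a sorted copy of df with a per-group 'track' counter."""
    keys = list(keys)
    out = df.sort_values(keys + [time_col])
    # Time since the previous row of the same group; NaT for the first row
    # of each group.
    delta_t = out.groupby(keys)[time_col].diff()
    # 1 where a new track starts, 0 elsewhere (NaT compares as False).
    out['new_track'] = (delta_t > max_gap).astype(int)
    # Running count of track starts, restarted for every group.
    out['track'] = out.groupby(keys)['new_track'].cumsum()
    return out


# Tiny hypothetical example: one (site, sat) pair with a 59-minute gap.
demo = pd.DataFrame({
    'time': pd.to_datetime(['2016-01-01 00:01:00', '2016-01-01 00:01:01',
                            '2016-01-01 01:00:00', '2016-01-01 01:00:01']),
    'site': 'Y',
    'sat': 3456,
})
tracks = assign_tracks(demo)
print(tracks[['time', 'track']])
```

The result can then be grouped with `tracks.groupby(['site', 'sat', 'track'])` in the same way as in the cells above.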
Regarding "python - Pandas groupby with a time delta greater than N minutes", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41486948/