
python - Pandas: How to extract datetime ranges from a DatetimeIndex

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:10:12

I have a DatetimeIndex such as the following:

DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00',
'2008-02-01 00:00:00', '2008-03-01 00:00:00',
'2008-04-01 00:00:00', '2012-09-01 00:10:00',
'2012-09-01 00:20:00', '2012-09-01 00:30:00',
'2012-09-01 00:40:00', '2012-09-01 00:50:00',
...
'2012-09-30 22:40:00', '2012-09-30 22:50:00',
'2012-09-30 23:00:00', '2012-09-30 23:10:00',
'2012-09-30 23:20:00', '2012-09-30 23:30:00',
'2012-09-30 23:40:00', '2012-09-30 23:50:00',
'2012-10-01 00:00:00', '2015-07-01 00:00:00'],
dtype='datetime64[ns]', length=4326, freq=None, tz=None)

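Both its freq and its inferred_freq are None. I suppose this is because, although the data really does have a 10-minute period, that period cannot be detected when parts of it are missing. I want to extract those missing parts, or rather the available parts, as efficiently as possible. That is, I would like to get something like the following list of ranges: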

[('2007-11-01 00:00:00', '2007-11-01 00:00:00'),
('2008-01-01 00:00:00', '2008-01-01 00:00:00'),
('2008-02-01 00:00:00', '2008-02-01 00:00:00'),
('2008-03-01 00:00:00', '2008-03-01 00:00:00'),
('2008-04-01 00:00:00', '2008-04-01 00:00:00'),
('2012-09-01 00:10:00', '2012-10-01 00:00:00'),
('2015-07-01 00:00:00', '2015-07-01 00:00:00')]

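How should I do this? I looked at PeriodIndex, but it seems to be meant for a different kind of application, or perhaps it simply does not handle arbitrary time intervals yet.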
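The freq / inferred_freq behaviour described in the question can be reproduced on a small index. This is only a minimal sketch: the names `full` and `gappy` and the sample dates are made up for illustration and are not taken from the original data.

```python
import pandas as pd

# A complete 10-minute range, and a copy with two timestamps removed
full = pd.date_range("2012-09-01", periods=12, freq="10min")
gappy = full.delete([4, 5])

# The complete range has an inferable frequency
# (the exact alias string depends on the pandas version)
print(full.inferred_freq)

# A single hole is enough for inference to give up
print(gappy.inferred_freq)

# Differences between consecutive timestamps show where the hole is;
# the first entry is NaT because it has no predecessor
gaps = gappy.to_series().diff()
print(gaps[gaps != pd.Timedelta("10min")])
```

So a None here does not mean the data has no regular spacing, only that the spacing is not uniform over the whole index.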

Best answer

I think you can use groupby with a Series grouper and aggregate with min and max:

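The grouper is created by taking the difference between consecutive values (diff), comparing it with 10 minutes, and applying cumsum to the result.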

import pandas as pd

rng = pd.DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00',
'2008-02-01 00:00:00', '2008-03-01 00:00:00',
'2008-04-01 00:00:00', '2012-09-01 00:10:00',
'2012-09-01 00:20:00', '2012-09-01 00:30:00',
'2012-09-01 00:40:00', '2012-09-01 00:50:00',
'2012-09-30 22:40:00', '2012-09-30 22:50:00',
'2012-09-30 23:00:00', '2012-09-30 23:10:00',
'2012-09-30 23:20:00', '2012-09-30 23:30:00',
'2012-09-30 23:40:00', '2012-09-30 23:50:00',
'2012-10-01 00:00:00', '2015-07-01 00:00:00'])

s = pd.Series(rng)
grouper = s.diff().ne(pd.to_timedelta('10min')).cumsum()
print (grouper)
0 1
1 2
2 3
3 4
4 5
5 6
6 6
7 6
8 6
9 6
10 7
11 7
12 7
13 7
14 7
15 7
16 7
17 7
18 7
19 8
dtype: int32
print (s.groupby(grouper).agg(['min', 'max']).astype(str).apply(tuple, axis=1).tolist())
[('2007-11-01 00:00:00', '2007-11-01 00:00:00'),
('2008-01-01 00:00:00', '2008-01-01 00:00:00'),
('2008-02-01 00:00:00', '2008-02-01 00:00:00'),
('2008-03-01 00:00:00', '2008-03-01 00:00:00'),
('2008-04-01 00:00:00', '2008-04-01 00:00:00'),
('2012-09-01 00:10:00', '2012-09-01 00:50:00'),
('2012-09-30 22:40:00', '2012-10-01 00:00:00'),
('2015-07-01 00:00:00', '2015-07-01 00:00:00')]
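The same diff / ne / cumsum idea can be wrapped in a small helper so the step size is not hard-coded. This is only a sketch under assumptions: the function name `extract_ranges` and the sample index are hypothetical, the input is assumed to be sorted already, and the result holds Timestamp objects rather than strings.

```python
import pandas as pd

def extract_ranges(index, step="10min"):
    """Return (start, end) pairs for each run of timestamps spaced exactly `step` apart.

    Assumes `index` is already sorted in ascending order.
    """
    s = pd.Series(index)
    # A new run starts wherever the gap to the previous timestamp is not `step`.
    # The first diff is NaT, which also compares as "not equal", so it opens run 1.
    run_id = s.diff().ne(pd.Timedelta(step)).cumsum()
    bounds = s.groupby(run_id).agg(["min", "max"])
    return list(zip(bounds["min"], bounds["max"]))

# Hypothetical sample: two runs of 10-minute data plus one isolated timestamp
idx = pd.DatetimeIndex([
    "2012-09-01 00:00", "2012-09-01 00:10", "2012-09-01 00:20",
    "2012-09-01 01:00", "2012-09-01 01:10",
    "2012-09-02 00:00",
])

ranges = extract_ranges(idx)
for start, end in ranges:
    print(start, "->", end)
```

An isolated timestamp comes back as a range whose start and end are equal, which matches the shape of the list the question asks for.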

Regarding python - Pandas: how to extract datetime ranges from a DatetimeIndex, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43180419/
