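I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find where it becomes discontinuous, i.e. where there are "gap months" between two adjacent entries of the index.
Example: the following time series data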
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
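is missing 1964/10 and 1965/[1, 2, 7].
Use asfreq
to add the missing month-end datetimes, filter them into a new Series,
and, if necessary, group by year and build a list of months per year: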
s = s.asfreq('M')  # month-end frequency; pandas 2.2+ spells this alias 'ME'
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-31
1 1965-01-31
2 1965-02-28
3 1965-07-31
Name: 0, dtype: datetime64[ns]
out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964 [10]
1965 [1, 2, 7]
Name: 0, dtype: object
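For the yes/no part of the question, here is a minimal sketch of a continuity check built on the same asfreq idea. The shortened four-row sample, and the names `full`, `is_continuous` and `missing`, are illustrative assumptions rather than part of the original answer. It passes `pd.offsets.MonthEnd()` instead of a string alias so that it does not depend on which alias spelling the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical shortened sample: month-end dates with 1964-10 missing.
s = pd.Series(
    [100.0, 98.81, 101.21, 101.42],
    index=pd.to_datetime(["1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30"]),
)

# Reindex onto a complete month-end grid between the first and last date.
full = s.asfreq(pd.offsets.MonthEnd())

# The index is continuous only if reindexing added no rows.
is_continuous = len(full) == len(s)

# Compare indexes rather than testing for NaN, so that genuine NaN values
# already present in the data are not reported as missing months.
missing = full.index.difference(s.index)

print(is_continuous)
print(missing)
```

Comparing the two indexes is a deliberate choice here: `s[s.isnull()]` also works for the sample data, but it would flag any month whose value was NaN to begin with.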
Setup:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-30 00:00:00'): 101.21,
pd.Timestamp('1964-11-30 00:00:00'): 101.42,
pd.Timestamp('1964-12-31 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
Edit:
If the datetimes are not always the last day of the month:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-01 00:00:00'): 101.21,
pd.Timestamp('1964-11-02 00:00:00'): 101.42,
pd.Timestamp('1964-12-05 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-01 101.21
1964-11-02 101.42
1964-12-05 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
# convert every date to the first day of its month
s.index = s.index.to_period('M').to_timestamp()
# 'MS' is the month-start frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-01
1 1965-01-01
2 1965-02-01
3 1965-07-01
dtype: datetime64[ns]
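As an alternative that does not modify the Series, the same gaps can be found by working on monthly periods directly. This is only a sketch: the helper name `missing_months` is hypothetical, and it assumes the index can be converted to a DatetimeIndex. Because it compares months rather than exact dates, it does not matter which day of the month each observation falls on.

```python
import pandas as pd

def missing_months(index):
    """Return the monthly periods absent between the first and last date."""
    # Collapse each timestamp to its calendar month.
    periods = pd.DatetimeIndex(index).to_period("M")
    # Every month from the earliest to the latest observation.
    full = pd.period_range(periods.min(), periods.max(), freq="M")
    return full.difference(periods)

# The dates from the edited example above (mixed days of the month).
idx = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-01", "1964-11-02", "1964-12-05",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])

gaps = missing_months(idx)
print(len(gaps) == 0)          # True only when no month is missing
print([str(p) for p in gaps])  # ['1964-10', '1965-01', '1965-02', '1965-07']
```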