
python - Describing gaps in a time series in Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 17:40:39

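I'm trying to write a function that takes a continuous time series and returns a data structure describing any gaps of missing data in it (for example, a DF with "start" and "end" columns). This seems like a fairly common problem for time series, but despite experimenting with groupby, diff and the like, and searching SO, I haven't come up with anything better than the code below.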

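My top priority is to stay efficient by using vectorized operations. Surely there is a more obvious solution built from vectorized operations? Thanks for any help.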

import pandas as pd


def get_gaps(series):
    """
    @param series: a continuous time series of data with the index's freq set
    @return: a series where the index is the start of gaps, and the values are
             the ends
    """
    missing = series.isnull()
    # flag every row whose missing/non-missing state differs from the row
    # before it; the first row is compared against "not missing", so a series
    # that starts with missing data counts that row as a gap start
    different_from_last = missing.ne(missing.shift(fill_value=False))

    # any row not missing while the last was is a gap end
    gap_ends = series[~missing & different_from_last].index

    # any row missing while the last wasn't is a gap start
    gap_starts = series[missing & different_from_last].index

    # check and remedy if series ends with missing data
    if len(gap_starts) > len(gap_ends):
        gap_ends = gap_ends.append(series.index[-1:] + series.index.freq)

    return pd.Series(index=gap_starts, data=gap_ends)

For the record: Pandas==0.13.1, Numpy==1.8.1, Python 2.7
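Since the question asks for a vectorized approach that returns a DataFrame with "start" and "end" columns, here is a minimal sketch of a common pandas idiom for that. It is not from the original thread. Each run of consecutive missing or non-missing rows gets a label via `ne`/`shift`/`cumsum`, and the missing rows are then grouped by that label. The function name `gaps_frame` is hypothetical, and the sketch assumes a unique, sorted index. Unlike `get_gaps`, `end` here is the last missing label of each gap (inclusive).

```python
import numpy as np
import pandas as pd


def gaps_frame(series):
    """Hypothetical helper: one row per run of missing values.

    'start' is the first missing label of a run and 'end' is the last
    missing label of that run (inclusive). Assumes a unique, sorted index.
    """
    missing = series.isna()
    # True wherever the missing flag differs from the previous row; cumsum
    # turns those change points into an id shared by every row of one run
    run_id = missing.ne(missing.shift(fill_value=False)).cumsum()
    # the index labels as values, so groupby can report first/last labels
    labels = pd.Series(series.index, index=series.index)
    grouped = labels[missing].groupby(run_id[missing])
    return pd.DataFrame(
        {"start": grouped.first(), "end": grouped.last()}
    ).reset_index(drop=True)


idx = pd.date_range("2024-01-01", periods=10, freq="D")
s = pd.Series([1, np.nan, np.nan, 4, 5, np.nan, 7, 8, np.nan, np.nan], index=idx)
print(gaps_frame(s))
```

If you prefer the question's convention, where a gap ends at the first valid row after it, and the index has a `freq` set, adding `series.index.freq` to the `end` column should give that.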

Best answer

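This problem can be reduced to finding runs of consecutive numbers in a list. Find every index at which the series is null; then, if a run such as (3, 4, 5, 6) is entirely null, just extract its start and end, (3, 6).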

import numpy as np
import pandas as pd
from operator import itemgetter
from itertools import groupby


def find_gap(s):
    """ just treat it as a list
    """
    # positions (not labels) of all null entries
    nullindex = np.where(s.isnull())[0]
    ranges = []
    # inside a run of consecutive positions, counter - position is constant,
    # so groupby splits the null positions into runs
    for k, g in groupby(enumerate(nullindex), lambda ix: ix[0] - ix[1]):
        group = list(map(itemgetter(1), g))
        ranges.append((group[0], group[-1]))
    if not ranges:
        # no nulls, hence no gaps
        return pd.Series(dtype="int64")
    startgap, endgap = zip(*ranges)
    return pd.Series(endgap, index=startgap)


# create an example
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
s = pd.Series(data, index=data)
s = s.reindex(range(18))
print(find_gap(s))
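The answer's Python-level loop can also be replaced by array operations. The sketch below is an alternative, not part of the original answer. The hypothetical `find_gap_vectorized` uses `np.diff` to locate breaks between runs, because positions inside one run differ by exactly 1. Like `find_gap`, it returns positions (inclusive start and end), not index labels.

```python
import numpy as np
import pandas as pd


def find_gap_vectorized(s):
    """Hypothetical vectorized variant: inclusive start/end positions of
    each run of nulls, computed without a Python-level loop."""
    nullindex = np.flatnonzero(s.isnull().to_numpy())
    if nullindex.size == 0:
        # no nulls, hence no gaps
        return pd.Series(dtype="int64")
    # positions i where nullindex[i] and nullindex[i + 1] are not adjacent,
    # i.e. where one run of nulls ends and the next begins
    breaks = np.flatnonzero(np.diff(nullindex) != 1)
    starts = np.concatenate(([nullindex[0]], nullindex[breaks + 1]))
    ends = np.concatenate((nullindex[breaks], [nullindex[-1]]))
    return pd.Series(ends, index=starts)


data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
s = pd.Series(data, index=data).reindex(range(18))
print(find_gap_vectorized(s))
```

On this example it maps start positions 0 and 6 to end positions 1 and 11, the same as `find_gap`. For a series with a DatetimeIndex, indexing `s.index` with the returned positions converts them to timestamps.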

Reference: Identify groups of continuous numbers in a list

For python - Describing gaps in a time series in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24815720/
