python - Time series analysis - unevenly spaced measurements - pandas + statsmodels

Reposted by: 太空狗 | Updated: 2023-10-30 00:19:07

I have two numpy arrays, light_points and time_points, and would like to apply some time series analysis methods to this data.

I then tried this:

import statsmodels.api as sm
import pandas as pd
tdf = pd.DataFrame({'time':time_points[:]})
rdf =  pd.DataFrame({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])

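This runs, but it does not do the right thing. The measurements are in fact not evenly spaced in time, and if I simply declare the time_points pandas DataFrame as the index of my frame, I get an error: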

rdf.index = pd.DatetimeIndex(tdf['time'])

decomp = sm.tsa.seasonal_decompose(rdf)

elif freq is None:
raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

I don't know how to correct this. Also, pandas' TimeSeries seems to be deprecated.

I tried this:

rdf = pd.Series({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])

But it gives me a length mismatch:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements

However, I don't understand where it comes from, since rdf['light'] and tdf['time'] have the same length...
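A likely explanation for the length mismatch, shown as a minimal sketch: passing a dict to pd.Series makes each key one element, so the whole array becomes a single value stored under the label 'light'. The light_points array below is a hypothetical stand-in with 122 values, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's array (122 measurements).
light_points = np.arange(122, dtype=float)

# A dict maps each key to ONE element, so the array is stored as a single value.
wrapped = pd.Series({'light': light_points})
print(len(wrapped))  # 1

# Passing the array directly gives one element per measurement.
direct = pd.Series(light_points)
print(len(direct))  # 122
```

Assigning a 122-element index to the one-element Series is what raises the error; the dict form is meant for pd.DataFrame, where each key becomes a column.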

Eventually, I tried defining my rdf as a pandas Series:

rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))

I get this:

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

Then I tried replacing the index with

 pd.TimeSeries(time_points[:])

which gives me an error on the seasonal_decompose line:

AttributeError: 'Float64Index' object has no attribute 'inferred_freq'

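How do I handle unevenly spaced data? I was thinking of creating an approximately evenly spaced time array by adding many unknown values between the existing ones and using interpolation to "evaluate" those points, but I think there may be a cleaner and easier solution.

Best answer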

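seasonal_decompose() requires a freq, which is either provided as part of the DatetimeIndex meta information, inferred by pandas through Index.inferred_freq, or given by the user as an integer number of periods per cycle, for example 12 for monthly data (from the docstring for seasonal_mean):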

def seasonal_decompose(x, model="additive", filt=None, freq=None):
    """
    Parameters
    ----------
    x : array-like
        Time series
    model : str {"additive", "multiplicative"}
        Type of seasonal component. Abbreviations are accepted.
    filt : array-like
        The filter coefficients for filtering out the seasonal component.
        The default is a symmetric moving average.
    freq : int, optional
        Frequency of the series. Must be used if x is not a pandas
        object with a timeseries index.

To illustrate, using random sample data:

from datetime import datetime
import numpy as np
import pandas as pd
length = 400
x = np.sin(np.arange(length)) * 10 + np.random.randn(length)
df = pd.DataFrame(data=x, index=pd.date_range(start=datetime(2015, 1, 1), periods=length, freq='W'), columns=['value'])

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28
Freq: W-SUN

decomp = sm.tsa.seasonal_decompose(df)
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']

Data columns (total 4 columns):
series      400 non-null float64
trend       348 non-null float64
seasonal    400 non-null float64
resid       348 non-null float64
dtypes: float64(4)
memory usage: 15.6 KB

So far so good. Now randomly drop elements from the DatetimeIndex to create unevenly spaced data:

df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))]

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21
Data columns (total 1 columns):
value    222 non-null float64
dtypes: float64(1)
memory usage: 3.5 KB

df.index.freq

None

df.index.inferred_freq

None
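The two None results can be reproduced on a tiny index. The sketch below uses made-up dates (not the data above) to show that a regular weekly index reports a frequency, while the same index with one week removed has neither a stored freq nor an inferable one.

```python
import pandas as pd

# Six consecutive Sundays: a regular weekly index.
regular = pd.date_range("2015-01-04", periods=6, freq="W")
print(regular.inferred_freq)  # W-SUN

# Drop the third date so the gaps are 7, 14 and 7 days.
irregular = regular[[0, 1, 3, 4]]
print(irregular.freq)           # None
print(irregular.inferred_freq)  # None
```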

Running seasonal_decompose on this data "works":

decomp = sm.tsa.seasonal_decompose(df, freq=52)

data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']

DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07
Data columns (total 4 columns):
series      224 non-null float64
trend       172 non-null float64
seasonal    224 non-null float64
resid       172 non-null float64
dtypes: float64(4)
memory usage: 8.8 KB
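One way to see why an integer period on gapped data deserves caution: the period counts rows, not calendar time. The sketch below rebuilds a similar weekly series (noise-free, with an arbitrary fixed seed; both are assumptions made for reproducibility) and compares how much calendar time 52 row steps cover before and after rows are dropped.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
length = 400
index = pd.date_range(start="2015-01-01", periods=length, freq="W")
full = pd.Series(np.sin(np.arange(length)) * 10, index=index)

# Keep a random subset of rows, mirroring the step above.
keep = np.unique(rng.integers(low=0, high=length, size=int(length * 0.8)))
uneven = full.iloc[keep]

# Calendar time covered by 52 row steps in each series.
span_full = full.index[52] - full.index[0]
span_uneven = uneven.index[52] - uneven.index[0]
print(span_full)    # 364 days on the complete weekly grid
print(span_uneven)  # longer, because the dropped weeks are skipped over
```

With rows missing, a "cycle" of 52 rows no longer lines up with one year, so the seasonal averages mix values from different points in the yearly cycle.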

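The question is how useful the result is. Even without gaps in the data that complicate the inference of seasonal patterns (see the example use of .interpolate() in the release notes), statsmodels qualifies this procedure as follows: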

Notes
-----
This is a naive decomposition. More sophisticated methods should
be preferred.

The additive model is Y[t] = T[t] + S[t] + e[t]

The multiplicative model is Y[t] = T[t] * S[t] * e[t]

The seasonal component is first removed by applying a convolution
filter to the data. The average of this smoothed series for each
period is the returned seasonal component.
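The interpolation idea raised in the question can be sketched with pandas alone. This is one possible approach, not the answer's method: put the uneven series back on a regular weekly grid with resample, then fill the gaps with .interpolate(). The seed, the weekly grid and time-based linear interpolation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
length = 400
index = pd.date_range(start="2015-01-01", periods=length, freq="W")
values = np.sin(np.arange(length)) * 10 + rng.standard_normal(length)
full = pd.Series(values, index=index, name="value")

# Randomly keep a subset of rows to mimic unevenly spaced measurements.
keep = np.unique(rng.integers(low=0, high=length, size=int(length * 0.8)))
uneven = full.iloc[keep]

# Re-create the regular weekly grid; weeks without data become NaN.
regular = uneven.resample("W").mean()

# Fill the gaps by linear interpolation in time.
filled = regular.interpolate(method="time")

print(uneven.index.inferred_freq)  # None
print(filled.index.freqstr)        # W-SUN
print(int(filled.isna().sum()))    # 0
```

The filled series carries a weekly frequency, so it can be passed to sm.tsa.seasonal_decompose without the ValueError. In recent statsmodels releases the integer argument is named period rather than freq. The interpolated points are estimates rather than measurements, so the caveats in the notes above apply even more strongly.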

This post on python - Time series analysis - unevenly spaced measurements - pandas + statsmodels is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39272806/
