
python - Aligning two time series datasets by the hour (Python, Pandas)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:53:52

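I want to compare two datasets. One contains measured meteorological values, recorded roughly every 15 minutes, but the timestamps do not fall at the same point in each hour (e.g. 12:03, 1:05, 2:01, and so on). The other contains modeled data located exactly on the hour. I would like to extract, from the measured data, the values closest to each hour mark so that I can combine them with the modeled data.

I currently have both sets as DataFrames and have created an hourly time series to use as the index. Does anyone know a simple way to align these data without looping over all of them?

Thanks.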

When I use df.resample('H', how='ohlc'), I get the following error:

Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    df.resample('H', how='ohlc')
  File "C:\Python33\lib\site-packages\pandas\core\generic.py", line 290, in resample
    return sampler.resample(self)
  File "C:\Python33\lib\site-packages\pandas\tseries\resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "C:\Python33\lib\site-packages\pandas\tseries\resample.py", line 226, in _resample_timestamps
    result = grouped.aggregate(self._agg_method)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1695, in aggregate
    return getattr(self, arg)(*args, **kwargs)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 427, in ohlc
    return self._cython_agg_general('ohlc')
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1618, in _cython_agg_general
    new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1656, in _cython_agg_blocks
    result, _ = self.grouper.aggregate(values, how, axis=agg_axis)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 818, in aggregate
    raise NotImplementedError
NotImplementedError

A sample of my DataFrame looks like this:

                              D
2008-01-01 00:01:00 274.261108
2008-01-01 00:11:00 273.705566
2008-01-01 00:31:00 273.705566
2008-01-01 00:41:00 273.705566
2008-01-01 01:01:00 273.705566
2008-01-01 01:11:00 273.705566
2008-01-01 01:31:00 273.705566
2008-01-01 01:41:00 273.705566
2008-01-01 02:01:00 273.705566
2008-01-01 02:11:00 273.149994

Edit: This appears to be a bug when using Python 3.3. Can anyone confirm this?

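Best answer

I think pandas.DataFrame.resample() is what you need here. Have a look at the available resampling methods; for example, you may want to check 'ohlc' (open, high, low, close):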

>>> df = pd.DataFrame({'data':[1,4,3,2,7,3]}, index=pd.DatetimeIndex(['2013-11-05 12:03', '2013-11-05 12:14','2013-11-05 12:29','2013-11-05 12:46','2013-11-05 13:01','2013-11-05 13:16']))
>>> df.resample('H', how='ohlc')
                     data
                     open  high  low  close
2013-11-05 12:00:00     1     4    1      2
2013-11-05 13:00:00     7     7    3      3
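The session above uses the `how=` keyword as it existed when this answer was written. More recent pandas releases no longer accept `how=` on `resample()`; the aggregation is called as a method on the resampler instead. A minimal sketch of the same example in that style, assuming a current pandas install (the variable name `hourly` is only illustrative):

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame(
    {"data": [1, 4, 3, 2, 7, 3]},
    index=pd.DatetimeIndex([
        "2013-11-05 12:03", "2013-11-05 12:14", "2013-11-05 12:29",
        "2013-11-05 12:46", "2013-11-05 13:01", "2013-11-05 13:16",
    ]),
)

# Method-chaining form: group into hourly bins, then take open/high/low/close.
# "h" is the hourly alias in current pandas; older releases spell it "H".
hourly = df.resample("h").ohlc()
print(hourly)
```

For a DataFrame input, the result has two-level columns such as `("data", "open")`.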

After that, all you need to do is use pandas.DataFrame.join().
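The resample-then-join step could look roughly like the sketch below. The measured values are the sample from the question; the `modeled` frame, its `model` column, and its numbers are hypothetical placeholders, since the question does not show the modeled data.

```python
import pandas as pd

# Measured sample from the question (irregular timestamps).
measured = pd.DataFrame(
    {"D": [274.261108, 273.705566, 273.705566, 273.705566, 273.705566,
           273.705566, 273.705566, 273.705566, 273.705566, 273.149994]},
    index=pd.DatetimeIndex([
        "2008-01-01 00:01", "2008-01-01 00:11", "2008-01-01 00:31",
        "2008-01-01 00:41", "2008-01-01 01:01", "2008-01-01 01:11",
        "2008-01-01 01:31", "2008-01-01 01:41", "2008-01-01 02:01",
        "2008-01-01 02:11",
    ]),
)

# Hypothetical modeled data on exact hours (placeholder values).
modeled = pd.DataFrame(
    {"model": [274.0, 273.5, 273.0]},
    index=pd.date_range("2008-01-01 00:00", periods=3, freq="h"),
)

# Resampling a single column gives flat open/high/low/close columns,
# indexed by the start of each hourly bin.
hourly = measured["D"].resample("h").ohlc()

# Both frames now share an hourly DatetimeIndex, so join aligns them by label.
combined = modeled.join(hourly)
print(combined)
```

Note that `open` is the first measurement inside each hourly bin, which is not necessarily the one closest to the hour mark.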

Update: Strange. I tried it on your DataFrame:

>>> df = pd.DataFrame({'D':[274.261108,273.705566,273.705566,273.705566,273.705566,273.705566,273.705566,273.705566,273.705566,273.149994]})
>>> df.index = pd.DatetimeIndex(['2008.01.01 00:01:00','2008.01.01 00:11:00','2008.01.01 00:31:00','2008.01.01 00:41:00','2008.01.01 01:01:00','2008.01.01 01:11:00','2008.01.01 01:31:00','2008.01.01 01:41:00','2008.01.01 02:01:00','2008.01.01 02:11:00'])
>>> df.resample('H', how='ohlc')
                              D
                           open        high         low       close
2008-01-01 00:00:00  274.261108  274.261108  273.705566  273.705566
2008-01-01 01:00:00  273.705566  273.705566  273.705566  273.705566
2008-01-01 02:00:00  273.705566  273.705566  273.149994  273.149994

It works fine.
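Since the question asks specifically for the measurement closest to each hour mark, one alternative to resampling is `reindex` with `method="nearest"`. This is not the approach taken in the answer, only a sketch of a different technique; the 30-minute `tolerance` is an assumed cutoff so that hours with no nearby measurement come back as NaN.

```python
import pandas as pd

# Measured sample from the question (irregular timestamps).
measured = pd.DataFrame(
    {"D": [274.261108, 273.705566, 273.705566, 273.705566, 273.705566,
           273.705566, 273.705566, 273.705566, 273.705566, 273.149994]},
    index=pd.DatetimeIndex([
        "2008-01-01 00:01", "2008-01-01 00:11", "2008-01-01 00:31",
        "2008-01-01 00:41", "2008-01-01 01:01", "2008-01-01 01:11",
        "2008-01-01 01:31", "2008-01-01 01:41", "2008-01-01 02:01",
        "2008-01-01 02:11",
    ]),
)

# Exact hour marks to align to.
hours = pd.date_range("2008-01-01 00:00", "2008-01-01 02:00", freq="h")

# For each hour, pick the measurement with the closest timestamp,
# looking both before and after the hour (index must be sorted).
nearest = measured.reindex(
    hours, method="nearest", tolerance=pd.Timedelta("30min")
)
print(nearest)
```

The result is indexed on exact hours, so it can be joined to the modeled data in the same way as a resampled frame.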

Regarding python - Aligning two time series datasets by the hour (Python, Pandas), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19796111/
