
python - Convert irregular time series to hourly data in Python with a normal distribution

Reposted. Author: 行者123. Updated: 2023-12-01 09:14:15

I have a dataframe that looks like this:

Date Time Entry Exist
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
2013-01-07 20:00:00 2340.0 229.0
2013-01-08 05:00:00 3443.0 629.0
2013-01-08 10:00:00 6713.0 1629.0
2013-01-08 15:00:00 9547.0 2965.0
2013-01-08 20:00:00 10440.0 4589.0

I want to transform and normalize it so that it shows the hourly consumption over time.

Date Time Entry Exist
2013-01-07 00:00:00 2.0 1.0
2013-01-07 01:00:00 9.0 4.0
2013-01-07 02:00:00 16.0 6.0
2013-01-07 03:00:00 23.0 9.0
2013-01-07 04:00:00 26.0 10.0
2013-01-07 05:00:00 29.0 12.0
2013-01-07 06:00:00 37.0 19.0
2013-01-07 07:00:00 56.0 32.0
2013-01-07 08:00:00 62.0 57.0
2013-01-07 09:00:00 77.0 63.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 11:00:00 104.0 95.0
......

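I'd like to first combine the date and time into a single DateTime column, and then get the result above.

I'm new to Python; any help would be appreciated. Thanks.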

Best answer

The short answer is that you can use

DataFrame.resample().mean().interpolate()

to handle at least the interpolation part of your post.

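Note that your post includes "out of domain" extrapolation, because you are predicting outside the domain of the input data: the time series starts on 1/7 at 5:00 AM, but the oversampled data starts 5 hours earlier. Interpolation is an in-domain-only method, but I suspect it is what you want.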
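To make the in-domain point concrete, here is a minimal sketch. It assumes the hourly grid should start at midnight on 2013-01-07, as in the desired output of the question; the names `known`, `grid`, and `hourly` are illustrative and not from the original post.

```python
import pandas as pd

# Two of the known readings from the question (Entry column only).
known = pd.Series(
    [29.0, 98.0],
    index=pd.to_datetime(["2013-01-07 05:00:00", "2013-01-07 10:00:00"]),
    name="Entry",
)

# Assumed target grid: hourly, starting 5 hours before the first reading.
grid = pd.date_range("2013-01-07 00:00:00", "2013-01-07 10:00:00", freq="h")

# Put the readings on the grid (missing hours become NaN), then interpolate.
hourly = known.reindex(grid).interpolate("linear")

# 00:00 through 04:00 stay NaN: there is no earlier reading to interpolate from.
print(hourly.head(6))
```

Filling those leading hours would require an explicit extrapolation rule (for example, assuming a starting value at midnight), which is a modelling decision rather than something interpolation can supply.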

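Here are the steps for the interpolation.

First, it helps if you can post a self-contained example with code that generates the data for testing, or some other way to reproduce it.

Drawing on these two excellent posts: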

Combine Date and Time columns using python pandas

How to create a Pandas DataFrame from a string

Here is how I did it:

import pandas as pd
from io import StringIO
from bokeh.plotting import figure, output_notebook, show

# copied and pasted from your post :)
data = StringIO("""
Date Time Entry Exist
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
2013-01-07 20:00:00 2340.0 229.0
2013-01-08 05:00:00 3443.0 629.0
2013-01-08 10:00:00 6713.0 1629.0
2013-01-08 15:00:00 9547.0 2965.0
2013-01-08 20:00:00 10440.0 4589.0""")

# read in the data, then combine the separate date and time columns into a single datetime column
# (newer pandas deprecates delim_whitespace and dict-style parse_dates, so combine "after the fact")

df = pd.read_csv(data, sep=r"\s+")
df["date_time"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.drop(columns=["Date", "Time"])

Now, make the data a time series, resample it, apply a function (mean, in this case), and interpolate both data columns at the same time.

df_rs = df.set_index('date_time').resample('h').mean().interpolate('linear')
df_rs

It looks like this:

(image: table of the resampled hourly data)

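These values don't look exactly like the ones in your post, but it isn't clear which interpolation was used there. Linear? Cubic? Something else?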
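As a rough illustration of where the mismatch comes from, the sketch below splits the chain into its two steps on the first three readings and prints the linear result at 06:00. The names `raw`, `binned`, and `filled` are illustrative only.

```python
import pandas as pd

# First three readings from the question.
raw = pd.DataFrame(
    {"Entry": [29.0, 98.0, 404.0], "Exist": [12.0, 83.0, 131.0]},
    index=pd.to_datetime(
        ["2013-01-07 05:00:00", "2013-01-07 10:00:00", "2013-01-07 15:00:00"]
    ),
)

# Step 1: resample to hourly bins; bins without a reading become NaN after mean().
binned = raw.resample("h").mean()

# Step 2: fill the NaN gaps on a straight line between neighbouring readings.
filled = binned.interpolate("linear")

# Linear result at 06:00, to compare with the 37.0 / 19.0 shown in the question.
print(filled.loc[pd.Timestamp("2013-01-07 06:00:00")])
```

Linear interpolation gives about 42.8 for Entry and 26.2 for Exist at 06:00, whereas the question shows 37.0 and 19.0, so the expected output was presumably produced by some other rule.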

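For fun, let's plot the data with Bokeh. The big red dots are the original data, and the blue dots (and connecting line) are the interpolated data.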

output_notebook()

p = figure(x_axis_type="datetime", width=800, height=500)

p.title.text = "Entry vs. Date Time (linear interpolated to 1H)"
p.xaxis.axis_label = 'Date Time (linear interpolated to 1H)'
p.yaxis.axis_label = 'Entry'

# orig data
p.scatter(df['date_time'], df['Entry'], color='red', size=10)

# oversampled data
p.scatter(df_rs.index, df_rs['Entry'])
p.line(df_rs.index, df_rs['Entry'])

show(p)

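It looks like this:

(image: Bokeh plot of the interpolated Entry values)

Or, with cubic interpolation, you get a smoother curve:

(image: Bokeh plot of the cubic-interpolated Entry values)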
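The cubic variant is not shown in the answer's code, so here is a hedged sketch of what it might look like. It assumes SciPy is installed, since pandas delegates `method="cubic"` to SciPy; the names `raw`, `binned`, `linear`, `cubic`, and `comparison` are illustrative.

```python
import pandas as pd

# The eight Entry readings from the question.
raw = pd.DataFrame(
    {"Entry": [29.0, 98.0, 404.0, 2340.0, 3443.0, 6713.0, 9547.0, 10440.0]},
    index=pd.to_datetime(
        [
            "2013-01-07 05:00:00", "2013-01-07 10:00:00",
            "2013-01-07 15:00:00", "2013-01-07 20:00:00",
            "2013-01-08 05:00:00", "2013-01-08 10:00:00",
            "2013-01-08 15:00:00", "2013-01-08 20:00:00",
        ]
    ),
)

binned = raw.resample("h").mean()

# Same pipeline, two interpolation methods.
linear = binned.interpolate("linear")
cubic = binned.interpolate("cubic")  # requires SciPy

# Both keep the original readings unchanged; they differ only in between.
comparison = pd.DataFrame({"linear": linear["Entry"], "cubic": cubic["Entry"]})
print(comparison.head(10))
```

A cubic curve can overshoot between readings, so if the columns are cumulative counts it may be worth checking that the interpolated values never decrease.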

Full code

import pandas as pd
from io import StringIO
from bokeh.plotting import figure, output_notebook, show

output_notebook()

# copied and pasted from your post :)
data = StringIO("""
Date Time ENTRIES EXITS
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
2013-01-07 20:00:00 2340.0 229.0
2013-01-08 05:00:00 3443.0 629.0
2013-01-08 10:00:00 6713.0 1629.0
2013-01-08 15:00:00 9547.0 2965.0
2013-01-08 20:00:00 10440.0 4589.0""")

# read in the data, then combine the separate date and time columns into a single datetime column
# (newer pandas deprecates delim_whitespace and dict-style parse_dates, so combine "after the fact")
original_data = pd.read_csv(data, sep=r"\s+")
original_data["DATETIME"] = pd.to_datetime(original_data["Date"] + " " + original_data["Time"])
original_data = original_data.drop(columns=["Date", "Time"])

# make it a time series, resample to a higher freq, apply mean, interpolate and round
inter_data = original_data.set_index(['DATETIME']).resample('h').mean().interpolate('linear').round(1)

# No need to drop the index to select a slice. You can slice on the index
# I see you are starting at 1/1 (jan 1st), yet your data starts at 1/7 (Jan 7th?)
inter_data[inter_data.index >= '2013-01-01 00:00:00'].head(20)

(image: first 20 rows of the interpolated hourly data)
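The comment about slicing on the index can be sketched as follows. This uses a small stand-in frame rather than the real `inter_data`; the `hourly` frame and its placeholder values are assumptions for illustration.

```python
import pandas as pd

# Stand-in for inter_data: an hourly DatetimeIndex with placeholder values.
idx = pd.date_range("2013-01-07 05:00:00", "2013-01-08 20:00:00", freq="h")
hourly = pd.DataFrame({"ENTRIES": range(len(idx))}, index=idx)

# Boolean mask on the index, as in the answer.
from_noon = hourly[hourly.index >= "2013-01-07 12:00:00"]

# Label-based slice with .loc; both endpoints are included.
morning = hourly.loc["2013-01-07 05:00:00":"2013-01-07 10:00:00"]

# A date-only string selects every row on that day.
first_day = hourly.loc["2013-01-07"]

print(len(from_noon), len(morning), len(first_day))
```

None of these need `reset_index()`, because the timestamps stay in the index and pandas parses the strings for you.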

About "python - Convert irregular time series to hourly data in Python with a normal distribution": we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51392012/
