
python - Merging times into time periods

Reposted. Author: 太空宇宙. Updated: 2023-11-03 10:58:20

I have a DataFrame of measurements, containing the measured value and the time of each measurement.

time = [datetime.datetime(2011, 1, 1, np.random.randint(0,23), np.random.randint(1, 59)) for _ in range(10)]
df_meas = pandas.DataFrame({'time': time, 'value': np.random.random(10)})

For example:

                 time     value
0 2011-01-01 21:56:00 0.115025
1 2011-01-01 04:40:00 0.678882
2 2011-01-01 02:18:00 0.507168
3 2011-01-01 22:40:00 0.938408
4 2011-01-01 12:53:00 0.193573
5 2011-01-01 19:37:00 0.464744
6 2011-01-01 16:06:00 0.794495
7 2011-01-01 18:32:00 0.482684
8 2011-01-01 13:26:00 0.381747
9 2011-01-01 01:50:00 0.035798

Data acquisition is organized in periods (runs), and I have another DataFrame for those:

start = pandas.date_range('1/1/2011', periods=5, freq='H')
stop = start + np.timedelta64(50, 'm')
df_runs = pandas.DataFrame({'start': start, 'stop': stop}, index=np.random.randint(0, 1000000, 5))
df_runs.index.name = 'run'

For example:

                     start                stop
run
721158 2011-01-01 00:00:00 2011-01-01 00:50:00
340902 2011-01-01 01:00:00 2011-01-01 01:50:00
211578 2011-01-01 02:00:00 2011-01-01 02:50:00
120232 2011-01-01 03:00:00 2011-01-01 03:50:00
122199 2011-01-01 04:00:00 2011-01-01 04:50:00

Now I want to merge the two tables to get:

                 time     value   run
0 2011-01-01 21:56:00 0.115025 NaN
1 2011-01-01 04:40:00 0.678882 122199
2 2011-01-01 02:18:00 0.507168 211578
3 2011-01-01 22:40:00 0.938408 NaN
...

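The time periods (runs) have a start and a stop, with stop >= start. Different runs never overlap. Even though it does not hold in my example, you can assume that the runs are ordered (by run), so that run1 < run2 implies start1 < start2 (or you can simply sort the table by start). You can also assume that df_meas is sorted by time.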
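The assumptions stated above can be checked before merging. The following is only a sketch: the helper name runs_are_valid and the small hand-made df_runs are hypothetical, chosen so that the check is reproducible.

```python
import pandas as pd

# Hypothetical, non-random runs so that the check is reproducible.
df_runs = pd.DataFrame(
    {
        "start": pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00", "2011-01-01 02:00"]),
        "stop": pd.to_datetime(["2011-01-01 00:50", "2011-01-01 01:50", "2011-01-01 02:50"]),
    },
    index=pd.Index([721158, 340902, 211578], name="run"),
)


def runs_are_valid(runs):
    """Return True if every stop >= start and no run overlaps the next one."""
    ordered = runs.sort_values("start")
    stop_after_start = (ordered["stop"] >= ordered["start"]).all()
    # Each run must begin strictly after the previous run has stopped.
    previous_stop = ordered["stop"].shift()
    no_overlap = (ordered["start"] > previous_stop).iloc[1:].all()
    return bool(stop_after_start and no_overlap)


print(runs_are_valid(df_runs))
```

The first row has no predecessor, so it is skipped in the overlap comparison.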

How can I do that? Is there something built in? What is the most efficient way?

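Best answer

You can first reshape df_runs with stack, so that start and stop end up in a single column, time. Then groupby run, resample to minutes and ffill to fill the NaN values. Finally, merge with df_meas:

Note: this code was written for pandas 0.18.1, the latest version at the time of writing; see the docs.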

import pandas as pd
import numpy as np
import datetime

#for testing
np.random.seed(1)
time = [datetime.datetime(2011, 1, 1, np.random.randint(0,23), np.random.randint(1, 59)) for _ in range(10)]
df_meas = pd.DataFrame({'time': time, 'value': np.random.random(10)})

start = pd.date_range('1/1/2011', periods=5, freq='H')
stop = start + np.timedelta64(50, 'm')
df_runs = pd.DataFrame({'start': start, 'stop': stop}, index=np.random.randint(0, 1000000, 5))
df_runs.index.name = 'run'

df = (df_runs.stack().reset_index(level=1, drop=True).reset_index(name='time'))
print (df)
      run                time
0   99335 2011-01-01 00:00:00
1   99335 2011-01-01 00:50:00
2  823615 2011-01-01 01:00:00
3  823615 2011-01-01 01:50:00
4  117565 2011-01-01 02:00:00
5  117565 2011-01-01 02:50:00
6  790038 2011-01-01 03:00:00
7  790038 2011-01-01 03:50:00
8  369977 2011-01-01 04:00:00
9  369977 2011-01-01 04:50:00

df1 = (df.set_index('time')
         .groupby('run')
         .resample('Min')
         .ffill()
         .reset_index(level=0, drop=True)
         .reset_index())

print (df1)
                 time    run
0 2011-01-01 00:00:00  99335
1 2011-01-01 00:01:00  99335
2 2011-01-01 00:02:00  99335
3 2011-01-01 00:03:00  99335
4 2011-01-01 00:04:00  99335
5 2011-01-01 00:05:00  99335
6 2011-01-01 00:06:00  99335
7 2011-01-01 00:07:00  99335
8 2011-01-01 00:08:00  99335
9 2011-01-01 00:09:00  99335
...
print (pd.merge(df_meas, df1, on='time', how='left'))
                 time     value       run
0 2011-01-01 05:44:00  0.524548       NaN
1 2011-01-01 12:09:00  0.443453       NaN
2 2011-01-01 09:12:00  0.229577       NaN
3 2011-01-01 05:16:00  0.534414       NaN
4 2011-01-01 00:17:00  0.913962   99335.0
5 2011-01-01 01:13:00  0.457205  823615.0
6 2011-01-01 07:46:00  0.430699       NaN
7 2011-01-01 06:26:00  0.939128       NaN
8 2011-01-01 18:21:00  0.778389       NaN
9 2011-01-01 05:19:00  0.715971       NaN
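The approach above creates one row per minute of every run and matches on exact equality of time, so it presumably relies on the measurements falling on whole minutes. As a possible alternative (not part of the original answer), pandas 0.19 and later provide pd.merge_asof, which matches each measurement to the most recent start and leaves only the stop check to do by hand. The sketch below uses small hand-made stand-ins for df_meas and df_runs rather than the random data above.

```python
import pandas as pd

# Hand-made stand-ins for the question's tables (assumed values).
df_meas = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2011-01-01 05:44", "2011-01-01 00:17", "2011-01-01 01:13", "2011-01-01 01:55"]
        ),
        "value": [0.52, 0.91, 0.46, 0.30],
    }
)
df_runs = pd.DataFrame(
    {
        "start": pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00"]),
        "stop": pd.to_datetime(["2011-01-01 00:50", "2011-01-01 01:50"]),
    },
    index=pd.Index([99335, 823615], name="run"),
)

# merge_asof needs both sides sorted by their keys. reset_index() keeps the
# original row labels in a column named "index" so the order can be restored.
runs = df_runs.reset_index().sort_values("start")
meas = df_meas.reset_index().sort_values("time")

# For each measurement, pick the last run whose start is <= time.
matched = pd.merge_asof(meas, runs, left_on="time", right_on="start")

# The measurement only belongs to that run if it is not later than its stop.
inside = matched["time"] <= matched["stop"]
matched["run"] = matched["run"].where(inside)

result = matched.set_index("index").sort_index()[["time", "value", "run"]]
result.index.name = None
print(result)
```

Here the measurements at 05:44 and 01:55 fall outside every run and get NaN, while 00:17 and 01:13 are assigned to runs 99335 and 823615.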

IanS's solution is very nice; I tried to improve on it with pd.lreshape:

df_runs['run1'] = -1 
df_runs = df_runs.reset_index()

run_times = (pd.lreshape(df_runs, {'Run': ['run', 'run1'],
                                   'Time': ['start', 'stop']})
               .sort_values('Time')
               .set_index('Time'))

print (run_times['Run'].asof(df_meas['time']))

time
2011-01-01 05:44:00 -1
2011-01-01 12:09:00 -1
2011-01-01 09:12:00 -1
2011-01-01 05:16:00 -1
2011-01-01 00:17:00 99335
2011-01-01 01:13:00 823615
2011-01-01 07:46:00 -1
2011-01-01 06:26:00 -1
2011-01-01 18:21:00 -1
2011-01-01 05:19:00 -1
Name: Run, dtype: int64
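The idea behind the -1 column seems to be a step function over time: at every start the value becomes the run id, at every stop it becomes the sentinel -1, and asof reads off the last value at or before each measurement. A self-contained sketch of that idea, built with pd.concat instead of pd.lreshape and with hand-made values (assumptions, not the data above):

```python
import pandas as pd

# Hand-made runs (assumed values).
starts = pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00"])
stops = pd.to_datetime(["2011-01-01 00:50", "2011-01-01 01:50"])
run_ids = [99335, 823615]

# Step function: run id from each start onwards, -1 from each stop onwards.
events = pd.concat(
    [pd.Series(run_ids, index=starts), pd.Series(-1, index=stops)]
).sort_index()

times = pd.to_datetime(
    ["2011-01-01 05:44", "2011-01-01 00:17", "2011-01-01 01:13", "2011-01-01 00:50"]
)

# asof returns, for each time, the last value whose index is <= that time.
labels = events.asof(times)
print(labels)
```

With this construction a measurement taken exactly at a stop time gets -1, and a measurement before the first start would get NaN rather than -1.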

Regarding python - merging times into time periods, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37385992/
