
python - Fill gaps in time intervals with other time intervals

Reposted. Author: 行者123. Updated: 2023-12-03 07:53:25

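We have two tables of time intervals. I want to fill the gaps in df1 with df2, as shown in the figure, to get df3: df1 is carried over to df3 unchanged, and only the parts of df2 that fall into the gaps of df1 (the difference) are moved to df3.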

[Figure: timelines of df1, df2 and the desired df3 (/image/uQeML.png)]

import pandas as pd

df1 = pd.DataFrame({'Start': ['2023-01-01', '2023-02-01', '2023-03-15', '2023-04-18', '2023-05-15', '2023-05-25'],
                    'End': ['2023-01-15', '2023-02-20', '2023-04-01', '2023-05-03', '2023-05-20', '2023-05-30']})

df2 = pd.DataFrame({'Start': ['2023-01-02', '2023-01-05', '2023-01-20', '2023-02-25', '2023-03-05', '2023-04-18', '2023-05-12'],
                    'End': ['2023-01-03', '2023-01-10', '2023-02-10', '2023-03-01', '2023-04-15', '2023-05-10', '2023-06-05']})

df3 = pd.DataFrame({'Start': ['2023-01-01', '2023-01-20', '2023-02-01', '2023-02-25', '2023-03-05', '2023-03-15', '2023-04-02', '2023-04-18', '2023-05-04', '2023-05-12', '2023-05-15', '2023-05-21', '2023-05-25', '2023-05-31'],
                    'End': ['2023-01-15', '2023-01-31', '2023-02-20', '2023-03-01', '2023-03-14', '2023-04-01', '2023-04-15', '2023-05-03', '2023-05-10', '2023-05-14', '2023-05-20', '2023-05-24', '2023-05-30', '2023-06-05']})

# df1

Start End
0 2023-01-01 2023-01-15
1 2023-02-01 2023-02-20
2 2023-03-15 2023-04-01
3 2023-04-18 2023-05-03
4 2023-05-15 2023-05-20
5 2023-05-25 2023-05-30

# df2

Start End
0 2023-01-02 2023-01-03
1 2023-01-05 2023-01-10
2 2023-01-20 2023-02-10
3 2023-02-25 2023-03-01
4 2023-03-05 2023-04-15
5 2023-04-18 2023-05-10
6 2023-05-12 2023-06-05

# df3 (desired result)

Start End
0 2023-01-01 2023-01-15
1 2023-01-20 2023-01-31
2 2023-02-01 2023-02-20
3 2023-02-25 2023-03-01
4 2023-03-05 2023-03-14
5 2023-03-15 2023-04-01
6 2023-04-02 2023-04-15
7 2023-04-18 2023-05-03
8 2023-05-04 2023-05-10
9 2023-05-12 2023-05-14
10 2023-05-15 2023-05-20
11 2023-05-21 2023-05-24
12 2023-05-25 2023-05-30
13 2023-05-31 2023-06-05

Code to generate the plot:

import plotly.express as px

df_plot = pd.concat(
    [
        df1.assign(color='df1', df='df1'),
        df2.assign(color='df2', df='df2'),
        df3.assign(color=['df1', 'df2', 'df1', 'df2', 'df2', 'df1', 'df2', 'df1', 'df2', 'df2', 'df1', 'df2', 'df1', 'df2'], df='df3')
    ],
)
fig = px.timeline(df_plot, x_start="Start", x_end="End", y="df", color="color")
fig.update_yaxes(categoryorder='category descending')
fig.show()

Best answer

I think I can get you close:

import pandas as pd

df1 = pd.DataFrame({'Start': ['2023-01-01', '2023-02-01', '2023-03-15'],
                    'End': ['2023-01-15', '2023-02-20', '2023-04-01']})
df2 = pd.DataFrame({'Start': ['2023-01-02', '2023-01-05', '2023-01-20', '2023-02-25', '2023-03-05'],
                    'End': ['2023-01-03', '2023-01-10', '2023-02-10', '2023-03-01', '2023-04-15']})
df3 = pd.DataFrame({'Start': ['2023-01-01', '2023-01-20', '2023-02-01', '2023-02-25', '2023-03-05', '2023-03-15', '2023-04-02'],
                    'End': ['2023-01-15', '2023-01-31', '2023-02-20', '2023-03-01', '2023-03-14', '2023-04-01', '2023-04-15']})


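# Expand every interval into one row per calendar day (both ends inclusive)
df1['dates'] = [pd.date_range(s, e) for s, e in zip(df1['Start'], df1['End'])]
df2['dates'] = [pd.date_range(s, e) for s, e in zip(df2['Start'], df2['End'])]

df1e = df1.explode('dates').assign(source='df1')
df2e = df2.explode('dates').assign(source='df2')

# Index both tables by day; combine_first keeps df1's rows and fills
# only the days that df1 does not cover with rows from df2
df3e = df1e.set_index(df1e['dates']).combine_first(df2e.set_index(df2e['dates']))
df3e['dates'] = pd.to_datetime(df3e['dates'])

# Start a new group whenever the source changes or two consecutive
# days are more than one day apart
df3e['group'] = ((df3e['source'] != df3e['source'].shift()) |
                 (df3e['dates'] - df3e['dates'].shift() > pd.Timedelta(days=1))).cumsum()

# Collapse each run of days back into an interval (string names avoid
# the FutureWarning pandas 2.x raises for the builtins min/max)
df_out = df3e.groupby(['group', 'source'])['dates'].agg(['min', 'max'])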

Output:

                    min        max
group source
1 df1 2023-01-01 2023-01-15
2 df2 2023-01-20 2023-01-31
3 df1 2023-02-01 2023-02-20
4 df2 2023-02-25 2023-03-01
5 df2 2023-03-05 2023-03-14
6 df1 2023-03-15 2023-04-01
7 df2 2023-04-02 2023-04-15
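The two key steps above are `combine_first`, which lets df1 win on any day both tables cover, and the shift/cumsum idiom that gives each run of consecutive days from the same source its own `group` label. A minimal standalone illustration on hypothetical toy data (not the question's tables), using plain Series of source labels indexed by day:

```python
import pandas as pd

# Hypothetical toy data: each Series maps a calendar day to the table it came from.
a = pd.Series('df1', index=pd.date_range('2023-01-01', '2023-01-03'))
b = pd.Series('df2', index=pd.date_range('2023-01-02', '2023-01-04')
              .append(pd.date_range('2023-01-07', '2023-01-08')))

# combine_first keeps a's label wherever a has that day and uses b's label elsewhere.
source = a.combine_first(b)
days = source.rename('source').rename_axis('dates').reset_index()

# A new run starts when the source changes or consecutive days are more than one day apart.
source_changed = days['source'] != days['source'].shift()
date_jumped = days['dates'].diff() > pd.Timedelta(days=1)
days['group'] = (source_changed | date_jumped).cumsum()

# Collapse each run back into a (min, max) interval.
runs = days.groupby(['group', 'source'])['dates'].agg(['min', 'max'])
print(runs)
```

Here 2023-01-02 and 2023-01-03 stay labelled `df1`, 2023-01-04 becomes a one-day `df2` run, and the jump to 2023-01-07 starts a separate `df2` run even though the source did not change, which is the same reason the answer's output splits 2023-05-04..05-10 and 2023-05-12..05-14 into two groups.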

Graphical output:

import plotly.express as px

df_out = df_out.reset_index().rename({'source':'color', 'min':'Start', 'max':'End'}, axis=1)
df_plot = pd.concat(
    [
        df1.assign(color='df1'),
        df2.assign(color='df2'),
        df_out
    ],
    keys=['df1', 'df2', 'df3']
).reset_index(level=0, names='df')
fig = px.timeline(df_plot, x_start="Start", x_end="End", y="df", color="color")
fig.update_yaxes(categoryorder='category descending')
fig.show()

Chart:

[Figure: timeline plot of df1, df2 and the combined df3]
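In the plotting code, `pd.concat(..., keys=[...])` tags each input frame with an outer index level, and `reset_index(level=0, names='df')` turns that level into the `df` column that `px.timeline` uses for the y axis. A tiny sketch with hypothetical mini frames standing in for df1 and df_out (the `names=` argument of `reset_index` needs pandas 1.5 or later):

```python
import pandas as pd

# Hypothetical mini frames standing in for df1 and df_out.
a = pd.DataFrame({'Start': ['2023-01-01'], 'End': ['2023-01-15']})
b = pd.DataFrame({'Start': ['2023-01-20', '2023-02-25'],
                  'End': ['2023-01-31', '2023-03-01']})

# keys= adds an outer index level naming the frame each row came from.
stacked = pd.concat([a, b], keys=['df1', 'df3'])

# Move that outer level into an ordinary column called 'df';
# the inner per-frame row numbers remain as the index.
long = stacked.reset_index(level=0, names='df')
print(long)
```

Because only level 0 is reset, the original per-frame row numbers survive as the index, which is harmless for plotting.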

Output for the updated (full) dataset:

                    min        max
group source
1 df1 2023-01-01 2023-01-15
2 df2 2023-01-20 2023-01-31
3 df1 2023-02-01 2023-02-20
4 df2 2023-02-25 2023-03-01
5 df2 2023-03-05 2023-03-14
6 df1 2023-03-15 2023-04-01
7 df2 2023-04-02 2023-04-15
8 df1 2023-04-18 2023-05-03
9 df2 2023-05-04 2023-05-10
10 df2 2023-05-12 2023-05-14
11 df1 2023-05-15 2023-05-20
12 df2 2023-05-21 2023-05-24
13 df1 2023-05-25 2023-05-30
14 df2 2023-05-31 2023-06-05

Graphical output:

[Figure: timeline plot of df1, df2 and the combined df3 for the updated dataset]
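As a variation that avoids expanding every interval into one row per day, the rule from the question (df1 unchanged, plus each df2 interval minus everything df1 covers) can also be expressed with plain interval subtraction. This is a sketch under stated assumptions, not the answer's method: intervals are whole days with inclusive ends, as in the sample data, and the helper names `subtract_intervals` and `fill_gaps` are hypothetical. On the question's data it should yield the same 14 rows as the desired df3:

```python
import pandas as pd


def subtract_intervals(start, end, blockers):
    """Return the pieces of the inclusive day range [start, end] not covered by blockers.

    Assumption: blockers is a list of (start, end) Timestamp pairs sorted by start.
    """
    one_day = pd.Timedelta(days=1)
    pieces = []
    cur = start  # first day of [start, end] not yet accounted for
    for b_start, b_end in blockers:
        if b_end < cur:
            continue  # blocker ends before the remaining range
        if b_start > end:
            break  # sorted by start, so no later blocker can overlap either
        if b_start > cur:
            pieces.append((cur, b_start - one_day))  # uncovered days before the blocker
        cur = max(cur, b_end + one_day)  # jump past the blocker
        if cur > end:
            break
    if cur <= end:
        pieces.append((cur, end))
    return pieces


def fill_gaps(df1, df2):
    """Keep df1 as-is and add only the parts of df2 that fall into df1's gaps."""
    base = df1[['Start', 'End']].apply(pd.to_datetime)
    extra = df2[['Start', 'End']].apply(pd.to_datetime)
    blockers = sorted(zip(base['Start'], base['End']))
    pieces = [piece
              for s, e in zip(extra['Start'], extra['End'])
              for piece in subtract_intervals(s, e, blockers)]
    filler = pd.DataFrame(pieces, columns=['Start', 'End'])
    out = pd.concat([base.assign(source='df1'), filler.assign(source='df2')],
                    ignore_index=True)
    return out.sort_values('Start', ignore_index=True)


df1 = pd.DataFrame({'Start': ['2023-01-01', '2023-02-01', '2023-03-15', '2023-04-18', '2023-05-15', '2023-05-25'],
                    'End': ['2023-01-15', '2023-02-20', '2023-04-01', '2023-05-03', '2023-05-20', '2023-05-30']})
df2 = pd.DataFrame({'Start': ['2023-01-02', '2023-01-05', '2023-01-20', '2023-02-25', '2023-03-05', '2023-04-18', '2023-05-12'],
                    'End': ['2023-01-03', '2023-01-10', '2023-02-10', '2023-03-01', '2023-04-15', '2023-05-10', '2023-06-05']})

df3_sketch = fill_gaps(df1, df2)
print(df3_sketch)
```

The work here grows with the number of intervals rather than the number of days they span, at the cost of a plain Python loop; the `source` column plays the same role as the answer's `source` level and can be renamed to `color` for the plot.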

For "python - Fill gaps in time intervals with other time intervals", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/76605223/
