python - Merging two Pandas DataFrames with complex conditions

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:32:44

I want to merge two DataFrames. Consider the following two DataFrames:

df1:

id_A, ts_A, course, weight
id1, 2017-04-27 01:35:30, cotton, 3.5
id1, 2017-04-27 01:36:05, cotton, 3.5
id1, 2017-04-27 01:36:55, cotton, 3.5
id1, 2017-04-27 01:37:20, cotton, 3.5
id2, 2017-04-27 02:35:35, cotton blue, 5.0
id2, 2017-04-27 02:36:00, cotton blue, 5.0
id2, 2017-04-27 02:36:35, cotton blue, 5.0
id2, 2017-04-27 02:37:20, cotton blue, 5.0

df2:

id_B, ts_B, value
id1, 2017-03-27 01:25:40, 100
id1, 2017-03-27 01:25:50, 200
id1, 2017-03-27 01:25:50, 230
id1, 2017-04-27 01:35:40, 240
id1, 2017-04-27 01:35:50, 200
id1, 2017-04-27 01:36:00, 350
id1, 2017-04-27 01:36:10, 400
id1, 2017-04-27 01:36:20, 500
id1, 2017-04-27 01:36:30, 600
id1, 2017-04-27 01:36:40, 700
id1, 2017-04-27 01:36:50, 800
id1, 2017-04-27 01:37:00, 900
id1, 2017-04-27 01:37:10, 1000
id2, 2017-04-27 02:35:40, 1000
id2, 2017-04-27 02:35:50, 2000
id2, 2017-04-27 02:36:00, 4500
id2, 2017-04-27 02:36:10, 3000
id2, 2017-04-27 02:36:20, 6000
id2, 2017-04-27 02:36:30, 5000
id2, 2017-04-27 02:36:40, 5022
id2, 2017-04-27 02:36:50, 5040
id2, 2017-04-27 02:37:00, 3200
id2, 2017-04-27 02:37:10, 9000
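Both tables are plain comma-separated text, so one way to load them (a sketch; the question does not show its loading code) is `pd.read_csv` with `skipinitialspace=True` to drop the blank after each comma and `parse_dates` to get real timestamps. The question's `df2_sorted` is never defined; here it is assumed to mean df2 sorted by `ts_B`, which `merge_asof` requires.

```python
import io

import pandas as pd

# The tables from the question, copied as comma-separated text.
DF1_CSV = """\
id_A, ts_A, course, weight
id1, 2017-04-27 01:35:30, cotton, 3.5
id1, 2017-04-27 01:36:05, cotton, 3.5
id1, 2017-04-27 01:36:55, cotton, 3.5
id1, 2017-04-27 01:37:20, cotton, 3.5
id2, 2017-04-27 02:35:35, cotton blue, 5.0
id2, 2017-04-27 02:36:00, cotton blue, 5.0
id2, 2017-04-27 02:36:35, cotton blue, 5.0
id2, 2017-04-27 02:37:20, cotton blue, 5.0
"""

DF2_CSV = """\
id_B, ts_B, value
id1, 2017-03-27 01:25:40, 100
id1, 2017-03-27 01:25:50, 200
id1, 2017-03-27 01:25:50, 230
id1, 2017-04-27 01:35:40, 240
id1, 2017-04-27 01:35:50, 200
id1, 2017-04-27 01:36:00, 350
id1, 2017-04-27 01:36:10, 400
id1, 2017-04-27 01:36:20, 500
id1, 2017-04-27 01:36:30, 600
id1, 2017-04-27 01:36:40, 700
id1, 2017-04-27 01:36:50, 800
id1, 2017-04-27 01:37:00, 900
id1, 2017-04-27 01:37:10, 1000
id2, 2017-04-27 02:35:40, 1000
id2, 2017-04-27 02:35:50, 2000
id2, 2017-04-27 02:36:00, 4500
id2, 2017-04-27 02:36:10, 3000
id2, 2017-04-27 02:36:20, 6000
id2, 2017-04-27 02:36:30, 5000
id2, 2017-04-27 02:36:40, 5022
id2, 2017-04-27 02:36:50, 5040
id2, 2017-04-27 02:37:00, 3200
id2, 2017-04-27 02:37:10, 9000
"""

# skipinitialspace drops the blank after each comma (so "cotton blue" stays
# intact); parse_dates turns the timestamp columns into datetime64.
df1 = pd.read_csv(io.StringIO(DF1_CSV), skipinitialspace=True, parse_dates=["ts_A"])
df2 = pd.read_csv(io.StringIO(DF2_CSV), skipinitialspace=True, parse_dates=["ts_B"])

# Assumed meaning of df2_sorted: df2 ordered by timestamp, as merge_asof requires.
df2_sorted = df2.sort_values("ts_B").reset_index(drop=True)

print(df1.dtypes)
```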

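df1 should be merged with df2 under the following condition: given the time interval between two consecutive rows of df1, I want to merge the first of those rows with the mean of all df2 rows that fall within that interval. For example,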

id_A, ts_A, course, weight
id1, 2017-04-27 01:35:30, cotton, 3.5

should be merged with

id_B, ts_B, value
id1, 2017-04-27 01:35:40, 240
id1, 2017-04-27 01:35:50, 200
id1, 2017-04-27 01:36:00, 350

to obtain

id_A, ts_A, course, weight, avgValue
id1, 2017-04-27 01:35:30, cotton, 3.5, 263.3
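The condition can be written out directly as a brute-force reference: for each df1 row, average the df2 values of the same id whose timestamps fall between this row and the next one. This is a minimal sketch, not the answer's method. It assumes half-open intervals [ts_A, next ts_A) and treats the last row of each id as open-ended; the question does not state either rule, and both are chosen here to agree with `merge_asof(direction="backward")`. The data is trimmed to the id1 rows for brevity, and `interval_means` is a hypothetical helper name.

```python
import pandas as pd

# id1 rows of both tables (trimmed from the question's data for brevity).
df1 = pd.DataFrame({
    "id_A": ["id1"] * 4,
    "ts_A": pd.to_datetime(["2017-04-27 01:35:30", "2017-04-27 01:36:05",
                            "2017-04-27 01:36:55", "2017-04-27 01:37:20"]),
    "course": ["cotton"] * 4,
    "weight": [3.5] * 4,
})
df2 = pd.DataFrame({
    "id_B": ["id1"] * 13,
    "ts_B": pd.to_datetime([
        "2017-03-27 01:25:40", "2017-03-27 01:25:50", "2017-03-27 01:25:50",
        "2017-04-27 01:35:40", "2017-04-27 01:35:50", "2017-04-27 01:36:00",
        "2017-04-27 01:36:10", "2017-04-27 01:36:20", "2017-04-27 01:36:30",
        "2017-04-27 01:36:40", "2017-04-27 01:36:50", "2017-04-27 01:37:00",
        "2017-04-27 01:37:10",
    ]),
    "value": [100, 200, 230, 240, 200, 350, 400, 500, 600, 700, 800, 900, 1000],
})


def interval_means(df1, df2):
    """Hypothetical helper: mean df2.value over [ts_A, next ts_A) per df1 row."""
    out = df1.copy()
    # The next timestamp of the same id closes the interval; the last row of
    # each id has no successor (NaT) and is treated as open-ended.
    next_ts = out.groupby("id_A")["ts_A"].shift(-1)
    means = []
    for id_, start, end in zip(out["id_A"], out["ts_A"], next_ts):
        mask = (df2["id_B"] == id_) & (df2["ts_B"] >= start)
        if pd.notna(end):
            mask &= df2["ts_B"] < end
        # The mean of an empty selection is NaN (no df2 rows in the interval).
        means.append(df2.loc[mask, "value"].mean())
    out["avgValue"] = means
    return out


print(interval_means(df1, df2))
```

The first row reproduces the expected 263.3 ((240 + 200 + 350) / 3). The loop is O(len(df1) × len(df2)), so it is only meant for checking a vectorized version on small data.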

I tried to approach the problem from the other side using merge_asof, so that the missing df2 rows are brought into df1, but I did not get the correct result:

pd.merge_asof(df2_sorted, df1, left_on='ts_B', right_on='ts_A', left_by='id_B', right_by='id_A', direction='backward')

Best answer

I think you need merge_asof, but with a counter: use reset_index so that each row of df1 gets a unique value:

df1 = df1.reset_index(drop=True)
print (df1.index)
RangeIndex(start=0, stop=8, step=1)

df = pd.merge_asof(df2_sorted,
                   df1.reset_index(),
                   left_on='ts_B',
                   right_on='ts_A',
                   left_by='id_B',
                   right_by='id_A')
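To see what the asof join produces before any aggregation, the sketch below rebuilds the question's data inline, assumes `df2_sorted` is df2 sorted by `ts_B` (the question does not define it), and prints the df1 row counter attached to each df2 row. With the defaults `direction="backward"` and `allow_exact_matches=True`, each df2 row gets the last df1 row of the same id whose timestamp is not later than its own; the three March rows have no such row and get NaN.

```python
import pandas as pd

# df1 and df2 from the question.
df1 = pd.DataFrame({
    "id_A": ["id1"] * 4 + ["id2"] * 4,
    "ts_A": pd.to_datetime([
        "2017-04-27 01:35:30", "2017-04-27 01:36:05", "2017-04-27 01:36:55",
        "2017-04-27 01:37:20", "2017-04-27 02:35:35", "2017-04-27 02:36:00",
        "2017-04-27 02:36:35", "2017-04-27 02:37:20",
    ]),
    "course": ["cotton"] * 4 + ["cotton blue"] * 4,
    "weight": [3.5] * 4 + [5.0] * 4,
})
df2 = pd.DataFrame({
    "id_B": ["id1"] * 13 + ["id2"] * 10,
    "ts_B": pd.to_datetime([
        "2017-03-27 01:25:40", "2017-03-27 01:25:50", "2017-03-27 01:25:50",
        "2017-04-27 01:35:40", "2017-04-27 01:35:50", "2017-04-27 01:36:00",
        "2017-04-27 01:36:10", "2017-04-27 01:36:20", "2017-04-27 01:36:30",
        "2017-04-27 01:36:40", "2017-04-27 01:36:50", "2017-04-27 01:37:00",
        "2017-04-27 01:37:10",
        "2017-04-27 02:35:40", "2017-04-27 02:35:50", "2017-04-27 02:36:00",
        "2017-04-27 02:36:10", "2017-04-27 02:36:20", "2017-04-27 02:36:30",
        "2017-04-27 02:36:40", "2017-04-27 02:36:50", "2017-04-27 02:37:00",
        "2017-04-27 02:37:10",
    ]),
    "value": [100, 200, 230, 240, 200, 350, 400, 500, 600, 700, 800, 900, 1000,
              1000, 2000, 4500, 3000, 6000, 5000, 5022, 5040, 3200, 9000],
})

df2_sorted = df2.sort_values("ts_B")  # assumed meaning of df2_sorted
df1 = df1.reset_index(drop=True)

# Each df2 row is matched to the most recent df1 row of the same id.
merged = pd.merge_asof(
    df2_sorted,
    df1.reset_index(),  # the "index" column is the per-row counter of df1
    left_on="ts_B",
    right_on="ts_A",
    left_by="id_B",
    right_by="id_A",
)
print(merged[["id_B", "ts_B", "value", "index"]])
```

Because the counter column contains NaN for the unmatched March rows, it comes back as float; the later groupby drops those NaN keys by default.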

Then group by the output columns (don't forget the index column) and aggregate with mean:

df = (df.groupby(['id_A', 'ts_A', 'course', 'weight', 'index'], as_index=False)['value']
        .mean()
        .drop('index', axis=1))
print(df)
   id_A                 ts_A       course  weight        value
0   id1  2017-04-27 01:35:30       cotton     3.5   263.333333
1   id1  2017-04-27 01:36:05       cotton     3.5   600.000000
2   id1  2017-04-27 01:36:55       cotton     3.5   950.000000
3   id2  2017-04-27 02:35:35  cotton blue     5.0  1500.000000
4   id2  2017-04-27 02:36:00  cotton blue     5.0  4625.000000
5   id2  2017-04-27 02:36:35  cotton blue     5.0  5565.500000
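The output has six rows, not eight: the df1 rows at 01:37:20 and 02:37:20 have no df2 values after them, so they never appear in the asof result and are lost in the groupby. The sketch below runs the answer's pipeline end to end on inline data (again assuming `df2_sorted` is df2 sorted by `ts_B`), then shows a variation that is not part of the answer: group by the counter alone and join the means back onto df1 by row position, so rows without matches are kept with NaN. The name `avgValue` follows the question's desired output.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id_A": ["id1"] * 4 + ["id2"] * 4,
    "ts_A": pd.to_datetime([
        "2017-04-27 01:35:30", "2017-04-27 01:36:05", "2017-04-27 01:36:55",
        "2017-04-27 01:37:20", "2017-04-27 02:35:35", "2017-04-27 02:36:00",
        "2017-04-27 02:36:35", "2017-04-27 02:37:20",
    ]),
    "course": ["cotton"] * 4 + ["cotton blue"] * 4,
    "weight": [3.5] * 4 + [5.0] * 4,
})
df2 = pd.DataFrame({
    "id_B": ["id1"] * 13 + ["id2"] * 10,
    "ts_B": pd.to_datetime([
        "2017-03-27 01:25:40", "2017-03-27 01:25:50", "2017-03-27 01:25:50",
        "2017-04-27 01:35:40", "2017-04-27 01:35:50", "2017-04-27 01:36:00",
        "2017-04-27 01:36:10", "2017-04-27 01:36:20", "2017-04-27 01:36:30",
        "2017-04-27 01:36:40", "2017-04-27 01:36:50", "2017-04-27 01:37:00",
        "2017-04-27 01:37:10",
        "2017-04-27 02:35:40", "2017-04-27 02:35:50", "2017-04-27 02:36:00",
        "2017-04-27 02:36:10", "2017-04-27 02:36:20", "2017-04-27 02:36:30",
        "2017-04-27 02:36:40", "2017-04-27 02:36:50", "2017-04-27 02:37:00",
        "2017-04-27 02:37:10",
    ]),
    "value": [100, 200, 230, 240, 200, 350, 400, 500, 600, 700, 800, 900, 1000,
              1000, 2000, 4500, 3000, 6000, 5000, 5022, 5040, 3200, 9000],
})

# Steps from the answer: row counter, asof join, mean per df1 row.
df1 = df1.reset_index(drop=True)
df2_sorted = df2.sort_values("ts_B")
merged = pd.merge_asof(df2_sorted, df1.reset_index(), left_on="ts_B", right_on="ts_A",
                       left_by="id_B", right_by="id_A")
answer = (merged.groupby(["id_A", "ts_A", "course", "weight", "index"], as_index=False)["value"]
                .mean()
                .drop("index", axis=1))
print(answer)

# Variation: the counter alone identifies a df1 row, so group by it and join
# the means back by position; unmatched df1 rows keep NaN instead of vanishing.
means = merged.groupby("index")["value"].mean()
means.index = means.index.astype("int64")  # counter is float because of NaN
result = df1.join(means.rename("avgValue"))
print(result)
```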

A similar question, python - Merging two Pandas DataFrames with complex conditions, can be found on Stack Overflow: https://stackoverflow.com/questions/45236581/
