
Pandas DataFrames: remove duplicate index entries, keeping the maximum (or minimum) value depending on a column's value

Reposted. Author: 行者123. Updated: 2023-12-05 04:39:33

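This is my current df. I want to transform the DataFrame in 3 steps. I need to drop duplicate timestamps, but I want to keep either the maximum or the minimum value depending on the "Side" column. Please help :)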

I have already tried df = df[~df.index.duplicated(keep='first')], but it has no option to keep the maximum or minimum value.
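To see why the keep='first' attempt cannot work here, the sketch below rebuilds the sample data from the question (values typed in by hand) and applies it. Index.duplicated marks rows by position only, so the first row at each timestamp survives whatever its Side or Price: the Side-3 rows at 03.333 disappear entirely, and at 03.421 the first price (50123) is kept instead of the lowest (50110).

```python
import pandas as pd

# Sample data typed in from the question (the index has duplicate timestamps)
idx = pd.to_datetime([
    "2021-12-13 00:00:03.285", "2021-12-13 00:00:03.315",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.421",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.671",
])
df = pd.DataFrame(
    {
        "Price": [51700.0, 51675.0, 50123.0, 50200.0, 50225.0,
                  50250.0, 50123.0, 50117.0, 50110.0, 50100.0],
        "Side": [4, 3, 4, 3, 3, 3, 4, 4, 4, 3],
    },
    index=idx,
)

# duplicated(keep='first') ignores Price and Side: it keeps the first row
# seen at each timestamp, so one row per timestamp survives.
first_only = df[~df.index.duplicated(keep="first")]
print(first_only)
```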

The index is of datetime type, Price is a float, Side is an integer, and the DataFrame has 8000+ rows.

                         Price  Side
2021-12-13 00:00:03.285  51700     4
2021-12-13 00:00:03.315  51675     3
2021-12-13 00:00:03.333  50123     4
2021-12-13 00:00:03.333  50200     3
2021-12-13 00:00:03.333  50225     3
2021-12-13 00:00:03.333  50250     3
2021-12-13 00:00:03.421  50123     4
2021-12-13 00:00:03.421  50117     4
2021-12-13 00:00:03.421  50110     4
2021-12-13 00:00:03.671  50100     3
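  1. If a timestamp is duplicated, keep the highest Price when Side is 3 and the lowest Price when Side is 4 (duplicates are resolved separately for each Side, so one timestamp can keep both a Side-3 row and a Side-4 row).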
Desired output:
                         Price  Side
2021-12-13 00:00:03.285  51700     4
2021-12-13 00:00:03.315  51675     3
2021-12-13 00:00:03.333  50123     4
2021-12-13 00:00:03.333  50250     3
2021-12-13 00:00:03.421  50110     4
2021-12-13 00:00:03.671  50100     3
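Step 1 can also be done without groupby.apply. A minimal sketch, assuming the sample data below is typed in from the question: GroupBy.transform gives every row the max (Side 3) or min (Side 4) Price of its (timestamp, Side) group, the rows that reach that target are kept, and ties are broken by keeping the first. Filtering with a mask keeps the original row order, so at 03.333 the Side-4 row stays ahead of the Side-3 row, as in the desired output.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question (the index has duplicate timestamps)
idx = pd.to_datetime([
    "2021-12-13 00:00:03.285", "2021-12-13 00:00:03.315",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.421",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.671",
])
df = pd.DataFrame(
    {
        "Price": [51700.0, 51675.0, 50123.0, 50200.0, 50225.0,
                  50250.0, 50123.0, 50117.0, 50110.0, 50100.0],
        "Side": [4, 3, 4, 3, 3, 3, 4, 4, 4, 3],
    },
    index=idx,
)

# Group by (timestamp, Side) and broadcast each group's max/min back to its rows
grouped = df.groupby([pd.Grouper(level=0), "Side"])["Price"]
# Target per row: group max for Side 3; any other Side takes the group min
target = np.where(df["Side"] == 3, grouped.transform("max"), grouped.transform("min"))

# Keep rows whose Price equals their target (mask keeps original order)
step1 = df[df["Price"].to_numpy() == target]
# If several rows tie on the target, keep only the first per (timestamp, Side)
step1 = step1[~step1.set_index("Side", append=True).index.duplicated()]
print(step1)
```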
  2. Create new columns "3" and "4" holding the corresponding prices (0 on rows that belong to the other Side).
Desired output:
                         Price      3      4
2021-12-13 00:00:03.285  51700      0  51700
2021-12-13 00:00:03.315  51675  51675      0
2021-12-13 00:00:03.333  50123      0  50123
2021-12-13 00:00:03.333  50250  50250      0
2021-12-13 00:00:03.421  50110      0  50110
2021-12-13 00:00:03.671  50100  50100      0
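For step 2, a sketch that repeats the step-1 filter so it runs on its own, then spreads Price into one column per Side with numpy.where, using 0 on the other side's rows. The integer labels 3 and 4 mirror the Side values; that naming is an assumption read off the desired output. DataFrame.pivot is not used here because, on this index, it would merge the two rows that share the 03.333 timestamp into one.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question
idx = pd.to_datetime([
    "2021-12-13 00:00:03.285", "2021-12-13 00:00:03.315",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.421",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.671",
])
df = pd.DataFrame(
    {
        "Price": [51700.0, 51675.0, 50123.0, 50200.0, 50225.0,
                  50250.0, 50123.0, 50117.0, 50110.0, 50100.0],
        "Side": [4, 3, 4, 3, 3, 3, 4, 4, 4, 3],
    },
    index=idx,
)

# Step 1: max Price for Side 3, min Price otherwise, per (timestamp, Side)
grouped = df.groupby([pd.Grouper(level=0), "Side"])["Price"]
target = np.where(df["Side"] == 3, grouped.transform("max"), grouped.transform("min"))
step1 = df[df["Price"].to_numpy() == target]
step1 = step1[~step1.set_index("Side", append=True).index.duplicated()]

# Step 2: one column per Side value, holding the Price on that side's rows, else 0
step2 = step1[["Price"]].copy()
for side in (3, 4):
    step2[side] = np.where(step1["Side"] == side, step1["Price"], 0.0)
print(step2)
```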
  3. Fill the blanks (the zeros) with the previous value in the same column.
Desired output:
                         Price      3      4
2021-12-13 00:00:03.285  51700      0  51700
2021-12-13 00:00:03.315  51675  51675  51700
2021-12-13 00:00:03.333  50123  51675  50123
2021-12-13 00:00:03.333  50250  50250  50123
2021-12-13 00:00:03.421  50110  50250  50110
2021-12-13 00:00:03.671  50100  50100  50110
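Step 3 is a forward fill. In this sketch (again rebuilding step 1 so it is self-contained), each side's column starts as NaN rather than 0 on the other side's rows, is filled forward with ffill, and only the rows before a side's first price are set to 0 with fillna(0). Starting from NaN instead of replacing the step-2 zeros avoids mistaking a genuine 0 price for a blank.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question
idx = pd.to_datetime([
    "2021-12-13 00:00:03.285", "2021-12-13 00:00:03.315",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.421",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.671",
])
df = pd.DataFrame(
    {
        "Price": [51700.0, 51675.0, 50123.0, 50200.0, 50225.0,
                  50250.0, 50123.0, 50117.0, 50110.0, 50100.0],
        "Side": [4, 3, 4, 3, 3, 3, 4, 4, 4, 3],
    },
    index=idx,
)

# Step 1: max Price for Side 3, min Price otherwise, per (timestamp, Side)
grouped = df.groupby([pd.Grouper(level=0), "Side"])["Price"]
target = np.where(df["Side"] == 3, grouped.transform("max"), grouped.transform("min"))
step1 = df[df["Price"].to_numpy() == target]
step1 = step1[~step1.set_index("Side", append=True).index.duplicated()]

# Steps 2 and 3: per-side columns, forward-filled
step3 = step1[["Price"]].copy()
for side in (3, 4):
    # This side's prices; NaN on the other side's rows
    own = step1["Price"].where(step1["Side"].to_numpy() == side)
    # Carry the last seen price forward; rows before the first one become 0
    step3[side] = own.ffill().fillna(0).to_numpy()
print(step3)
```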

Best answer

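import pandas as pd

new_df = (df
    # one value per (timestamp, Side): the max Price for Side 3, the min Price otherwise
    .groupby([pd.Grouper(level=0), 'Side'])
    .apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min())
    .reset_index()
)
new_df = (
    pd.concat([
        new_df,
        # one column per Side value, forward-filled; 0 before a side's first price
        (new_df
            .pivot(columns='Side', values=0)
            .ffill()
            .fillna(0)
        )
    ], axis=1)
    .drop('Side', axis=1)
    .rename({0: 'Price'}, axis=1)
)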

Output:

>>> new_df
                     index  Price        3        4
0  2021-12-13 00:00:03.285  51700      0.0  51700.0
1  2021-12-13 00:00:03.315  51675  51675.0  51700.0
2  2021-12-13 00:00:03.333  50250  50250.0  51700.0
3  2021-12-13 00:00:03.333  50123  50250.0  50123.0
4  2021-12-13 00:00:03.421  50110  50250.0  50110.0
5  2021-12-13 00:00:03.671  50100  50100.0  50110.0

Compact version:

new_df = df.groupby([pd.Grouper(level=0), 'Side']).apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min()).reset_index()
new_df = pd.concat([new_df, new_df.pivot(columns='Side', values=0).ffill().fillna(0)], axis=1).drop('Side', axis=1).rename({0: 'Price'}, axis=1)
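The answer's first statement relies on DataFrameGroupBy.apply passing the grouping column Side into the lambda, which pandas 2.2 deprecates. A hedged variant, assuming that each group handed to SeriesGroupBy.apply carries its key tuple (timestamp, Side) in .name: select Price before apply and read Side from that key. The rest is the answer's pipeline; because the applied Series keeps the name Price, the rename step is no longer needed.

```python
import pandas as pd

# Sample data typed in from the question
idx = pd.to_datetime([
    "2021-12-13 00:00:03.285", "2021-12-13 00:00:03.315",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.333", "2021-12-13 00:00:03.333",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.421",
    "2021-12-13 00:00:03.421", "2021-12-13 00:00:03.671",
])
df = pd.DataFrame(
    {
        "Price": [51700.0, 51675.0, 50123.0, 50200.0, 50225.0,
                  50250.0, 50123.0, 50117.0, 50110.0, 50100.0],
        "Side": [4, 3, 4, 3, 3, 3, 4, 4, 4, 3],
    },
    index=idx,
)

# Selecting Price first means Side never enters the lambda;
# s.name is the group key (timestamp, Side), so s.name[1] is the Side value.
new_df = (
    df.groupby([pd.Grouper(level=0), "Side"])["Price"]
      .apply(lambda s: s.max() if s.name[1] == 3 else s.min())
      .reset_index()
)

# One column per Side value, forward-filled; 0 before a side's first price
wide = new_df.pivot(columns="Side", values="Price").ffill().fillna(0)
new_df = pd.concat([new_df, wide], axis=1).drop(columns="Side")
print(new_df)
```

Because groupby sorts its keys, the Side-3 row at 03.333 comes before the Side-4 row here, which is also why the answer's output order differs from the desired output in the question.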

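A similar question about dropping duplicate index entries in Pandas DataFrames while keeping the maximum value depending on a column's value can be found on Stack Overflow: https://stackoverflow.com/questions/70429614/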
