python - Pandas rolling second-highest value based on another column

Reposted. Author: 行者123. Updated: 2023-12-04 11:56:35

For the following sample data:

import pandas as pd

data={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
      'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
      'Price':['10','12','8','10','12','10','13','16','14','12','10']}
data=pd.DataFrame(data)
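For each person (group), I want the price that corresponds to the second-highest sales, computed on a rolling basis; the window size differs per group. The result should look like this: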
result={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
'Price':['10','12','8','10','12','10','13','16','14','12','10'],
'Second_Highest_Price':['','10','12','12','12','','10','10','10','12','10']}
I tried using nlargest(2), but I am not sure how to make it work on a rolling basis.

Best answer

This is not the most elegant solution, but I would do the following:
1- Load the dataset:

import numpy as np
import pandas as pd

data={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
'Price':['10','12','8','10','12','10','13','16','14','12','10']}

data=pd.DataFrame(data)

data['Sales'] = data['Sales'].astype(float)
2- Use groupby together with expanding:
data['2nd_sales'] = data.groupby('Person')['Sales'].expanding(min_periods=2) \
.apply(lambda x: x.nlargest(2).values[-1]).values
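
The expanding call above writes its result back with .values, which relies on the rows already being stored contiguously in group order (true for the sample data). A minimal sketch of an index-aligned variant, assuming a small made-up frame (the name df and its values are illustrative and not from the question):

```python
import pandas as pd

# Hypothetical frame whose rows are deliberately NOT sorted by group.
df = pd.DataFrame({
    "Person": ["b", "a", "b", "a", "b"],
    "Sales": [100.0, 50.0, 600.0, 60.0, 80.0],
})

# Second-largest Sales seen so far within each person's rows.
second = (
    df.groupby("Person")["Sales"]
      .expanding(min_periods=2)
      .apply(lambda s: s.nlargest(2).iloc[-1])
)

# The result is indexed by (Person, original row label). Dropping the
# group level lets pandas align it back on the original row labels
# instead of relying on positional order.
df["2nd_sales"] = second.droplevel(0)
print(df)
```

With data that is already sorted by Person, this should give the same column as the .values version.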
3- Compute Second_Highest_Price:
data['Second_Highest_Price'] = np.where((data['Sales'].shift() == data['2nd_sales']), data['Price'].shift(),
(np.where((data['Sales'] == data['2nd_sales']), data['Price'], np.nan)))

data['Second_Highest_Price'] = data.groupby('Person')['Second_Highest_Price'].ffill()
Output:
data['Second_Highest_Price'].values

array([nan, '10', '12', '12', '12', nan, '10', '10', '10', '12', '10'],
dtype=object)
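
One caveat: step 3 only compares the second-highest sales value against the current and the previous row, so it appears to depend on where that value sits in the group (for example, sales of 50, 30, 90 would leave the 50 two rows back). A hedged alternative sketch that looks the price up directly in each expanding window is shown below. The helper name second_highest_price is hypothetical, ties are resolved in favor of the earlier row, and the loop is quadratic per group, so it is only meant for modestly sized groups:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ['a','a','a','a','a','b','b','b','b','b','b'],
    "Sales": ['50','60','90','30','33','100','600','80','90','400','550'],
    "Price": ['10','12','8','10','12','10','13','16','14','12','10'],
})
df[["Sales", "Price"]] = df[["Sales", "Price"]].astype(float)

def second_highest_price(group: pd.DataFrame) -> pd.Series:
    """Price of the row with the second-highest Sales seen so far (hypothetical helper)."""
    sales = group["Sales"].to_numpy()
    prices = group["Price"].to_numpy()
    out = np.full(len(group), np.nan)  # first row of each group stays NaN
    for i in range(1, len(group)):
        # Positions of the rows seen so far, ordered by ascending Sales.
        order = np.argsort(sales[: i + 1], kind="stable")
        out[i] = prices[order[-2]]     # second from the top
    return pd.Series(out, index=group.index)

# Build one Series per group and let pandas align on the original index.
parts = [second_highest_price(g) for _, g in df.groupby("Person")]
df["Second_Highest_Price"] = pd.concat(parts)
print(df["Second_Highest_Price"].tolist())
```

On the sample data this reproduces the expected column from the question, with the prices as floats rather than strings.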

Regarding "python - Pandas rolling second-highest value based on another column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68293713/
