
python - pandas - Counting consecutive values above/below the current row

Reposted. Author: 行者123. Updated: 2023-11-28 22:24:31

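I'm looking for a way to take a pandas Series and return a new Series that gives, for each row, the number of immediately preceding consecutive values that the row is higher than (or lower than):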

a = pd.Series([30, 10, 20, 25, 35, 15])

...should output:

Value   Higher than streak   Lower than streak
30      0                    0
10      0                    1
20      1                    0
25      2                    0
35      4                    0
15      0                    3

This would let someone determine how significant each "regional max/min" value is in a time series.

Thanks in advance.

Best answer

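Since you're looking back at previous values to see whether they form a consecutive run, you're going to have to interact with the index somehow. This solution first compares every value before the current index with the current value to see whether it is less than or greater than it, and then treats any value as False if a False comes after it, so only the unbroken run directly before the current row is counted. It also avoids creating an iterator over the DataFrame, which may speed up the operation on larger datasets.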
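To make the masking step concrete, here is a small sketch for a single row, using the last value (15) of the example Series. The variable names `mask`, `false_positions`, `start` and `streak` are only illustrative and do not appear in the answer's code.

```python
import pandas as pd

a = pd.Series([30, 10, 20, 25, 35, 15])
i = 5  # look at the last value, 15

# Compare every earlier value with the current one.
mask = a.iloc[:i] > a.iloc[i]
print(mask.tolist())  # [True, False, True, True, True]

# Only the unbroken run of True values directly before position i counts,
# so start summing at the last False (which itself contributes 0).
false_positions = [pos for pos, ok in enumerate(mask) if not ok]
start = false_positions[-1] if false_positions else 0
streak = int(mask.iloc[start:].sum())
print(streak)  # 3
```

The first True (for 30) is discarded because the False for 10 separates it from the current row, which is why 15 is lower than a streak of three values rather than four.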

import pandas as pd
from operator import gt, lt

a = pd.Series([30, 10, 20, 25, 35, 15])

def consecutive_run(op, ser, i):
    """
    Count the uninterrupted run of values immediately before index i
    for which op(previous_value, current_value) is true.
    Assumes the default RangeIndex, so labels and positions coincide.
    """
    thresh_all = op(ser.iloc[:i], ser.iloc[i])
    # find the earlier positions where the comparison did not pass
    non_passing = thresh_all[~thresh_all]
    start_idx = 0
    if not non_passing.empty:
        # if there was a failure, there was a break in the consecutive True values,
        # so start from the last False position. That False contributes 0 to the sum;
        # it is either the final earlier element (the run is zero) or it is
        # followed only by True values
        start_idx = non_passing.index[-1]
    # count the consecutive run by summing the booleans from the start index onwards
    return thresh_all.iloc[start_idx:].sum()


idx = a.index.to_series()
res = pd.concat([a,
                 # "higher than" streak: earlier values are less than the current one
                 idx.map(lambda i: consecutive_run(lt, a, i)),
                 # "lower than" streak: earlier values are greater than the current one
                 idx.map(lambda i: consecutive_run(gt, a, i))],
                axis=1)
res.columns = ['Value', 'Higher than streak', 'Lower than streak']
print(res)

Result:

   Value  Higher than streak  Lower than streak
0     30                   0                  0
1     10                   0                  1
2     20                   1                  0
3     25                   2                  0
4     35                   4                  0
5     15                   0                  3
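The approach in the answer rescans all earlier rows for every index, so its cost grows roughly quadratically with the length of the Series. As a possible alternative that is not part of the original answer, the sketch below uses a monotonic stack (the technique usually taught as the "stock span" problem) to get the same counts in a single pass. The names `streaks` and `beats` are hypothetical, and like the answer it treats equal values as breaking a streak.

```python
from operator import gt, lt

import pandas as pd


def streaks(values, beats):
    """For each position, count the unbroken run of immediately preceding
    values that the current value beats, i.e. beats(current, previous)."""
    stack = []  # indices of earlier values that nothing after them has beaten
    out = []
    for i, v in enumerate(values):
        # Drop every earlier value that the current one beats.
        while stack and beats(v, values[stack[-1]]):
            stack.pop()
        # The run starts right after the nearest value that was not beaten.
        blocker = stack[-1] if stack else -1
        out.append(i - blocker - 1)
        stack.append(i)
    return out


a = pd.Series([30, 10, 20, 25, 35, 15])
vals = a.tolist()
res = pd.DataFrame({
    'Value': a,
    'Higher than streak': streaks(vals, gt),  # current > previous values
    'Lower than streak': streaks(vals, lt),   # current < previous values
}, index=a.index)
print(res)
```

For the example Series this produces 0, 0, 1, 2, 4, 0 for the higher-than streak and 0, 1, 0, 0, 0, 3 for the lower-than streak, which is the expected output given in the question. Each index is pushed and popped at most once, so the work per column is linear in the number of rows.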

This question, python - pandas - Counting consecutive values above/below the current row, comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/46303356/
