
Python/ Pandas : Finding a left and right max

Reposted. Author: 行者123. Updated: 2023-12-04 08:55:00

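I have a Pandas DataFrame whose first column holds a region and whose remaining columns hold 8 years of quarterly data. There are about 4,400 rows. Here is a sample: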

idx Q12000      Q22000      Q32000      Q42000      Q12001      Q22001      Q32001     Q42001      Q12002      Q22002      Q32002      Q42002

0 4085280.0 4114911.0 4108089.0 4111713.0 4055699.0 4076430.0 4043219.0 4039370.0 4201158.0 4243119.0 4231823.0 4254681.0
1 21226.0 21566.0 21804.0 22072.0 21924.0 23232.0 22748.0 22258.0 22614.0 22204.0 22500.0 22660.0
2 96400.0 102000.0 98604.0 97086.0 96354.0 103054.0 97824.0 95958.0 115938.0 123064.0 120406.0 120648.0
3 23820.0 24116.0 24186.0 23726.0 23504.0 23574.0 23162.0 23078.0 22306.0 22334.0 22152.0 22080.0
4 7838.0 7906.0 7714.0 7676.0 7480.0 7520.0 7102.0 6722.0 8324.0 8166.0 8208.0 8326.0
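
Here is a picture describing what I am trying to calculate:
[image: timeline]
  • nadir: the lowest point (the min)
  • nadir_qtr: the quarter in which the nadir occurs
  • pre-peak: the highest point before the nadir
  • pre-peak_qtr: the quarter in which the pre-peak occurs
  • post-peak: the highest point after the nadir
  • post-peak_qtr: the quarter in which the post-peak occurs
  • recovery: the first quarter after the nadir in which the number exceeds the pre-peak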

I can calculate the nadir easily:
    df['nadir'] = df.iloc[:,2:].min(axis=1)
    df['nadir_qtr'] = df.iloc[:,2:].idxmin(axis=1)

    idx Q12000 Q22000 Q32000 Q42000 Q12001 Q22001 Q32001 Q42001 Q12002 Q22002 Q32002 Q42002 nadir nadir_qtr

    0 4085280.0 4114911.0 4108089.0 4111713.0 4055699.0 4076430.0 4043219.0 4039370.0 4201158.0 4243119.0 4231823.0 4254681.0 4039370.0 Q42001
    1 21226.0 21566.0 21804.0 22072.0 21924.0 23232.0 22748.0 22258.0 22614.0 22204.0 22500.0 22660.0 21226.0 Q12000
    2 96400.0 102000.0 98604.0 97086.0 96354.0 103054.0 97824.0 95958.0 115938.0 123064.0 120406.0 120648.0 95958.0 Q42001
    3 23820.0 24116.0 24186.0 23726.0 23504.0 23574.0 23162.0 23078.0 22306.0 22334.0 22152.0 22080.0 22080.0 Q42002
    4 7838.0 7906.0 7714.0 7676.0 7480.0 7520.0 7102.0 6722.0 8324.0 8166.0 8208.0 8326.0 6722.0 Q42001

But when it comes to getting the pre-peak or post-peak value or quarter, I am stuck. The closest I have come is this:
    df['pre-peak'] = df.loc[:,:df['nadir_qtr']].max(axis=1)
    df['pre-peak_qtr'] = df.loc[:,:df['nadir_qtr']].idxmax(axis=1)
Expected output:
    idx Q12000      Q22000      Q32000      Q42000      Q12001      Q22001      Q32001     Q42001      Q12002      Q22002      Q32002      Q42002      nadir      nadir_qtr   pre-peak      pre-peak_qtr

    0 4085280.0 4114911.0 4108089.0 4111713.0 4055699.0 4076430.0 4043219.0 4039370.0 4201158.0 4243119.0 4231823.0 4254681.0 4039370.0 Q42001 4114911.0 Q22000
    1 21226.0 21566.0 21804.0 22072.0 21924.0 23232.0 22748.0 22258.0 22614.0 22204.0 22500.0 22660.0 21226.0 Q12000 NaN NaN
    2 96400.0 102000.0 98604.0 97086.0 96354.0 103054.0 97824.0 95958.0 115938.0 123064.0 120406.0 120648.0 95958.0 Q42001 103054.0 Q22001
    3 23820.0 24116.0 24186.0 23726.0 23504.0 23574.0 23162.0 23078.0 22306.0 22334.0 22152.0 22080.0 22080.0 Q42002 24186.0 Q32000
    4 7838.0 7906.0 7714.0 7676.0 7480.0 7520.0 7102.0 6722.0 8324.0 8166.0 8208.0 8326.0 6722.0 Q42001 7906.0 Q22000

But every variation gives me either wrong data or an error, most commonly:

    TypeError: reduction operation 'argmax' not allowed for this dtype


I have tried many strategies: forcing each row into a NumPy array, iterating, slicing each row. I am really stuck.

Best answer

Here is one approach that uses "helper" functions. First, create the data frame:

    # create the data frame
    from io import StringIO
    import pandas as pd

    data = ''' Q12000 Q22000 Q32000 Q42000 Q12001 Q22001 Q32001 Q42001 Q12002 Q22002 Q32002 Q42002

    0 4085280.0 4114911.0 4108089.0 4111713.0 4055699.0 4076430.0 4043219.0 4039370.0 4201158.0 4243119.0 4231823.0 4254681.0
    1 21226.0 21566.0 21804.0 22072.0 21924.0 23232.0 22748.0 22258.0 22614.0 22204.0 22500.0 22660.0
    2 96400.0 102000.0 98604.0 97086.0 96354.0 103054.0 97824.0 95958.0 115938.0 123064.0 120406.0 120648.0
    3 23820.0 24116.0 24186.0 23726.0 23504.0 23574.0 23162.0 23078.0 22306.0 22334.0 22152.0 22080.0
    4 7838.0 7906.0 7714.0 7676.0 7480.0 7520.0 7102.0 6722.0 8324.0 8166.0 8208.0 8326.0
    '''
    # raw string so that \s is not treated as a Python string escape
    df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')

Second, define the helper functions:
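    def calc_nadir(s):
        # s is one row of the data frame, passed in as a Series
        assert isinstance(s, pd.Series)
        return s.min()

    def calc_nadir_qtr(s):
        # integer position of the minimum (pandas >= 1.0), not the column label
        return s.argmin()

    def calc_pre_peak(s):
        # max of everything strictly before the nadir; NaN if the nadir comes first
        return s.iloc[:s.argmin()].max()

    def calc_pre_peak_quarter(s):
        try:
            qtr = s.iloc[:s.argmin()].argmax()
        except ValueError:
            # empty slice: the nadir is the first quarter, so there is no pre-peak
            qtr = None
        return qtr

    def calc_post_peak(s):
        # the slice starts at the nadir itself, so the nadir is included
        return s.iloc[s.argmin():].max()

    def calc_post_peak_qtr(s):
        # argmax is relative to the slice; add the offset to get the row position
        return s.iloc[s.argmin():].argmax() + s.argmin()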
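
The helpers above report quarters as integer positions, while the question asks for labels such as Q22000. A minimal sketch of label-returning variants follows, assuming pandas >= 1.0 (where `Series.argmin` returns a position and `idxmin`/`idxmax` return labels). The function names `nadir_label`, `pre_peak_label` and `post_peak_label` are hypothetical and not part of the original answer.

```python
import pandas as pd

def nadir_label(s: pd.Series):
    """Column label of the row minimum (first occurrence)."""
    return s.idxmin()

def pre_peak_label(s: pd.Series):
    """Label of the maximum strictly before the nadir, or None if the nadir comes first."""
    before = s.iloc[:s.argmin()]
    return before.idxmax() if not before.empty else None

def post_peak_label(s: pd.Series):
    """Label of the maximum from the nadir onward (nadir included, as in the helpers above)."""
    return s.iloc[s.argmin():].idxmax()

# Row 0 of the sample data from the question
row = pd.Series(
    [4085280.0, 4114911.0, 4108089.0, 4111713.0, 4055699.0, 4076430.0,
     4043219.0, 4039370.0, 4201158.0, 4243119.0, 4231823.0, 4254681.0],
    index=["Q12000", "Q22000", "Q32000", "Q42000", "Q12001", "Q22001",
           "Q32001", "Q42001", "Q12002", "Q22002", "Q32002", "Q42002"],
)
print(nadir_label(row), pre_peak_label(row), post_peak_label(row))
# prints: Q42001 Q22000 Q42002
```

These can be passed to `df.apply(..., axis=1)` in the same way as the positional helpers.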

Third, apply the helper functions and combine the results:
    nadir = df.apply(lambda x: calc_nadir(x), axis=1).rename('nadir')
    nadir_qtr = df.apply(lambda x: calc_nadir_qtr(x), axis=1).rename('nadir_qtr')

    pre_peak = df.apply(lambda x: calc_pre_peak(x), axis=1).rename('pre_peak')
    pre_peak_qtr = df.apply(lambda x: calc_pre_peak_quarter(x), axis=1).rename('pre_peak_qtr')

    post_peak = df.apply(lambda x: calc_post_peak(x), axis=1).rename('post_peak')
    post_peak_qtr = df.apply(lambda x: calc_post_peak_qtr(x), axis=1).rename('post_peak_qtr')

    results = pd.concat([nadir, nadir_qtr, pre_peak, pre_peak_qtr,
                         post_peak, post_peak_qtr], axis=1)
    print(results)

    nadir nadir_qtr pre_peak pre_peak_qtr post_peak post_peak_qtr
    0 4039370.0 7 4114911.0 1.0 4254681.0 11
    1 21226.0 0 NaN NaN 23232.0 5
    2 95958.0 7 103054.0 5.0 123064.0 9
    3 22080.0 11 24186.0 2.0 22080.0 11
    4 6722.0 7 7906.0 1.0 8326.0 11
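
The question also defines a recovery quarter, which the results above do not include. One possible way to compute it is sketched below. It assumes that "exceeds" means strictly greater than the pre-peak and that rows with no pre-peak, or rows that never recover, should yield None. The name `calc_recovery_qtr` is hypothetical and not from the original answer.

```python
import pandas as pd

def calc_recovery_qtr(s: pd.Series):
    """First quarter label after the nadir whose value exceeds the pre-peak.

    Returns None when the nadir is the first quarter (no pre-peak exists)
    or when the values never climb back above the pre-peak.
    """
    pos = s.argmin()                 # integer position of the nadir
    before = s.iloc[:pos]            # quarters strictly before the nadir
    if before.empty:
        return None
    after = s.iloc[pos + 1:]         # quarters strictly after the nadir
    recovered = after[after > before.max()]
    return recovered.index[0] if not recovered.empty else None

quarters = ["Q12000", "Q22000", "Q32000", "Q42000", "Q12001", "Q22001",
            "Q32001", "Q42001", "Q12002", "Q22002", "Q32002", "Q42002"]
# Rows 0, 1 and 3 of the sample data from the question
df = pd.DataFrame(
    [[4085280.0, 4114911.0, 4108089.0, 4111713.0, 4055699.0, 4076430.0,
      4043219.0, 4039370.0, 4201158.0, 4243119.0, 4231823.0, 4254681.0],
     [21226.0, 21566.0, 21804.0, 22072.0, 21924.0, 23232.0,
      22748.0, 22258.0, 22614.0, 22204.0, 22500.0, 22660.0],
     [23820.0, 24116.0, 24186.0, 23726.0, 23504.0, 23574.0,
      23162.0, 23078.0, 22306.0, 22334.0, 22152.0, 22080.0]],
    columns=quarters,
)
recovery_qtr = df.apply(calc_recovery_qtr, axis=1).rename("recovery_qtr")
print(recovery_qtr)
```

In this sample only the first row recovers, in Q12002. The second row has its nadir in the first quarter, so it has no pre-peak, and the third row has its nadir in the last quarter, so nothing follows it.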

Regarding Python/ Pandas : Finding a left and right max, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63876495/
