
python - Pandas: conditional aggregation

Reposted. Author: 行者123. Updated: 2023-12-03 22:58:50

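I am trying to build the following filter with a pandas DataFrame:

  • I have four columns: A, B, A_prime, and B_prime.
  • If A or B is less than a threshold C, then I want to find the sum of A_prime and B_prime and assign it to the maximum of A_prime and B_prime, while setting the minimum of A_prime and B_prime to zero.

How would I write this as a pandas aggregation function?
A working but inefficiently written example is shown below: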
    import pandas as pd
    import numpy as np

    data = {
        "A": list(np.abs(np.random.randn(10))),
        "B": list(np.abs(np.random.randn(10))),
        "A_prime": list(np.abs(np.random.randn(10))),
        "B_prime": list(np.abs(np.random.randn(10))),
    }

    df = pd.DataFrame.from_dict(data)
    C = 0.2

    print("BEFORE:")
    print(df)

    for index, row in df.iterrows():
        if row["A"] < C or row["B"] < C:
            # position of the larger of A and B (0 -> A, 1 -> B)
            max_idx = np.argmax([row["A"], row["B"]])
            total = row["A_prime"] + row["B_prime"]
            # write through df.at: the row yielded by iterrows may be a copy
            if max_idx == 0:
                df.at[index, "A_prime"] = total
                df.at[index, "B_prime"] = 0.0
            else:
                df.at[index, "B_prime"] = total
                df.at[index, "A_prime"] = 0.0

    print("")
    print("AFTER:")
    print(df)
Output:
    BEFORE:
              A         B   A_prime   B_prime
    0  0.182445  0.924890  1.563398  0.562325
    1  0.252587  0.273637  0.515395  0.538876
    2  1.369412  1.985702  1.813962  1.643794
    3  0.834666  0.143880  0.860673  0.372468
    4  1.380012  0.715774  0.022681  0.892717
    5  0.582497  0.477100  0.956821  1.134613
    6  0.083045  0.322060  0.362513  1.386124
    7  1.384267  0.251577  0.639843  0.458650
    8  0.375456  0.412320  0.661661  0.086588
    9  0.079226  0.385621  0.601451  0.837827

    AFTER:
              A         B   A_prime   B_prime
    0  0.182445  0.924890  0.000000  2.125723
    1  0.252587  0.273637  0.515395  0.538876
    2  1.369412  1.985702  1.813962  1.643794
    3  0.834666  0.143880  1.233141  0.000000
    4  1.380012  0.715774  0.022681  0.892717
    5  0.582497  0.477100  0.956821  1.134613
    6  0.083045  0.322060  0.000000  1.748638
    7  1.384267  0.251577  0.639843  0.458650
    8  0.375456  0.412320  0.661661  0.086588
    9  0.079226  0.385621  0.000000  1.439278

Best answer

Here is one way:

    prime_cols = ["A_prime", "B_prime"]

    # get the candidate sums
    prime_sums = df[prime_cols].sum(axis=1)

    # check which rows satisfy the `C` threshold
    threshold_satisfied = df.A.lt(C) | df.B.lt(C)

    # set the satisfying rows' values to sums for both columns
    df.loc[threshold_satisfied, prime_cols] = prime_sums

    # generate a 1-0 mask that will multiply the greater value by 1 and
    # smaller value by 0 to "select" one of them and kill other
    mask_A_side = df.A.gt(df.B)
    the_mask = pd.concat([mask_A_side, ~mask_A_side], axis=1).set_axis(prime_cols, axis=1)

    # multiply with the mask
    df.loc[threshold_satisfied, prime_cols] *= the_mask
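It first writes the sum of the two prime columns into both columns for the rows that satisfy the threshold condition, then zeroes one of them by multiplying with a 1-0 mask,
to get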
    >>> df

              A         B   A_prime   B_prime
    0  0.182445  0.924890  0.000000  2.125723
    1  0.252587  0.273637  0.515395  0.538876
    2  1.369412  1.985702  1.813962  1.643794
    3  0.834666  0.143880  1.233141  0.000000
    4  1.380012  0.715774  0.022681  0.892717
    5  0.582497  0.477100  0.956821  1.134613
    6  0.083045  0.322060  0.000000  1.748637
    7  1.384267  0.251577  0.639843  0.458650
    8  0.375456  0.412320  0.661661  0.086588
    9  0.079226  0.385621  0.000000  1.439278
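
The same "sum into the winning column, zero the other" idea can also be sketched with `np.where` instead of a mask multiplication. This is only an illustrative variant, not part of the answer above: the helper name `merge_primes` and the small hand-written frame are assumptions made for the example, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (hypothetical values, not the data from the question)
df = pd.DataFrame({
    "A":       [0.1, 0.5, 0.9, 0.3],
    "B":       [0.8, 0.6, 0.1, 0.4],
    "A_prime": [1.0, 2.0, 3.0, 4.0],
    "B_prime": [0.5, 0.5, 0.5, 0.5],
})
C = 0.2


def merge_primes(frame, threshold):
    """Hypothetical helper: return a copy in which, for rows where A or B is
    below the threshold, A_prime + B_prime goes to the side with the larger
    of A and B, and the other prime column is set to zero."""
    below = (frame["A"] < threshold) | (frame["B"] < threshold)
    a_wins = frame["A"] > frame["B"]
    total = frame["A_prime"] + frame["B_prime"]

    out = frame.copy()
    # Rows that do not meet the threshold condition keep their original values
    out["A_prime"] = np.where(below, np.where(a_wins, total, 0.0), frame["A_prime"])
    out["B_prime"] = np.where(below, np.where(a_wins, 0.0, total), frame["B_prime"])
    return out


result = merge_primes(df, C)
print(result)
```

Like the answer, this sketch compares `A` with `B` using a strict `>`, so a tie sends the sum to `B_prime`, whereas the loop in the question uses `np.argmax`, which returns the first position on a tie and therefore favors `A_prime`. If ties matter for your data, switch the comparison to `>=`.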

Regarding python - Pandas: conditional aggregation, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67869185/
