
python - How does Pandas count when string values converted to numbers are greater than N?

Reposted. Author: 行者123. Updated: 2023-12-02 18:10:56

My monthly dataframe (df) already holds min-max ranges, as shown below:

Wind   Jan       Feb       Nov       Dec       calib
West   0.1-25.5  2.8-65.3  1.3-61.3  0.9-35.3  50
North  0.2-28.3  3.1-66.4  1.0-67.7  1.9-40.1  60
South  0.3-29.5  2.5-49.4  1.9-63.4  0.3-33.0  60
East   20.5      1.1-41.1  0.9-40.3  nan       50

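I want to count how many times the monthly maximum wind speed is below the calib value. To do that, I tried to create a speed-below-calib (sbc) column as follows.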

month_col = ['Jan', 'Feb', 'Nov', 'Dec']
df['sbc'] = (pd.to_numeric(df[month_col].str.extract(r"(?<=-)(\d+\.\d+)")) < df["calib"]).sum(axis=1)

The code above does not work; it raises AttributeError: 'DataFrame' object has no attribute 'str'. How can I fix this?

Best answer

You can use melt:

sbc = (df.melt(['Wind', 'calib'], var_name='month')
         .assign(value=lambda x: x['value'].str.split('-').str[1].astype(float))
         .query('value < calib').value_counts('Wind'))
df['sbc'] = df['Wind'].map(sbc)

Output:

>>> df
    Wind       Jan       Feb       Nov       Dec  calib  sbc
0   West  0.1-25.5  2.8-65.3  1.3-61.3  0.9-35.3     50    2
1  North  0.2-28.3  3.1-66.4  1.0-67.7  1.9-40.1     60    2
2  South  0.3-29.5  2.5-49.4  1.9-63.4  0.3-33.0     60    3
3   East      20.5  1.1-41.1  0.9-40.3       NaN     50    2
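The AttributeError in the question arises because the .str accessor is defined on Series, not on DataFrame. As an alternative to melt, the original attempt can be kept in wide form by applying the extraction to each month column in turn. The following is a minimal sketch under the assumption that the sample data looks like the table in the question; the name max_speed is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real df).
df = pd.DataFrame({
    "Wind": ["West", "North", "South", "East"],
    "Jan": ["0.1-25.5", "0.2-28.3", "0.3-29.5", "20.5"],
    "Feb": ["2.8-65.3", "3.1-66.4", "2.5-49.4", "1.1-41.1"],
    "Nov": ["1.3-61.3", "1.0-67.7", "1.9-63.4", "0.9-40.3"],
    "Dec": ["0.9-35.3", "1.9-40.1", "0.3-33.0", np.nan],
    "calib": [50, 60, 60, 50],
})

month_col = ["Jan", "Feb", "Nov", "Dec"]

# .str exists only on Series, so run the extraction once per month column.
# expand=False makes str.extract return a Series for the single capture group.
max_speed = df[month_col].apply(
    lambda s: pd.to_numeric(s.str.extract(r"(?<=-)(\d+\.\d+)", expand=False))
)

# Compare each row against its own calib value (axis=0 aligns on the index).
# NaN compares as False, so missing or unparsed cells are not counted.
df["sbc"] = max_speed.lt(df["calib"], axis=0).sum(axis=1)
```

On the sample data this should give the same counts as the melt approach. Note that a cell without a hyphen, such as East's "20.5", does not match the regex and is therefore treated as missing.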

Step by step:

  1. Reshape the dataframe
>>> out = df.melt(['Wind', 'calib'], var_name='month')
Wind calib month value
0 West 50 Jan 0.1-25.5
1 North 60 Jan 0.2-28.3
2 South 60 Jan 0.3-29.5
3 East 50 Jan 20.5
4 West 50 Feb 2.8-65.3
5 North 60 Feb 3.1-66.4
6 South 60 Feb 2.5-49.4
7 East 50 Feb 1.1-41.1
8 West 50 Nov 1.3-61.3
9 North 60 Nov 1.0-67.7
10 South 60 Nov 1.9-63.4
11 East 50 Nov 0.9-40.3
12 West 50 Dec 0.9-35.3
13 North 60 Dec 1.9-40.1
14 South 60 Dec 0.3-33.0
15 East 50 Dec NaN
  2. Extract the maximum wind speed from each range
    >>> out = out.assign(value=lambda x: x['value'].str.split('-').str[1].astype(float))
    Wind calib month value
    0 West 50 Jan 25.5
    1 North 60 Jan 28.3
    2 South 60 Jan 29.5
    3 East 50 Jan NaN
    4 West 50 Feb 65.3
    5 North 60 Feb 66.4
    6 South 60 Feb 49.4
    7 East 50 Feb 41.1
    8 West 50 Nov 61.3
    9 North 60 Nov 67.7
    10 South 60 Nov 63.4
    11 East 50 Nov 40.3
    12 West 50 Dec 35.3
    13 North 60 Dec 40.1
    14 South 60 Dec 33.0
    15 East 50 Dec NaN
  3. Filter the rows and count
    >>> out = out.query('value < calib').value_counts('Wind')
    Wind
    South 3
    East 2
    North 2
    West 2
    dtype: int64

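Finally, map (merge) this Series back onto your original dataframe by wind direction, as in df['Wind'].map(...) in the last line of the answer's code.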
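The whole pipeline, including the final mapping step, can be sketched as a small function. This is a hedged sketch, not part of the original answer: count_below_calib and its single_value_is_max flag are hypothetical names introduced here. The flag covers one assumption the answer leaves open: a cell with a single number such as "20.5" becomes NaN under .str[1], whereas .str[-1] would treat that number as its own maximum. The fillna(0) guards against wind directions that have no qualifying month and would otherwise map to NaN.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real df).
df = pd.DataFrame({
    "Wind": ["West", "North", "South", "East"],
    "Jan": ["0.1-25.5", "0.2-28.3", "0.3-29.5", "20.5"],
    "Feb": ["2.8-65.3", "3.1-66.4", "2.5-49.4", "1.1-41.1"],
    "Nov": ["1.3-61.3", "1.0-67.7", "1.9-63.4", "0.9-40.3"],
    "Dec": ["0.9-35.3", "1.9-40.1", "0.3-33.0", np.nan],
    "calib": [50, 60, 60, 50],
})
month_col = ["Jan", "Feb", "Nov", "Dec"]


def count_below_calib(frame, months, single_value_is_max=False):
    """Hypothetical helper: count months whose max speed is below calib."""
    # Position 1 is the part after the hyphen; -1 is the last part, which
    # also picks up a lone value such as "20.5" (an assumption, see lead-in).
    pos = -1 if single_value_is_max else 1
    counts = (
        frame.melt(["Wind", "calib"], value_vars=months, var_name="month")
             .assign(value=lambda x: x["value"].str.split("-").str[pos].astype(float))
             .query("value < calib")
             .value_counts("Wind")
    )
    # Map the per-direction counts back onto the original rows; directions
    # absent from counts had no qualifying month, so they get 0.
    return frame["Wind"].map(counts).fillna(0).astype(int)


sbc_as_answered = count_below_calib(df, month_col)
sbc_single_as_max = count_below_calib(df, month_col, single_value_is_max=True)
df["sbc"] = sbc_as_answered
```

With the default setting the counts match the answer's output; with single_value_is_max=True, East's January value of 20.5 is also counted. Which behavior is right depends on what a lone value means in your data.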

Regarding "python - How does Pandas count when string values converted to numbers are greater than N?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/72368677/
