
python - How to stop pandas resample().mean() and resample().sum() from skipping NaNs?

Reposted. Author: 行者123. Updated: 2023-12-01 00:12:07

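I need to calculate annual values from monthly data. If any of the monthly values in a year is NaN, I want the value for the whole year to be NaN as well.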

Here is my code so far:

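import numpy as np
import pandas as pd

station_data = pd.read_csv(station_data_files[0], sep=';', header=0)
station_data = station_data.replace(-999, np.nan)  # -999 marks missing values
station_data = station_data.set_index("MESS_DATUM_BEGINN")  # column with the start date of each month
# Assumption: dates are stored as YYYYMMDD (e.g. 19810101); resample() needs a DatetimeIndex
station_data.index = pd.to_datetime(station_data.index.astype(str), format="%Y%m%d")

station_data_anual = pd.DataFrame()
station_data_anual["Y_TT"] = station_data["MO_TT"].resample("A").mean()
station_data_anual["Y_RR"] = station_data["MO_RR"].resample("A").sum()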

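The problem is that it ignores NaNs. As a result, the values in station_data_anual["Y_RR"], for example, are too low, and for years where every monthly value is NaN it returns 0.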
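A minimal reproduction of this behaviour, as a sketch on hypothetical synthetic data (not the station file): pandas reductions skip NaN by default, so a partially missing year is averaged or summed over its valid months only, and an all-NaN year sums to 0. The example uses the `"YS"` alias (bins labelled at the start of each year) instead of `"A"`, because `"A"` is deprecated in recent pandas versions; the grouping into calendar years is the same.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: 2003 is complete, 2004 has only January, 2005 is all NaN
index = pd.date_range("2003-01-01", periods=36, freq="MS")
values = np.r_[np.full(12, 10.0), 66.3, np.full(23, np.nan)]
rr = pd.Series(values, index=index, name="MO_RR")

# Default behaviour: NaNs are skipped, and the sum of an all-NaN group is 0
yearly_sum = rr.resample("YS").sum()
yearly_mean = rr.resample("YS").mean()
print(yearly_sum)   # 2004 -> 66.3 (only January counted), 2005 -> 0.0
print(yearly_mean)  # 2004 -> 66.3, 2005 -> NaN
```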

Note: there are some questions similar to mine, but they did not help me.

Some clarification:

Input data:

station_data
Out[235]:
STATIONS_ID MESS_DATUM_ENDE QN_4 ... MO_RR MX_RS eor
MESS_DATUM_BEGINN ...
1981-01-01 403.0 1981-01-31 10.0 ... 51.5 10.0 eor
1981-02-01 403.0 1981-02-28 10.0 ... 23.8 5.4 eor
1981-03-01 403.0 1981-03-31 10.0 ... 116.5 28.0 eor
1981-04-01 403.0 1981-04-30 10.0 ... 24.1 9.5 eor
1981-05-01 403.0 1981-05-31 10.0 ... 29.4 8.4 eor
... ... ... ... ... ... ...
2010-08-01 403.0 2010-08-31 10.0 ... NaN 29.1 eor
2010-09-01 403.0 2010-09-30 10.0 ... NaN 29.8 eor
2010-10-01 403.0 2010-10-31 10.0 ... NaN 5.5 eor
2010-11-01 403.0 2010-11-30 10.0 ... NaN 17.5 eor
2010-12-01 403.0 2010-12-31 10.0 ... NaN 8.2 eor

[360 rows x 16 columns]

A closer look:

station_data["MO_RR"][276:288]
Out[242]:
MESS_DATUM_BEGINN
2004-01-01 66.3
2004-02-01 NaN
2004-03-01 NaN
2004-04-01 NaN
2004-05-01 NaN
2004-06-01 NaN
2004-07-01 NaN
2004-08-01 NaN
2004-09-01 NaN
2004-10-01 NaN
2004-11-01 NaN
2004-12-01 NaN
Name: MO_RR, dtype: float64

Output data:

station_data_anual
Out[238]:
Y_TT Y_RR
MESS_DATUM_BEGINN
...
2003-12-31 9.866667 430.5
2004-12-31 9.620833 66.3
2005-12-31 9.665833 0.0
2006-12-31 10.158333 0.0
2007-12-31 10.555000 0.0
2008-12-31 10.361667 0.0
2009-12-31 9.587500 0.0
2010-12-31 8.207500 0.0

My result should look like this:

Y_TT Y_RR
MESS_DATUM_BEGINN
...
2003-12-31 9.866667 430.5
2004-12-31 9.620833 nan # getting nan instead of 66.3 is especially important
2005-12-31 9.665833 nan
2006-12-31 10.158333 nan
2007-12-31 10.555000 nan
2008-12-31 10.361667 nan
2009-12-31 9.587500 nan
2010-12-31 8.207500 nan

Best answer

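I have never used resampling, and there may be a better solution that simply discards "groups" based on a "condition". But a very simple solution is to apply a custom mean function to each group after resampling: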

import numpy as np
import pandas as pd

def very_mean(array_like):
    # Return NaN for the whole group if any value in it is missing
    if pd.isnull(array_like).any():
        return np.nan
    return array_like.mean()

station_data_anual["Y_TT"] = station_data["MO_TT"].resample("A").apply(very_mean)
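The same rule can also be applied to the sum, and written without a helper function. A sketch, on hypothetical synthetic data: `Series.mean` and `Series.sum` accept `skipna=False`, which makes the result NaN as soon as one value in the group is NaN, the same rule `very_mean` implements. For sums, the resampler's `sum(min_count=...)` is an alternative that returns NaN when a year has fewer valid values than `min_count`; `min_count=12` assumes every year should have 12 monthly rows, so it also turns incomplete years (missing rows, not just NaNs) into NaN. The `"YS"` alias is used instead of the deprecated `"A"`.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: 2003 complete, 2004 has one valid MO_RR month, 2005 all NaN
index = pd.date_range("2003-01-01", periods=36, freq="MS")
station_data = pd.DataFrame(
    {
        "MO_TT": np.r_[np.full(24, 9.5), np.full(12, 10.0)],
        "MO_RR": np.r_[np.full(12, 10.0), 66.3, np.full(23, np.nan)],
    },
    index=index,
)

station_data_anual = pd.DataFrame()
# skipna=False: a single NaN month makes the whole year NaN
station_data_anual["Y_TT"] = station_data["MO_TT"].resample("YS").apply(lambda s: s.mean(skipna=False))
station_data_anual["Y_RR"] = station_data["MO_RR"].resample("YS").apply(lambda s: s.sum(skipna=False))

# Alternative for sums: require at least 12 valid (non-NaN) months per year
y_rr_min_count = station_data["MO_RR"].resample("YS").sum(min_count=12)

print(station_data_anual)  # Y_RR: 2003 -> 120.0, 2004 -> NaN, 2005 -> NaN
```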

Regarding python - How to stop pandas resample().mean() and resample().sum() from skipping NaNs?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59572325/
