
python - Pandas rolling that ignores rows containing NaN in the count

Reposted. Author: 行者123. Updated: 2023-12-01 00:46:06

Sample data

                                   id  val        date
id           date
SE0000191827 2018-02-28  SE0000191827    8  2018-02-16
             2018-03-31           NaN  NaN         NaT
             2018-04-30  SE0000191827    7  2018-04-20
             2018-05-31           NaN  NaN         NaT
             2018-06-30           NaN  NaN         NaT
             2018-07-31  SE0000191827    6  2018-07-11
             2018-08-31           NaN  NaN         NaT
             2018-09-30           NaN  NaN         NaT
             2018-10-31  SE0000191827    5  2018-10-19
             2018-11-30           NaN  NaN         NaT
             2018-12-31  SE0000191827    9  2018-12-29
SE0000195570 2014-01-31  SE0000195570    4  2014-01-31
             2014-02-28           NaN  NaN         NaT
             2014-03-31           NaN  NaN         NaT
             2014-04-30  SE0000195570    3  2014-04-29
             2014-05-31           NaN  NaN         NaT
             2014-06-30           NaN  NaN         NaT
             2014-07-31  SE0000195570    2  2014-07-16
             2014-08-31           NaN  NaN         NaT
             2014-09-30           NaN  NaN         NaT
             2014-10-31  SE0000195570    1  2014-10-23

(For convenience, this pastebin can be used to create the data: https://pastebin.com/wMU3esEh)

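I want to apply a rolling function with a window of 4 to the val column, but count only the rows where val is not NaN. I cannot simply use dropna, because the rows with NaN also need to receive a value in the new column. The output I expect is shown below.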

Expected output

                                   id  val        date  calc
id           date
SE0000191827 2018-02-28  SE0000191827    8  2018-02-16  26.0
             2018-03-31           NaN  NaN         NaT  27.0
             2018-04-30  SE0000191827    7  2018-04-20  27.0
             2018-05-31           NaN  NaN         NaT   NaN
             2018-06-30           NaN  NaN         NaT   NaN
             2018-07-31  SE0000191827    6  2018-07-11   NaN
             2018-08-31           NaN  NaN         NaT   NaN
             2018-09-30           NaN  NaN         NaT   NaN
             2018-10-31  SE0000191827    5  2018-10-19   NaN
             2018-11-30           NaN  NaN         NaT   NaN
             2018-12-31  SE0000191827    9  2018-12-29   NaN
SE0000195570 2014-01-31  SE0000195570    4  2014-01-31  10.0
             2014-02-28           NaN  NaN         NaT   NaN
             2014-03-31           NaN  NaN         NaT   NaN
             2014-04-30  SE0000195570    3  2014-04-29   NaN
             2014-05-31           NaN  NaN         NaT   NaN
             2014-06-30           NaN  NaN         NaT   NaN
             2014-07-31  SE0000195570    2  2014-07-16   NaN
             2014-08-31           NaN  NaN         NaT   NaN
             2014-09-30           NaN  NaN         NaT   NaN
             2014-10-31  SE0000195570    1  2014-10-23   NaN

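Note that the row (SE0000191827, 2018-03-31) should also get the value 27.0. The reason is that there are four val values below that row (7, 6, 5 and 9), so I want them to be counted.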


One attempt is the following:

(Pdb) df2.assign(calc=(df2.dropna()['val'].groupby(level=0).rolling(4).sum().shift(-3).reset_index(0, drop=True)))
                                   id  val        date  calc
id           date
SE0000191827 2018-02-28  SE0000191827    8  2018-02-16  26.0
             2018-03-31           NaN  NaN         NaT   NaN
             2018-04-30  SE0000191827    7  2018-04-20  27.0
             2018-05-31           NaN  NaN         NaT   NaN
             2018-06-30           NaN  NaN         NaT   NaN
             2018-07-31  SE0000191827    6  2018-07-11   NaN
             2018-08-31           NaN  NaN         NaT   NaN
             2018-09-30           NaN  NaN         NaT   NaN
             2018-10-31  SE0000191827    5  2018-10-19   NaN
             2018-11-30           NaN  NaN         NaT   NaN
             2018-12-31  SE0000191827    9  2018-12-29   NaN
SE0000195570 2014-01-31  SE0000195570    4  2014-01-31  10.0
             2014-02-28           NaN  NaN         NaT   NaN
             2014-03-31           NaN  NaN         NaT   NaN
             2014-04-30  SE0000195570    3  2014-04-29   NaN
             2014-05-31           NaN  NaN         NaT   NaN
             2014-06-30           NaN  NaN         NaT   NaN
             2014-07-31  SE0000195570    2  2014-07-16   NaN
             2014-08-31           NaN  NaN         NaT   NaN
             2014-09-30           NaN  NaN         NaT   NaN
             2014-10-31  SE0000195570    1  2014-10-23   NaN

However, this produces no value for the row (SE0000191827, 2018-03-31), because that row is removed by dropna.


As far as I know, there is no way to make rolling skip the rows that contain NaN. Any help?
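A small sketch of why plain rolling does not do this, using the first six val entries of the first id as a standalone Series (the names `s`, `strict` and `loose` are only illustrative). The window is measured in rows, and `min_periods` only controls how many non-NaN values a window needs before a result is produced.

```python
import numpy as np
import pandas as pd

# First six val entries of SE0000191827, NaN rows included.
s = pd.Series([8, np.nan, 7, np.nan, np.nan, 6])

# min_periods defaults to the window size, and no 4-row window here
# holds four valid observations, so every result is NaN.
strict = s.rolling(4).sum()

# min_periods=1 tolerates NaN, but each window still spans 4 rows,
# not 4 valid observations.
loose = s.rolling(4, min_periods=1).sum()

print(strict.tolist())
print(loose.tolist())  # [8.0, 8.0, 15.0, 15.0, 7.0, 13.0]
```

Neither variant sums four valid observations per window, which is why the attempt above drops the NaN rows first.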

Best answer

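I suggest keeping your groupby (after dropping the nulls first), then using df.reindex(index=<#put original index here>) to push the original time steps back into the index, and filling the missing values (df.fillna) from what has already been computed. The dates that have no value in calc can be imputed with focb (first observation carried backward). In pandas lingo this is bfill, not ffill.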

(Basically, add .reindex(df2.index).groupby(level=0).bfill() to the end of the expression passed to assign.)
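A minimal sketch of that suggestion, under some assumptions: `df2` is rebuilt here with only the val column (the id and date columns of the original frame are left out), the month-end dates are generated rather than loaded from the pastebin, and the names `dates_a`, `dates_b`, `calc` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Simplified stand-in for the question's df2: val only, indexed by (id, date).
dates_a = pd.date_range("2018-02-01", periods=11, freq="MS") + pd.offsets.MonthEnd(0)
dates_b = pd.date_range("2014-01-01", periods=10, freq="MS") + pd.offsets.MonthEnd(0)
index = pd.MultiIndex.from_arrays(
    [["SE0000191827"] * 11 + ["SE0000195570"] * 10, dates_a.append(dates_b)],
    names=["id", "date"],
)
vals = [8, nan, 7, nan, nan, 6, nan, nan, 5, nan, 9,
        4, nan, nan, 3, nan, nan, 2, nan, nan, 1]
df2 = pd.DataFrame({"val": vals}, index=index)

calc = (
    df2["val"].dropna()                 # keep only the rows that have a val
    .groupby(level=0).rolling(4).sum()  # sum of 4 valid observations per id
    .shift(-3)                          # move each sum to the first row of its window
    .reset_index(0, drop=True)          # drop the group key prepended by groupby.rolling
    .reindex(df2.index)                 # restore the rows removed by dropna (as NaN)
    .groupby(level=0).bfill()           # fill gaps from the next computed value, per id
)
result = df2.assign(calc=calc)
print(result)
```

On this data the row (SE0000191827, 2018-03-31) picks up 27.0 from the row below it, while rows with no later computed value in the same id stay NaN. Because bfill runs per group, values are not carried from one id into another.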

Regarding python - Pandas rolling that ignores rows containing NaN in the count, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56967190/
