python - Groupby and fill NaN with the mean of the previous and next values in Pandas

Reposted. Author: 行者123. Updated: 2023-12-01 22:56:01

I am trying to fill NaN cells with the mean of the values before and after them.

   type     date        v1       v2
0 a 2018-09 21511.11 17696.8
1 a 2018-10 NaN NaN
2 a 2018-11 NaN NaN
3 a 2018-12 30319.98 24553.6
4 a 2019-01 NaN NaN
5 a 2019-02 NaN NaN
6 a 2019-03 7409.61 6110.0
7 a 2019-04 NaN NaN
8 a 2019-05 NaN NaN
9 a 2019-06 15212.51 12590.5
10 a 2019-07 NaN NaN
11 a 2019-08 NaN NaN
12 a 2019-09 23129.96 19160.9
13 a 2019-10 NaN NaN
14 a 2019-11 NaN NaN
15 b 2018-09 21511.11 17696.8
16 b 2018-10 NaN NaN
17 b 2018-11 NaN NaN
18 b 2018-12 30319.98 24553.6
19 b 2019-01 NaN NaN
20 b 2019-02 NaN NaN
21 b 2019-03 7409.61 6110.0
22 b 2019-04 NaN NaN
23 b 2019-05 NaN NaN
24 b 2019-06 15212.51 12590.5
25 b 2019-07 NaN NaN
26 b 2019-08 NaN NaN
27 b 2019-09 23129.96 19160.9
28 b 2019-10 NaN NaN
29 b 2019-11 NaN NaN

I tried the code below, adapted from the reference here:

df[['v1', 'v2']] = (df[['v1', 'v2']].ffill()+df[['v1', 'v2']].bfill())/2
df[['v1', 'v2']] = df[['v1', 'v2']].bfill().ffill()

I get:

   type     date         v1        v2
0 a 2018-09 21511.110 17696.80
1 a 2018-10 25915.545 21125.20
2 a 2018-11 25915.545 21125.20
3 a 2018-12 30319.980 24553.60
4 a 2019-01 18864.795 15331.80
5 a 2019-02 18864.795 15331.80
6 a 2019-03 7409.610 6110.00
7 a 2019-04 11311.060 9350.25
8 a 2019-05 11311.060 9350.25
9 a 2019-06 15212.510 12590.50
10 a 2019-07 19171.235 15875.70
11 a 2019-08 19171.235 15875.70
12 a 2019-09 23129.960 19160.90
13 a 2019-10 22320.535 18428.85
14 a 2019-11 22320.535 18428.85
15 b 2018-09 21511.110 17696.80
16 b 2018-10 25915.545 21125.20
17 b 2018-11 25915.545 21125.20
18 b 2018-12 30319.980 24553.60
19 b 2019-01 18864.795 15331.80
20 b 2019-02 18864.795 15331.80
21 b 2019-03 7409.610 6110.00
22 b 2019-04 11311.060 9350.25
23 b 2019-05 11311.060 9350.25
24 b 2019-06 15212.510 12590.50
25 b 2019-07 19171.235 15875.70
26 b 2019-08 19171.235 15875.70
27 b 2019-09 23129.960 19160.90
28 b 2019-10 23129.960 19160.90
29 b 2019-11 23129.960 19160.90

But I don't know how to group by type and then apply the code above. Can anyone help? Thanks.

Best answer

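Add groupby with the list of columns to process. For the first and last missing values of each group, do the bfill and ffill inside apply, so that values from one group are never used to fill another group (which would otherwise happen, for example, if some group contained only NaN values):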

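cols = ['v1', 'v2']

# Interior gaps: mean of the previous and next known value within each type.
# Recent pandas needs a list selector here; ['v1', 'v2'] as a bare tuple is rejected.
g = df.groupby('type')[cols]
df[cols] = (g.ffill() + g.bfill()) / 2

# Leading/trailing gaps: fill inside each group; group_keys=False keeps the original index.
df[cols] = df.groupby('type', group_keys=False)[cols].apply(lambda x: x.bfill().ffill())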
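To make the difference between the grouped and ungrouped versions concrete, here is a minimal sketch on made-up data (the small frame below is an assumption for illustration, not the question's data). It includes a type `c` that contains only NaN, which is the case the answer warns about.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: gaps in the middle and at the end of each type,
# plus a type "c" that has no known values at all.
df = pd.DataFrame({
    "type": ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"],
    "v1": [10.0, np.nan, 30.0, np.nan, 100.0, np.nan, 300.0, np.nan, np.nan, np.nan],
})
cols = ["v1"]

# Ungrouped version from the question: fills ignore the type boundaries.
ungrouped = (df[cols].ffill() + df[cols].bfill()) / 2
ungrouped = ungrouped.bfill().ffill()

# Grouped version: every fill stays inside its own type.
out = df.copy()
g = out.groupby("type")[cols]
out[cols] = (g.ffill() + g.bfill()) / 2
out[cols] = out.groupby("type", group_keys=False)[cols].apply(lambda x: x.bfill().ffill())

print(ungrouped["v1"].tolist())
print(out["v1"].tolist())
```

In the ungrouped result, row 3 (type `a`) becomes 65.0, the mean of `a`'s 30.0 and `b`'s 100.0, and type `c` is filled with `b`'s last value. In the grouped result, row 3 takes 30.0 from its own group and type `c` stays NaN.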

A solution that processes only the numeric columns:

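import numpy as np

# Pick up every numeric column instead of naming them by hand.
cols = df.select_dtypes(np.number).columns

g = df.groupby('type')[cols]
df[cols] = (g.ffill() + g.bfill()) / 2
# group_keys=False keeps the original index so the assignment aligns row by row.
df[cols] = df.groupby('type', group_keys=False)[cols].apply(lambda x: x.bfill().ffill())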
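If the same fill is needed in several places, the two steps can be wrapped in a small function. This is only a sketch: the name `fill_with_neighbor_mean` and its parameters are hypothetical, and it assumes the grouping key is a single column name.

```python
import numpy as np
import pandas as pd


def fill_with_neighbor_mean(df, by, cols=None):
    """Hypothetical helper: per-group NaN fill using the previous/next known values."""
    out = df.copy()  # leave the caller's frame untouched
    if cols is None:
        # Default to every numeric column, excluding the grouping key itself.
        cols = out.select_dtypes(np.number).columns.difference([by])
    cols = list(cols)
    g = out.groupby(by)[cols]
    # Interior gaps: mean of the previous and next known value in the same group.
    out[cols] = (g.ffill() + g.bfill()) / 2
    # Leading/trailing gaps: nearest known value in the same group.
    out[cols] = out.groupby(by, group_keys=False)[cols].apply(lambda x: x.bfill().ffill())
    return out


# Made-up sample with the same column layout as the question.
sample = pd.DataFrame({
    "type": ["a", "a", "a", "b", "b", "b"],
    "date": ["2018-09", "2018-10", "2018-11"] * 2,
    "v1": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan],
    "v2": [2.0, np.nan, 6.0, 8.0, np.nan, np.nan],
})
result = fill_with_neighbor_mean(sample, "type")
print(result)
```

The non-numeric `date` column is skipped by `select_dtypes`, and the function returns a copy, so `sample` still holds its NaN values afterwards.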

Regarding python - Groupby and fill NaN with the mean of the previous and next values in Pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59279935/
