
python pandas: fill NaN with the max value or leave blank

Reposted. Author: 行者123. Updated: 2023-12-04 15:27:19

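I have a problem with a large DataFrame. Here is a small excerpt. I want to fill the last column, E, with the maximum value if there are any values, or otherwise leave it empty. Here is the data: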

import numpy as np
import pandas as pd

# Every column needs 18 values, one per row
d = {'A': [4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074],
     'B': ['SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746'],
     'C': [201926, 201926, 201926, 201926, 201926, 201926, 201909, 201909, 201909, 201909, 201909,
           201909, 201933, 201933, 201933, 201933, 201933, 201933],
     'D': [-1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4],
     'E': [np.nan, 1000, 1000, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, 3000, 3000, np.nan]}

It looks like this:

          A            B       C   D       E
0   4000074  SP000796746  201926  -1     NaN
1   4000074  SP000796746  201926   0  1000.0
2   4000074  SP000796746  201926   1  1000.0
3   4000074  SP000796746  201926   2     NaN
4   4000074  SP000796746  201926   3     NaN
5   4000074  SP000796746  201926   4     NaN
6   4000074  SP000796746  201909  -1     NaN
7   4000074  SP000796746  201909   0     NaN
8   4000074  SP000796746  201909   1     NaN
9   4000074  SP000796746  201909   2     NaN
10  4000074  SP000796746  201909   3     NaN
11  4000074  SP000796746  201909   4     NaN
12  4000074  SP000796746  201933  -1     NaN
13  4000074  SP000796746  201933   0     NaN
14  4000074  SP000796746  201933   1     NaN
15  4000074  SP000796746  201933   2  3000.0
16  4000074  SP000796746  201933   3  3000.0
17  4000074  SP000796746  201933   4     NaN

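But my goal is to fill every position of column "E" with the highest value, provided there is any value at all within the range -1 to 4 (column D). If there is none, it should stay empty. So it should look like this: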

          A            B       C   D       E
0   4000074  SP000796746  201926  -1       0
1   4000074  SP000796746  201926   0  1000.0
2   4000074  SP000796746  201926   1  1000.0
3   4000074  SP000796746  201926   2       0
4   4000074  SP000796746  201926   3       0
5   4000074  SP000796746  201926   4       0
6   4000074  SP000796746  201909  -1     NaN
7   4000074  SP000796746  201909   0     NaN
8   4000074  SP000796746  201909   1     NaN
9   4000074  SP000796746  201909   2     NaN
10  4000074  SP000796746  201909   3     NaN
11  4000074  SP000796746  201909   4     NaN
12  4000074  SP000796746  201933  -1  3000.0
13  4000074  SP000796746  201933   0  3000.0
14  4000074  SP000796746  201933   1  3000.0
15  4000074  SP000796746  201933   2  3000.0
16  4000074  SP000796746  201933   3  3000.0
17  4000074  SP000796746  201933   4  3000.0

My code looks like this:

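df = pd.DataFrame(d)

# Row labels where a new block starts (D == -1)
indx = df[df['D'] == -1].index.values

for i, j in zip(indx[:-1], indx[1:]):
    # .loc slices include the end label, hence j-1
    df.loc[i:j-1, 'E'] = df.loc[i:j-1, 'E'].max()

    # The last block has no following start label, so handle it separately
    if j == indx[-1]:
        df.loc[j:, 'E'] = df.loc[j:, 'E'].max()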

It doesn't work for a very large DataFrame... Maybe someone has an idea for different code, or a correction to mine.

Thanks!!

          A            B       C   D       E
0   4000074  SP000796746  201926  -1       0
1   4000074  SP000796746  201926   0  1000.0
2   4000074  SP000796746  201926   1  1000.0
3   4000074  SP000796746  201926   2       0
4   4000074  SP000796746  201926   3       0
5   4000074  SP000796746  201926   4       0
6   4000074  SP000796746  201909  -1     NaN
7   4000074  SP000796746  201909   0     NaN
8   4000074  SP000796746  201909   1     NaN
9   4000074  SP000796746  201909   2     NaN
10  4000074  SP000796746  201909   3     NaN
11  4000074  SP000796746  201909   4     NaN
12  4000074  SP000796746  201933  -1       0
13  4000074  SP000796746  201933   0       0
14  4000074  SP000796746  201933   1       0
15  4000074  SP000796746  201933   2  3000.0
16  4000074  SP000796746  201933   3  3000.0
17  4000074  SP000796746  201933   4       0

Best answer

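You can do this with groupby.transform and max. Build the grouping key from column D: mark each row where D equals -1 and take the cumsum, so the key increases by one every time a new block starts. Then fillna the original column with the per-group maximum.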

df['E'] = df['E'].fillna(df['E'].groupby(df['D'].eq(-1).cumsum()).transform('max'))
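To see what the one-liner does step by step, it may help to look at the intermediate grouping key on a much smaller frame. The sketch below uses a made-up six-row frame; the names `small`, `key`, `block_max` and `filled`, and the values themselves, are illustrative assumptions and not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two blocks, each starting where D == -1.
small = pd.DataFrame({
    'D': [-1, 0, 1, -1, 0, 1],
    'E': [np.nan, 1000.0, np.nan, np.nan, np.nan, np.nan],
})

# True at each block start; the running sum gives one label per block.
key = small['D'].eq(-1).cumsum()

# Per-block maximum of E, broadcast back to every row of the block.
# max skips NaN, and a block with no values at all stays NaN.
block_max = small['E'].groupby(key).transform('max')

# Only missing entries are replaced; existing values are left untouched.
filled = small['E'].fillna(block_max)
```

Because the grouping and the maximum are computed in one vectorized pass, there is no Python-level loop over blocks, which is what usually makes this approach scale better than slicing with `.loc` block by block.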

Edit: to fill with zeros instead, you can do this:

mask = df['E'].groupby(df['D'].eq(-1).cumsum()).transform('any')
df.loc[mask, 'E'] = df.loc[mask, 'E'].fillna(0)
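One thing to keep in mind: `any` tests truthiness, so a block whose only existing values are 0 would, as far as I can tell, be treated like an empty block. If that matters for your data, a variant that tests for presence explicitly could look like the sketch below. The frame `small` and the name `has_value` are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three blocks; the third contains only a literal 0.
small = pd.DataFrame({
    'D': [-1, 0, 1, -1, 0, 1, -1, 0, 1],
    'E': [np.nan, 1000.0, np.nan, np.nan, np.nan, np.nan, 0.0, np.nan, np.nan],
})

# One label per block, incremented at every D == -1.
key = small['D'].eq(-1).cumsum()

# True for every row whose block holds at least one non-missing E,
# regardless of whether that value is zero.
has_value = small['E'].notna().groupby(key).transform('any')

# Fill the gaps with 0 only inside those blocks; empty blocks stay NaN.
small.loc[has_value, 'E'] = small.loc[has_value, 'E'].fillna(0)
```

Here the first and third blocks get their gaps filled with 0, while the second block, which has no values at all, is left as NaN.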

Regarding "python pandas: fill NaN with the max value or leave blank", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61992535/
