python - Filter the first element of a list that matches a condition, for each row in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:24:54

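Question: I want to create another column that contains the value from the first of the (many) flag columns, or equivalently the first element of the list, that meets a condition (here: being different from "nan").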

I'm working with a DataFrame that has several columns used as flags, each column representing a different type of flag. This is what it looks like:

      id_number   createdat  ... flag_3.3.3.2.1 flag_3.3.3.2.2 flag_3.3.3.3.1
1          718v  2019-08-14  ...            nan      3.3.3.2.2      3.3.3.3.1
2          566m  2019-07-10  ...            nan            nan            nan
3          636p  2019-06-12  ...      3.3.3.2.1            nan      3.3.3.3.1
4          630r  2019-06-30  ...            nan            nan            nan
26815      066p  2019-08-24  ...      3.3.3.2.1      3.3.3.2.2      3.3.3.3.1
26816      769b  2019-08-10  ...            nan            nan            nan

I managed to create a column that collects, as a list, the values of all columns whose names start with "flag_":

payday_cols = [col for col in df if col.startswith('flag_')]
df['flagging'] = df[payday_cols].values.tolist()
print(df)
      id_number  ... flag_3.3.3.3.1  flagging
1          718v  ...            nan  [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...
2          566m  ...            nan  [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...
3          636p  ...            nan  [nan, nan, 2.2, nan, nan, nan, nan, nan, nan, ...
4          630r  ...            nan  [nan, nan, nan, 3.1, nan, nan, nan, nan, 3.3.2... ...
26815      066p  ...      3.3.3.3.1  [nan, nan, nan, nan, 3.2, nan, nan, nan, nan, ...
26816      769b  ...            nan  [1, nan, nan, nan, nan, nan, nan, nan, 3.3.2.1...

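What I'm missing is a way to create a final column that holds the first value in that list that is not nan, or nan if every value in the list is nan. The output would look like this: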

      id_number  ... flag_3.3.3.3.1  flagging                                              flag
1          718v  ...            nan  [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...      nan
2          566m  ...            nan  [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...      nan
3          636p  ...            nan  [nan, nan, 2.2, nan, nan, nan, nan, nan, nan, ...      2.2
4          630r  ...            nan  [nan, nan, nan, 3.1, nan, nan, nan, nan, 3.3.2...      3.1
26815      066p  ...      3.3.3.3.1  [nan, nan, nan, nan, 3.2, nan, nan, nan, nan, ...      3.2
26816      769b  ...            nan  [1, nan, nan, nan, nan, nan, nan, nan, 3.3.2.1...  3.3.2.1
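One way to build that last column, sketched here as an assumption and not taken from the original post, is to walk each row's flagging list in plain Python and keep the first element that pandas does not treat as missing. The four-row frame is a hypothetical subset of the data shown above (flags stored as strings, missing flags as NaN), and first_not_nan is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data: flag values are strings,
# a missing flag is NaN.
df = pd.DataFrame(
    {
        "id_number": ["718v", "566m", "636p", "630r"],
        "flag_3.3.3.2.1": [np.nan, np.nan, "3.3.3.2.1", np.nan],
        "flag_3.3.3.2.2": ["3.3.3.2.2", np.nan, np.nan, np.nan],
        "flag_3.3.3.3.1": ["3.3.3.3.1", np.nan, "3.3.3.3.1", np.nan],
    },
    index=[1, 2, 3, 4],
)

# Same list column as in the question.
payday_cols = [col for col in df if col.startswith("flag_")]
df["flagging"] = df[payday_cols].values.tolist()


def first_not_nan(values):
    """Return the first element of `values` that is not NaN, or NaN if none is."""
    # next() takes the first item the generator yields; the default (NaN)
    # is returned when every element is missing.
    return next((v for v in values if pd.notna(v)), np.nan)


# Apply the helper to each row's list to get the requested "flag" column.
df["flag"] = df["flagging"].apply(first_not_nan)
print(df[["id_number", "flag"]])
```

Because the helper only looks at the flagging lists, it works even after the original flag columns have been dropped.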

Thanks a lot! If you need me to generate values similar to these so you can recreate this case, I will edit the post to include them.

Best answer

Method 1:

Try bfill along the columns, then take the first column with iloc:

df[payday_cols].bfill(axis=1).iloc[:, 0]

Out[92]:
1        3.3.3.2.2
2              NaN
3        3.3.3.2.1
4              NaN
26815    3.3.3.2.1
26816          NaN
Name: flag_3.3.3.2.1, dtype: object
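A self-contained sketch of Method 1 that stores the result in the new flag column the question asks for. The four-row frame is a hypothetical subset of the question's data (string flags, NaN for missing); passing the axis by keyword works across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame(
    {
        "id_number": ["718v", "566m", "636p", "630r"],
        "flag_3.3.3.2.1": [np.nan, np.nan, "3.3.3.2.1", np.nan],
        "flag_3.3.3.2.2": ["3.3.3.2.2", np.nan, np.nan, np.nan],
        "flag_3.3.3.3.1": ["3.3.3.3.1", np.nan, "3.3.3.3.1", np.nan],
    },
    index=[1, 2, 3, 4],
)
payday_cols = [col for col in df if col.startswith("flag_")]

# Back-fill along the columns: each NaN takes the nearest non-NaN value to its
# right, so column 0 ends up holding the leftmost non-NaN value of the row.
# Rows with no flag at all stay NaN.
df["flag"] = df[payday_cols].bfill(axis=1).iloc[:, 0]
print(df[["id_number", "flag"]])
```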

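Method 2:

Another solution is to call NumPy argmax on notna and pass the result to lookup. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this method only runs on older pandas versions.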

import numpy as np
m = df[payday_cols].notna().values.argmax(axis=1)  # position of the first non-NaN cell per row
df[payday_cols].lookup(df.index, np.array(payday_cols)[m])

Out[145]: array(['3.3.3.2.2', nan, '3.3.3.2.1', nan, '3.3.3.2.1', nan], dtype=object)
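Since DataFrame.lookup no longer exists in pandas 2.x, this sketch keeps the answer's argmax idea but swaps lookup for NumPy integer-array indexing (a substitution, not the answerer's code). The four-row sample frame is a hypothetical subset of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame(
    {
        "id_number": ["718v", "566m", "636p", "630r"],
        "flag_3.3.3.2.1": [np.nan, np.nan, "3.3.3.2.1", np.nan],
        "flag_3.3.3.2.2": ["3.3.3.2.2", np.nan, np.nan, np.nan],
        "flag_3.3.3.3.1": ["3.3.3.3.1", np.nan, "3.3.3.3.1", np.nan],
    },
    index=[1, 2, 3, 4],
)
payday_cols = [col for col in df if col.startswith("flag_")]

flags = df[payday_cols].to_numpy()
# Column position of the first non-NaN cell per row: argmax returns the index
# of the first True. For an all-NaN row it returns 0, which points at a NaN
# cell, so those rows still come out as NaN.
first_pos = df[payday_cols].notna().to_numpy().argmax(axis=1)
# Integer-array indexing picks one cell per row: (row i, column first_pos[i]).
df["flag"] = flags[np.arange(len(df)), first_pos]
print(df[["id_number", "flag"]])
```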

Note: the outputs above are based on this sample:

In [83]: df

Out[83]:
      id_number   createdat flag_3.3.3.2.1 flag_3.3.3.2.2 flag_3.3.3.3.1
1          718v  2019-08-14            NaN      3.3.3.2.2      3.3.3.3.1
2          566m  2019-07-10            NaN            NaN            NaN
3          636p  2019-06-12      3.3.3.2.1            NaN      3.3.3.3.1
4          630r  2019-06-30            NaN            NaN            NaN
26815      066p  2019-08-24      3.3.3.2.1      3.3.3.2.2      3.3.3.3.1
26816      769b  2019-08-10            NaN            NaN            NaN
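To recreate this case, as the asker offered to do, the sample can be rebuilt directly. This is a sketch under assumptions: flag values are strings, missing flags are NaN, and createdat is parsed into datetimes.

```python
import numpy as np
import pandas as pd

nan = np.nan  # shorthand for a missing flag

# Rebuild the answer's six-row sample, keeping its original index labels.
df = pd.DataFrame(
    {
        "id_number": ["718v", "566m", "636p", "630r", "066p", "769b"],
        "createdat": pd.to_datetime(
            ["2019-08-14", "2019-07-10", "2019-06-12",
             "2019-06-30", "2019-08-24", "2019-08-10"]
        ),
        "flag_3.3.3.2.1": [nan, nan, "3.3.3.2.1", nan, "3.3.3.2.1", nan],
        "flag_3.3.3.2.2": ["3.3.3.2.2", nan, nan, nan, "3.3.3.2.2", nan],
        "flag_3.3.3.3.1": ["3.3.3.3.1", nan, "3.3.3.3.1", nan, "3.3.3.3.1", nan],
    },
    index=[1, 2, 3, 4, 26815, 26816],
)
print(df)
```

Running Method 1 on this frame should reproduce the Out[92] series shown above.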

A similar question, "python - Filter the first element of a list that matches a condition, for each row in pandas", can be found on Stack Overflow: https://stackoverflow.com/questions/58311181/
