
python - Return the first matching value/column name in a new dataframe

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:57:58

import pandas as pd
import numpy as np

# Hourly index; newer pandas versions prefer the lowercase alias 'h'.
rng = pd.date_range('1/1/2011', periods=6, freq='H')
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [0, 1, 2, 3, 4, 5],
                   'C': [0, 1, 2, 3, 4, 5],
                   'D': [0, 1, 2, 3, 4, 5],
                   'E': [1, 2, 3, 3, 7, 6],
                   'F': [1, 1, 3, 3, 7, 6],
                   'G': [0, 0, 1, 0, 0, 0]},
                  index=rng)

A simple dataframe to help me explain:

df


                     A  B  C  D  E  F  G
2011-01-01 00:00:00  0  0  0  0  1  1  0
2011-01-01 01:00:00  1  1  1  1  2  1  0
2011-01-01 02:00:00  2  2  2  2  3  3  1
2011-01-01 03:00:00  3  3  3  3  3  3  0
2011-01-01 04:00:00  4  4  4  4  7  7  0
2011-01-01 05:00:00  5  5  5  5  6  6  0

When I filter for values greater than or equal to 2, I get the following output:

df[df >= 2]

                       A    B    C    D    E    F    G
2011-01-01 00:00:00  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2011-01-01 01:00:00  NaN  NaN  NaN  NaN  2.0  NaN  NaN
2011-01-01 02:00:00  2.0  2.0  2.0  2.0  3.0  3.0  NaN
2011-01-01 03:00:00  3.0  3.0  3.0  3.0  3.0  3.0  NaN
2011-01-01 04:00:00  4.0  4.0  4.0  4.0  7.0  7.0  NaN
2011-01-01 05:00:00  5.0  5.0  5.0  5.0  6.0  6.0  NaN

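For each row, I want to know which column is the first (going left to right) to hold a matching value. For example, in the row for 2011-01-01 01:00:00 it is column E, and the value is 2.0.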


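Desired output:

What I would like to get is a new dataframe in which the first matching value sits in a column named "Value", and another column named "From Col" captures the name of the column it came from.

If no match is found for a row, output the last column name (G in this example). Thanks for any help.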

                     Value  From Col
2011-01-01 00:00:00    NaN         G
2011-01-01 01:00:00      2         E
2011-01-01 02:00:00      2         A
2011-01-01 03:00:00      3         A
2011-01-01 04:00:00      4         A
2011-01-01 05:00:00      5         A

Best answer

Try this:

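def get_first_valid(ser):
    # Empty row: there is nothing to report.
    if len(ser) == 0:
        return pd.Series([np.nan, np.nan])

    # True where the value is NaN, i.e. where the filter did not match.
    mask = pd.isnull(ser.values)
    # argmin gives the position of the first False (first non-NaN value).
    i = mask.argmin()
    if mask[i]:
        # Every value is NaN: fall back to the last column name.
        return pd.Series([np.nan, ser.index[-1]])
    else:
        # .iloc treats i as a position even though the index holds labels.
        return pd.Series([ser.iloc[i], ser.index[i]])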


In [113]: df[df >= 2].apply(get_first_valid, axis=1)
Out[113]:
                       0  1
2011-01-01 00:00:00  NaN  G
2011-01-01 01:00:00  2.0  E
2011-01-01 02:00:00  2.0  A
2011-01-01 03:00:00  3.0  A
2011-01-01 04:00:00  4.0  A
2011-01-01 05:00:00  5.0  A
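
The result columns above are labeled 0 and 1, while the question asks for "Value" and "From Col". One possible way to get those names, offered here as a sketch and not as part of the original answer, is to give each returned Series an explicit index. The sample frame is rebuilt so the snippet runs on its own; `labels` and `get_first_valid_named` are names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (hourly index, 6 rows).
rng = pd.date_range('2011-01-01', periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [0, 1, 2, 3, 4, 5],
                   'C': [0, 1, 2, 3, 4, 5],
                   'D': [0, 1, 2, 3, 4, 5],
                   'E': [1, 2, 3, 3, 7, 6],
                   'F': [1, 1, 3, 3, 7, 6],
                   'G': [0, 0, 1, 0, 0, 0]},
                  index=rng)

# Hypothetical helper name: the output column names requested in the question.
labels = ['Value', 'From Col']

def get_first_valid_named(ser):
    # Same logic as get_first_valid, but each result carries named entries.
    if len(ser) == 0:
        return pd.Series([np.nan, np.nan], index=labels)
    mask = pd.isnull(ser.values)
    i = mask.argmin()
    if mask[i]:
        # No match in this row: report NaN and the last column name.
        return pd.Series([np.nan, ser.index[-1]], index=labels)
    return pd.Series([ser.iloc[i], ser.index[i]], index=labels)

result = df[df >= 2].apply(get_first_valid_named, axis=1)
print(result)
```

Renaming after the fact, for example by assigning a list of two names to the result's `columns` attribute, would presumably work just as well.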

Or:

In [114]: df[df >= 2].T.apply(get_first_valid).T
Out[114]:
                       0  1
2011-01-01 00:00:00  NaN  G
2011-01-01 01:00:00    2  E
2011-01-01 02:00:00    2  A
2011-01-01 03:00:00    3  A
2011-01-01 04:00:00    4  A
2011-01-01 05:00:00    5  A
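
Both variants above call a Python function once per row, which can get slow on large frames. A vectorized alternative is sketched below; it swaps in NumPy's `argmax` on the boolean mask and is not what the answer itself does. The names `hit`, `found`, `pos` and `vals` are illustrative, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (hourly index, 6 rows).
rng = pd.date_range('2011-01-01', periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [0, 1, 2, 3, 4, 5],
                   'C': [0, 1, 2, 3, 4, 5],
                   'D': [0, 1, 2, 3, 4, 5],
                   'E': [1, 2, 3, 3, 7, 6],
                   'F': [1, 1, 3, 3, 7, 6],
                   'G': [0, 0, 1, 0, 0, 0]},
                  index=rng)

hit = (df >= 2).to_numpy()          # True where the condition matches
found = hit.any(axis=1)             # does the row contain any match?
pos = hit.argmax(axis=1)            # position of the first True (0 if none)
# Rows without a match fall back to the last column position.
pos = np.where(found, pos, df.shape[1] - 1)

# Pick one value per row, then blank out the rows that had no match.
vals = df.to_numpy(dtype=float)[np.arange(len(df)), pos]
vals = np.where(found, vals, np.nan)

result = pd.DataFrame({'Value': vals, 'From Col': df.columns[pos]},
                      index=df.index)
print(result)
```

The separate `found` check is needed because `argmax` returns 0 both when the first column matches and when nothing matches at all.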

PS: I took the source code of the Series.first_valid_index() function and modified it...
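
For comparison, here is a minimal sketch of what the unmodified `Series.first_valid_index()` gives when applied per row. It returns the label of the first non-NaN entry, or `None` when every entry is NaN, so the fallback to the last column has to be added by hand. The helper name `first_col` is hypothetical.

```python
import pandas as pd

# Rebuild the sample frame from the question (hourly index, 6 rows).
rng = pd.date_range('2011-01-01', periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [0, 1, 2, 3, 4, 5],
                   'C': [0, 1, 2, 3, 4, 5],
                   'D': [0, 1, 2, 3, 4, 5],
                   'E': [1, 2, 3, 3, 7, 6],
                   'F': [1, 1, 3, 3, 7, 6],
                   'G': [0, 0, 1, 0, 0, 0]},
                  index=rng)

def first_col(row):
    # Label of the first non-NaN entry, or None if the row is all NaN.
    label = row.first_valid_index()
    # Fall back to the last column name when nothing matched.
    return row.index[-1] if label is None else label

from_col = df[df >= 2].apply(first_col, axis=1)
print(from_col)
```

This yields only the column name; the modified function in the answer returns the value and the name together.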

Explanation:

In [221]: ser = pd.Series([np.nan, np.nan, 5, 7, np.nan])

In [222]: ser
Out[222]:
0    NaN
1    NaN
2    5.0
3    7.0
4    NaN
dtype: float64

In [223]: mask = pd.isnull(ser.values)

In [224]: mask
Out[224]: array([ True, True, False, False, True], dtype=bool)

In [225]: i = mask.argmin()

In [226]: i
Out[226]: 2

In [227]: ser.index[i]
Out[227]: 2

In [228]: ser[i]
Out[228]: 5.0
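
The session above covers the case where a valid value exists. The `if mask[i]` check in the function handles the other case: when every entry is NaN, the mask contains no False, so `argmin()` returns position 0 and `mask[0]` is True. A small sketch of both cases follows, with variable names chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Case 1: every value is NaN, so the mask is all True.
all_nan = pd.Series([np.nan, np.nan, np.nan])
mask_all = pd.isnull(all_nan.values)
i_all = mask_all.argmin()             # no False present: argmin returns 0
is_all_nan = bool(mask_all[i_all])    # True signals "no valid value found"

# Case 2: the series from the explanation above.
some = pd.Series([np.nan, np.nan, 5, 7, np.nan])
mask_some = pd.isnull(some.values)
i_some = mask_some.argmin()           # position of the first non-NaN value
has_valid = not bool(mask_some[i_some])

print(i_all, is_all_nan)
print(i_some, has_valid)
```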

Regarding "python - Return the first matching value/column name in a new dataframe", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41090333/
