python - Get the three smallest values in each row and return the corresponding column names

Reposted, Author: 太空狗, Updated: 2023-10-30 02:26:08

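I have two data frames, df and df2, with matching rows and columns. Based on the first data frame, df, I want to find the 3 smallest values in each row and return the names of the corresponding columns (here 'X', 'Y', 'Z' or 'T'). That would give me a new data frame, df3.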

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': [21, 2, 43, 44, 56, 67, 7, 38, 29, 130],
    'Y': [101, 220, 330, 140, 250, 10, 207, 320, 420, 50],
    'Z': [20, 128, 136, 144, 312, 10, 82, 63, 42, 12],
    'T': [2, 32, 4, 424, 256, 167, 27, 38, 229, 30]
}, index=list('ABCDEFGHIJ'))

df2 = pd.DataFrame({
    'X': [0.5, 0.12, 0.43, 0.424, 0.65, 0.867, 0.17, 0.938, 0.229, 0.113],
    'Y': [0.1, 2.201, 0.33, 0.140, 0.525, 0.31, 0.20, 0.32, 0.420, 0.650],
    'Z': [0.20, 0.128, 0.136, 0.2144, 0.5312, 0.61, 0.82, 0.363, 0.542, 0.512],
    'T': [0.52, 0.232, 0.34, 0.6424, 0.6256, 0.3167, 0.527, 0.38, 0.4229, 0.73]
}, index=list('ABCDEFGHIJ'))

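In addition, I want another data frame, df4, holding the df2 values at the positions recorded in df3. For example, in row 'A' of df the 3 smallest values are (2, 20, 21), so in row 'A' of df4 I want (0.52, 0.2, 0.5) from df2.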

Best answer

If both DataFrames have the same column names in the same order, you can use argsort to get the column positions:

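# Positions of the 3 smallest values in each row. The output below matches the
# column order T, X, Y, Z (older pandas versions sorted the dict keys).
arr = df.values.argsort(1)[:, :3]
print(arr)
[[0 3 1]
 [1 0 3]
 [0 1 3]
 [1 2 3]
 [1 2 0]
 [2 3 1]
 [1 0 3]
 [0 1 3]
 [1 3 0]
 [3 0 2]]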

# For each row, take the df2 values at the column positions stored in arr
b = df2.values[np.arange(len(arr))[:, None], arr]
print(b)
[[ 0.52    0.2     0.5   ]
 [ 0.12    0.232   0.128 ]
 [ 0.34    0.43    0.136 ]
 [ 0.424   0.14    0.2144]
 [ 0.65    0.525   0.6256]
 [ 0.31    0.61    0.867 ]
 [ 0.17    0.527   0.82  ]
 [ 0.38    0.938   0.363 ]
 [ 0.229   0.542   0.4229]
 [ 0.512   0.73    0.65  ]]
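
The row-wise selection above pairs a column vector of row numbers with arr through broadcasting. As a minimal sketch, np.take_along_axis (available since NumPy 1.15) expresses the same selection. The small arrays values and order below are made-up stand-ins for df2.values and arr, not data from the question.

```python
import numpy as np

# Hypothetical stand-ins for df2.values and arr
values = np.array([[0.5, 0.1, 0.2, 0.9],
                   [0.3, 0.8, 0.4, 0.6]])
order = np.array([[3, 2, 0],
                  [0, 3, 2]])

# Broadcasting form used in the answer: row numbers as a column vector
rows = np.arange(len(order))[:, None]
picked_fancy = values[rows, order]

# Equivalent form: along axis 1, take the positions listed in order
picked_take = np.take_along_axis(values, order, axis=1)

print(picked_fancy)
print(np.array_equal(picked_fancy, picked_take))
```

Both forms return an array with the shape of order; which one reads better is largely a matter of taste.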

Finally, use the DataFrame constructor:

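# Index the underlying array of names: 2-D indexing directly on an Index
# is no longer supported in recent pandas versions
df3 = pd.DataFrame(df.columns.values[arr])
df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
print(df3)
  Col1 Col2 Col3
0    T    Z    X
1    X    T    Z
2    T    X    Z
3    X    Y    Z
4    X    Y    T
5    Y    Z    X
6    X    T    Z
7    T    X    Z
8    X    Z    T
9    Z    T    Y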

df4 = pd.DataFrame(b)
df4.columns = ['Col{}'.format(x+1) for x in df4.columns]
print(df4)
    Col1   Col2    Col3
0  0.520  0.200  0.5000
1  0.120  0.232  0.1280
2  0.340  0.430  0.1360
3  0.424  0.140  0.2144
4  0.650  0.525  0.6256
5  0.310  0.610  0.8670
6  0.170  0.527  0.8200
7  0.380  0.938  0.3630
8  0.229  0.542  0.4229
9  0.512  0.730  0.6500
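
The results above are indexed 0 to 9 rather than A to J. Putting the steps together, here is a sketch of a helper that keeps the original row labels and aligns df2 to the column order of df first. The name three_smallest and its n parameter are illustrative and not part of the original answer; kind='stable' is an added choice so that tied values keep their left-to-right column order.

```python
import numpy as np
import pandas as pd

def three_smallest(df, df2, n=3):
    """Return (names, values): the column names of the n smallest entries
    in each row of df, and the df2 entries at those same positions."""
    df2 = df2.reindex(index=df.index, columns=df.columns)  # align labels
    # A stable sort keeps tied values in column order
    order = np.argsort(df.to_numpy(), axis=1, kind='stable')[:, :n]
    cols = ['Col{}'.format(i + 1) for i in range(n)]
    names = pd.DataFrame(df.columns.to_numpy()[order],
                         index=df.index, columns=cols)
    picked = np.take_along_axis(df2.to_numpy(), order, axis=1)
    values = pd.DataFrame(picked, index=df.index, columns=cols)
    return names, values

# First three rows of the question's data
df = pd.DataFrame({'X': [21, 2, 43], 'Y': [101, 220, 330],
                   'Z': [20, 128, 136], 'T': [2, 32, 4]},
                  index=list('ABC'))
df2 = pd.DataFrame({'X': [0.5, 0.12, 0.43], 'Y': [0.1, 2.201, 0.33],
                    'Z': [0.20, 0.128, 0.136], 'T': [0.52, 0.232, 0.34]},
                   index=list('ABC'))

df3, df4 = three_smallest(df, df2)
print(df3)
print(df4)
```

Because the helper works on positions after aligning the labels, it should not depend on whether the columns arrive as X, Y, Z, T or T, X, Y, Z, apart from the order in which tied values are reported.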

The answers are similar, so I timed them:

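np.random.seed(14)
N = 1000000
# Integer test data ...
df1 = pd.DataFrame(np.random.randint(100, size=(N, 4)), columns=['X','Y','Z','T'])
#print (df1)

# ... which is then overwritten by float test data
df1 = pd.DataFrame(np.random.rand(N, 4), columns=['X','Y','Z','T'])
#print (df1)
# NOTE: as written, the functions below read the global df and df2, not df1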


def jez():
    arr = df.values.argsort(1)[:, :3]
    b = df2.values[np.arange(len(arr))[:, None], arr]
    # index the array of names; 2-D indexing on an Index fails in recent pandas
    df3 = pd.DataFrame(df.columns.values[arr])
    df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
    df4 = pd.DataFrame(b)
    df4.columns = ['Col{}'.format(x+1) for x in df4.columns]
    return df3, df4


def pir():
    v = df.values
    # argpartition selects the 3 smallest per row but does not order them
    a = v.argpartition(3, 1)[:, :3]
    c = df.columns.values[a]
    d = df2.values[np.arange(len(df))[:, None], a]
    return (pd.DataFrame(c, df.index),
            pd.DataFrame(d, df.index, [1, 2, 3]).add_prefix('Col'))


def cᴏʟᴅsᴘᴇᴇᴅ():
    # another solution is wrong
    # DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
    df3 = df.apply(lambda x: df.columns[np.argsort(x)], 1).iloc[:, :3]
    df4 = pd.DataFrame({'Col{}'.format(i + 1) : df2.lookup(df3.index, df3.iloc[:, i]) for i in range(df3.shape[1])}, index=df.index)
    return df3, df4


print (jez())
print (pir())
print (cᴏʟᴅsᴘᴇᴇᴅ())
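
DataFrame.lookup, used in the third function, was deprecated in pandas 1.2 and removed in 2.0. One possible replacement, sketched here under the assumption that every requested label exists in the frame, converts labels to integer positions with Index.get_indexer and then indexes the underlying array. The helper name lookup_values and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def lookup_values(frame, row_labels, col_labels):
    """Label-based element lookup, similar in spirit to the removed
    DataFrame.lookup. Assumes all labels are present in frame:
    get_indexer returns -1 for a missing label, which would silently
    select the last row or column."""
    r = frame.index.get_indexer(row_labels)
    c = frame.columns.get_indexer(col_labels)
    return frame.to_numpy()[r, c]

# Hypothetical toy frame
frame = pd.DataFrame({'X': [0.5, 0.12], 'T': [0.52, 0.232]},
                     index=['A', 'B'])
result = lookup_values(frame, ['A', 'B'], ['T', 'X'])
print(result)
```

Each row label is paired with the column label at the same position, so the call above reads the cells (A, T) and (B, X).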

In [176]: %timeit (jez())
1000 loops, best of 3: 412 µs per loop

In [177]: %timeit (pir())
1000 loops, best of 3: 425 µs per loop

In [178]: %timeit (cᴏʟᴅsᴘᴇᴇᴅ())
100 loops, best of 3: 3.99 ms per loop
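
The three functions read the global df and df2, so the timings above appear to describe the ten-row frames rather than df1. Below is a rough sketch of how a comparison on larger frames could be set up with the standard timeit module. The frame size, the function names and the extra sort applied to the argpartition result are choices made for this illustration, and no timings are claimed.

```python
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(14)
N = 100_000  # illustrative size, smaller than the N in the text to stay quick
big = pd.DataFrame(rng.random((N, 4)), columns=['X', 'Y', 'Z', 'T'])
big2 = pd.DataFrame(rng.random((N, 4)), columns=['X', 'Y', 'Z', 'T'])

def with_argsort(a, b):
    order = a.to_numpy().argsort(axis=1)[:, :3]
    return (a.columns.to_numpy()[order],
            np.take_along_axis(b.to_numpy(), order, axis=1))

def with_argpartition(a, b):
    v = a.to_numpy()
    # kth=2 moves the three smallest into the first three positions,
    # but leaves them unordered among themselves
    part = v.argpartition(2, axis=1)[:, :3]
    # sort only those three per row so the result matches argsort
    inner = np.take_along_axis(v, part, axis=1).argsort(axis=1)
    order = np.take_along_axis(part, inner, axis=1)
    return (a.columns.to_numpy()[order],
            np.take_along_axis(b.to_numpy(), order, axis=1))

names_s, vals_s = with_argsort(big, big2)
names_p, vals_p = with_argpartition(big, big2)
print((names_s == names_p).all(), np.array_equal(vals_s, vals_p))

for fn in (with_argsort, with_argpartition):
    t = min(timeit.repeat(lambda: fn(big, big2), number=3, repeat=3)) / 3
    print('{}: {:.4f} s per call'.format(fn.__name__, t))
```

With only four columns the two approaches do a similar amount of work; argpartition is more likely to pay off when there are many columns and only a few smallest values are needed.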

Regarding python - Get the three smallest values in each row and return the corresponding column names, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46047432/
