python - Finding the columns and indices of a value in a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 21:30:56

I have a Pandas DataFrame:

  col1 col2 col3 col4
0    A    B    C    G
1    I    J    S    D
2    O    L    C    G
3    A    B    H    D
4    H    B    C    P

# reproducible
import pandas as pd
from string import ascii_uppercase as uc # just for sample data
import random # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})

I am looking for a function like this:

func('H')

It should return the index labels and column names of every cell that contains "H". Any ideas?

Best answer

Use np.argwhere together with df.to_numpy:

import numpy as np

rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
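Since the question asks for something callable like func('H'), here is a minimal sketch that wraps the argwhere approach in a function. The name find_value and the choice to pass the DataFrame as a parameter (instead of relying on a global df) are assumptions made for illustration; the frame is rebuilt from the 5-row table shown in the question.

```python
import numpy as np
import pandas as pd


def find_value(df: pd.DataFrame, value) -> list:
    """Hypothetical helper: (index label, column name) pairs of cells equal to `value`."""
    # Boolean 2-D array of matches -> (row, col) integer positions of the True cells
    rows, cols = np.argwhere(df.to_numpy() == value).T
    # Map integer positions back to index labels and column names
    return list(zip(df.index[rows], df.columns[cols]))


# The 5-row sample table from the question
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})

print(find_value(df, 'H'))
print(find_value(df, 'Z'))  # no match, so an empty list
```

Matches come back in row-major order (row by row, left to right), and a value that does not occur simply yields an empty list.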

Alternatively:

indices = df.where(df.eq('H')).stack().index.tolist()

# print(indices)
[(3, 'col3'), (4, 'col1')]
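The where(...).stack() version depends on stack() discarding the NaN cells that where() introduces. Recent pandas releases have been reworking stack(), including how it treats missing values, so a variant that never creates NaN in the first place may be more robust. This sketch stacks the boolean mask itself and keeps only the True entries; it is a swapped-in variant, not part of the original answer.

```python
import pandas as pd

# The 5-row sample table from the question
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})

# Boolean Series indexed by (row label, column name); no NaN is ever produced
mask = df.eq('H').stack()
# Keep the True entries; their MultiIndex tuples are the matching positions
indices = mask[mask].index.tolist()
print(indices)
```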

timeit comparison of all answers:

df.shape
(50000, 4)

%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c]))
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
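Called with a single argument, np.where behaves like np.nonzero: it returns one array of row positions and one array of column positions. np.argwhere returns the same positions stacked as (row, col) pairs, which is why both answers locate the same cells. This small check on the question's sample table is meant only to illustrate that equivalence.

```python
import numpy as np
import pandas as pd

# The 5-row sample table from the question
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})

# Scott's form: a tuple (row positions, column positions)
r, c = np.where(df == 'H')
# Shubham's form: an (n_matches, 2) array of [row, col] pairs
pairs = np.argwhere(df.to_numpy() == 'H')

print(r, c)
print(pairs)
```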


%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
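The melt-based answer sets df.index.name in place, which changes the caller's DataFrame. A minimal sketch of the same idea that leaves df untouched uses rename_axis, which returns a new frame. The label 'inx' is kept from the answer; pairing the result into tuples is an addition to match the output format of the other answers.

```python
import pandas as pd

# The 5-row sample table from the question
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})

# rename_axis returns a copy, so df.index.name stays unchanged
long = df.rename_axis('inx').reset_index().melt(id_vars='inx')
# melt names the new columns 'variable' (old column name) and 'value' (cell)
hits = long[long['value'] == 'H']
indices = list(zip(hits['inx'], hits['variable']))
print(indices)
```

Because melt stacks the frame column by column, the matches come out in column-major order, i.e. (4, 'col1') before (3, 'col3'), unlike the row-major order of the other answers.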

For python - Finding the columns and indices of a value in a Pandas DataFrame, see the similar question on Stack Overflow: https://stackoverflow.com/questions/62433893/
