gpt4 book ai didi

python - 两个数据框中按索引的公共(public)元素数

转载 作者:太空狗 更新时间:2023-10-29 21:59:08 24 4
gpt4 key购买 nike

我有以下三个数据框:

df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3], 
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})

df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})

df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})

df_A

id_A Animal_A
0 1 cat
1 1 dog
2 1 fish
3 1 bird
4 2 cat
5 2 fish
6 3 bird
7 3 cat

df_B

id_B Animal_B
0 1 dog
1 2 cat
2 2 fish
3 3 dog
4 4 fish
5 4 cat
6 5 cat

df_P

id_A id_B
0 1 2
1 1 3
2 2 4
3 3 5

我想在 df_P 中增加一个列,告诉 id_A 和 id_B 之间共享的动物数量。我正在做的是:

df_P["n_common"] = np.nan
for i in df_P.index.tolist():
id_A = df_P["id_A"][i]
id_B = df_P["id_B"][i]
df_P.iloc[i,df_P.columns.get_loc('n_common')] = len(set(df_A['Animal_A'][df_A['id_A']==id_A]).intersection(df_B['Animal_B'][df_B['id_B']==id_B]))

结果是:

df_P

id_A id_B n_common
0 1 2 2.0
1 1 3 1.0
2 2 4 2.0
3 3 5 1.0

是否有更快、更 pythonic 的方法来做到这一点?有没有办法避免 for 循环?

最佳答案

不确定它是更快还是更 pythonic,但它避免了 for 循环 :)

import pandas as pd

df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3],
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})

df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})

df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})


df = pd.merge(df_A, df_P, on='id_A')
df = pd.merge(df_B, df, on='id_B')
df = df[df['Animal_A'] == df['Animal_B']].groupby(['id_A', 'id_B'])['Animal_A'].count().reset_index()
df.rename({'Animal_A': 'n_common'},inplace=True,axis=1)

关于python - 两个数据框中按索引的公共(public)元素数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54958169/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com