python Pandas : return indexes of common rows

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:44:34
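Sorry if this is a newbie question. I am trying to find out which rows two DataFrames have in common. The return value should be the row indexes of df2 whose rows also appear in df1. My clunky example: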

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df1['col2'] = df1['col2'].map(str)
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})
df2['col2'] = df2['col2'].map(str)

# build a combined string key such as 'cx_1' from the two columns
df1['idx'] = df1[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)
df2['idx'] = df2[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)

# keep each frame's original row index as a regular column
df1['idx_values'] = df1.index.values
df2['idx_values'] = df2.index.values

# after the merge, idx_values_y holds the matching row indexes of df2
df3 = pd.merge(df1, df2, on='idx')
myindexes = df3['idx_values_y']

# idir is the output directory prefix, defined elsewhere
myindexes.to_csv(idir + 'test.txt', sep='\t', index=False)

The return value should be [0,4,5]. The two DataFrames can have several million rows, so an efficient way to do this would be great.

Best answer

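There is no need for a new column with the concatenated values. By default, merge performs an inner join, so merge on both columns directly, and add reset_index if you need the values of df2.index: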

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})

# reset_index turns the row index of df2 into a column named 'index'
df3 = pd.merge(df1, df2.reset_index(), on=['col1','col2'])
print(df3)
  col1  col2  index
0   cx     1      0
1  cx2    12      4
2  cx2    12      5
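
The question asks for the list [0,4,5] rather than a merged DataFrame. A minimal sketch of that last step, assuming the same sample data as above; the name common_idx is only illustrative and is not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

# Inner merge on both columns; reset_index exposes df2's row index as 'index'
df3 = pd.merge(df1, df2.reset_index(), on=['col1', 'col2'])

# Pull the matching df2 row indexes out as a plain Python list
common_idx = df3['index'].tolist()
print(common_idx)  # [0, 4, 5]
```

If df1 itself contained duplicate rows, each df2 index would appear once per duplicate, so calling df1.drop_duplicates() before the merge may be worth considering.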

If you need the indexes of both DataFrames:

df4 = pd.merge(df1.reset_index(), df2.reset_index(), on=['col1','col2'])
print(df4)

   index_x col1  col2  index_y
0        0   cx     1        0
1        2  cx2    12        4
2        2  cx2    12        5

For only the intersection of the two DataFrames:

df5 = pd.merge(df1, df2, on=['col1','col2'])
# if both DataFrames contain only these two columns, `on` can be omitted:
# df5 = pd.merge(df1, df2)
print(df5)

  col1  col2
0   cx     1
1  cx2    12
2  cx2    12
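
Since the question mentions frames with several million rows, a membership test may be an alternative when only the df2 indexes are needed. The sketch below is not from the original answer: it swaps the merge for MultiIndex.isin, assumes pandas 0.24 or later (for MultiIndex.from_frame), and has not been benchmarked here, so whether it is actually faster on your data is something to measure.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

keys = ['col1', 'col2']

# Represent each row's (col1, col2) pair as one MultiIndex entry
left = pd.MultiIndex.from_frame(df1[keys])
right = pd.MultiIndex.from_frame(df2[keys])

# Boolean mask: True where a df2 row's key pair also occurs in df1
mask = right.isin(left)

# Row indexes of df2 that have a matching row in df1
matching_idx = df2.index[mask].tolist()
print(matching_idx)  # [0, 4, 5]
```

Unlike the merge, this returns each df2 index at most once, even if df1 contains duplicate rows, and it keeps the order of df2.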

Regarding python Pandas : return indexes of common rows, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50309108/
