python - Computing the cosine distance over the Cartesian product of two DataFrames

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:32:59

The data is as follows:

import pandas as pd

u_df = pd.Series({'a':[0,0.11,0.22],'b':[0.92,0.11,0.65],'c':[0.2,0.5,0.23]}).reset_index()
u_df.columns = ['key','value']
# Keys are listed in sorted order so the frame matches the printout below on any pandas version.
v_df = pd.Series({'e':[0.2,0.1,0.23],'f':[0.12,0.191,0.68],'g':[0.5,0.21,0.5]}).reset_index()
v_df.columns = ['key','value']

  key               value
0   a     [0, 0.11, 0.22]
1   b  [0.92, 0.11, 0.65]
2   c    [0.2, 0.5, 0.23]

  key                value
0   e     [0.2, 0.1, 0.23]
1   f  [0.12, 0.191, 0.68]
2   g     [0.5, 0.21, 0.5]

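I want to compute the cosine distance between the two DataFrames over their Cartesian product, that is, for every pair of rows. At the moment I compute it for two lists like this: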

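import itertools
import numpy as np

def dot(K, L):
    # Dot product of two equal-length sequences; mismatched lengths yield 0.
    if len(K) != len(L):
        return 0
    return sum(i[0] * i[1] for i in zip(K, L))

def similarity(item_1, item_2):
    # Cosine similarity: dot product divided by the product of the two norms.
    return dot(item_1, item_2) / np.sqrt(dot(item_1, item_1) * dot(item_2, item_2))

similarities = {item: similarity(target_features[item[0]], train_features[item[1]]) for item in itertools.product(target_features, train_features)}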

But I would like to compute it directly from the DataFrames, and I want the final result to look like this:

  key1 key2        value
0    a    e  0.780720058
1    a    f  0.968164605
2    a    g  0.733602842
3    b    e  0.948870564
4    b    f  0.707152537
...

Best answer

You can first perform a cross join with merge, then compute the cosine distance row by row with apply:

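import pandas as pd
from scipy.spatial.distance import cosine

# A constant helper column makes the merge pair every row of u_df with every row of v_df.
u_df['tmp'] = 1
v_df['tmp'] = 1
df = pd.merge(u_df, v_df, on='tmp', how='outer')
# cosine() returns the distance, so 1 - distance gives the similarity.
df['value'] = df.apply(lambda x: (1 - cosine(x["value_x"], x["value_y"])), axis=1)
df = df[['key_x','key_y','value']]
print(df)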
  key_x key_y     value
0     a     e  0.780720
1     a     f  0.968165
2     a     g  0.733603
3     b     e  0.948871
4     b     f  0.707153
5     b     g  0.967946
6     c     e  0.760748
7     c     f  0.657643
8     c     g  0.740844
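
For larger frames, the row-by-row apply can become slow. As a rough sketch rather than part of the original answer, the same numbers can be obtained with plain NumPy by normalizing each vector and taking one matrix product. It assumes every list in the value column has the same length and that no vector is all zeros. The names U, V, sim, pairs and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built directly as DataFrames.
u_df = pd.DataFrame({'key': ['a', 'b', 'c'],
                     'value': [[0, 0.11, 0.22], [0.92, 0.11, 0.65], [0.2, 0.5, 0.23]]})
v_df = pd.DataFrame({'key': ['e', 'f', 'g'],
                     'value': [[0.2, 0.1, 0.23], [0.12, 0.191, 0.68], [0.5, 0.21, 0.5]]})

# Stack the list-valued column into 2-D arrays (assumes equal-length lists).
U = np.array(u_df['value'].tolist(), dtype=float)
V = np.array(v_df['value'].tolist(), dtype=float)

# Scale every row to unit length (assumes no all-zero vector).
U_unit = U / np.linalg.norm(U, axis=1, keepdims=True)
V_unit = V / np.linalg.norm(V, axis=1, keepdims=True)

# Entry (i, j) is the cosine similarity between row i of u_df and row j of v_df.
sim = U_unit @ V_unit.T

# Row-major flattening of sim matches the pair order of MultiIndex.from_product.
pairs = pd.MultiIndex.from_product([u_df['key'], v_df['key']], names=['key1', 'key2'])
result = pd.DataFrame({'value': sim.ravel()}, index=pairs).reset_index()
print(result)
```

If the merge-based approach is preferred, pandas 1.2 and later also accept pd.merge(u_df, v_df, how='cross'), which performs the cross join without the temporary tmp column.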

Regarding "python - Computing the cosine distance over the Cartesian product of two DataFrames", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42762826/
