
python - Efficient way to run a function on the columns of a pandas DataFrame?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:33:39

I want to run a function on the columns of a pandas DataFrame. The corpus is a pd.DataFrame:

import pandas as pd 
import numpy as np
from scipy.spatial.distance import cosine

corpus = pd.DataFrame([[3,1,1,1,1,60],[2,2,0,2,0,20], [0,2,1,1,0,0], [0,0,2,1,0,1],[0,0,0,0,1,0]],index=["stark","groß","schwach","klein", "dick"],columns=["d1", "d2", "d3","d4","d5","d6"])

I also have a query, which is a pandas Series:

query = pd.Series([1,1,0,0,0], index=["stark","groß","schwach","klein", "dick"])

Now I want to compute the cosine similarity between the query and each column of the corpus:

for column in corpus:
    print("Similarity of Documents", column, " and query: \n", 1 - cosine(query, corpus[column]))
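For reference, scipy.spatial.distance.cosine returns the cosine distance, so 1 - cosine(query, corpus[column]) in the loop above is the cosine similarity between the query vector q and a document column d. Worked through for column d1 with the values from the question:

```latex
\operatorname{sim}(q, d) = 1 - \operatorname{cosine}(q, d)
  = \frac{q \cdot d}{\lVert q \rVert_2 \, \lVert d \rVert_2}

q = (1, 1, 0, 0, 0), \qquad d_1 = (3, 2, 0, 0, 0)

\operatorname{sim}(q, d_1)
  = \frac{1 \cdot 3 + 1 \cdot 2}{\sqrt{2}\,\sqrt{13}}
  = \frac{5}{\sqrt{26}} \approx 0.98058
```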

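Is there a better way to run the cosine function over the columns? Perhaps some method that takes each column and applies the function to it. I'd like to avoid the for loop.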

Best answer

You can use the 'cosine' metric of scipy.spatial.distance.cdist for a vectorized solution, like so:

from scipy.spatial.distance import cdist

out = 1-cdist(query.values[None], corpus.values.T, 'cosine')
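A minimal sketch of the same cdist call that also keeps the document labels, assuming the corpus and query from the question. The query.reindex(corpus.index) step is an addition, not part of the answer: .values discards the index, so this makes sure the query terms line up with the corpus rows (it assumes both use the same term labels). The comments spell out the array shapes cdist sees.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

corpus = pd.DataFrame(
    [[3, 1, 1, 1, 1, 60], [2, 2, 0, 2, 0, 20], [0, 2, 1, 1, 0, 0],
     [0, 0, 2, 1, 0, 1], [0, 0, 0, 0, 1, 0]],
    index=["stark", "groß", "schwach", "klein", "dick"],
    columns=["d1", "d2", "d3", "d4", "d5", "d6"],
)
query = pd.Series([1, 1, 0, 0, 0], index=["stark", "groß", "schwach", "klein", "dick"])

# Assumption: query and corpus share the same term labels; align them before
# dropping to raw arrays, since .to_numpy() / .values lose the index.
q = query.reindex(corpus.index).to_numpy()

# cdist compares rows of two 2-D arrays:
#   q[None]              -> shape (1, n_terms): a single query row
#   corpus.to_numpy().T  -> shape (n_docs, n_terms): one row per document column
dist = cdist(q[None], corpus.to_numpy().T, "cosine")  # shape (1, n_docs)

# Turn distances into similarities and label each one with its document name.
similarity = pd.Series(1 - dist.ravel(), index=corpus.columns, name="cosine_similarity")
print(similarity.round(5))
```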

Sample run:

In [192]: corpus
Out[192]:
         d1  d2  d3  d4  d5  d6
stark     3   1   1   1   1  60
groß      2   2   0   2   0  20
schwach   0   2   1   1   0   0
klein     0   0   2   1   0   1
dick      0   0   0   0   1   0

In [193]: query
Out[193]:
stark      1
groß       1
schwach    0
klein      0
dick       0
dtype: int64

In [194]: from scipy.spatial.distance import cosine

In [195]: for column in corpus:
     ...:     print(1-cosine(query, corpus[column]))
     ...:
0.980580675691
0.707106781187
0.288675134595
0.801783725737
0.5
0.89431540856

In [196]: 1-cdist(query.values[None], corpus.values.T, 'cosine')
Out[196]: array([[ 0.98058, 0.70711, 0.28868, 0.80178, 0.5 , 0.89432]])
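A SciPy-free alternative, swapped in here rather than taken from the answer: cosine similarity is the dot product divided by the product of the norms, so a single matrix-vector product covers every column at once. The helper name column_cosine_similarity is hypothetical; it is a sketch assuming the corpus and query from the question.

```python
import numpy as np
import pandas as pd


def column_cosine_similarity(corpus: pd.DataFrame, query: pd.Series) -> pd.Series:
    """Hypothetical helper: cosine similarity of `query` with every column of `corpus`."""
    # Align the query to the corpus rows so each product pairs up the same term.
    q = query.reindex(corpus.index).to_numpy(dtype=float)
    m = corpus.to_numpy(dtype=float)                      # shape (n_terms, n_docs)
    dots = q @ m                                          # q . d for every column, shape (n_docs,)
    norms = np.linalg.norm(m, axis=0) * np.linalg.norm(q)  # ||d|| * ||q|| per column
    return pd.Series(dots / norms, index=corpus.columns)


corpus = pd.DataFrame(
    [[3, 1, 1, 1, 1, 60], [2, 2, 0, 2, 0, 20], [0, 2, 1, 1, 0, 0],
     [0, 0, 2, 1, 0, 1], [0, 0, 0, 0, 1, 0]],
    index=["stark", "groß", "schwach", "klein", "dick"],
    columns=["d1", "d2", "d3", "d4", "d5", "d6"],
)
query = pd.Series([1, 1, 0, 0, 0], index=["stark", "groß", "schwach", "klein", "dick"])

result = column_cosine_similarity(corpus, query)
print(result.round(5))
```

An all-zero column (or an all-zero query) has zero norm, so its entry comes out as nan with a NumPy RuntimeWarning rather than raising an error.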

Runtime test:

In [225]: corpus = pd.DataFrame(np.random.rand(100,10000))

In [226]: query = pd.Series(np.random.rand(100))

# @C.Square's apply-based solution
In [227]: %timeit corpus.apply(lambda x:1-cosine(query, x), axis=0)
1 loop, best of 3: 352 ms per loop

# Proposed in this post using cdist()
In [228]: %timeit 1-cdist(query.values[None], corpus.values.T, 'cosine')
100 loops, best of 3: 3.2 ms per loop
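To reproduce the comparison outside IPython, here is a sketch using the standard timeit module. The frame is smaller than the 100 x 10000 one above (an assumption, to keep the run short), and absolute timings will vary by machine; only the relative gap between the two approaches is meaningful.

```python
import timeit

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist, cosine

rng = np.random.default_rng(0)
# Smaller than the benchmark above so the script finishes quickly;
# raise n_docs to 10000 to match the original comparison.
n_terms, n_docs = 100, 2000
corpus = pd.DataFrame(rng.random((n_terms, n_docs)))
query = pd.Series(rng.random(n_terms))


def with_apply():
    # One Python-level call to scipy's cosine() per column.
    return corpus.apply(lambda col: 1 - cosine(query, col), axis=0).to_numpy()


def with_cdist():
    # A single vectorized call that handles all columns at once.
    return 1 - cdist(query.to_numpy()[None], corpus.to_numpy().T, "cosine").ravel()


# Both approaches should agree up to floating-point rounding.
assert np.allclose(with_apply(), with_cdist())

for fn in (with_apply, with_cdist):
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```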

For "python - Efficient way to run a function on the columns of a pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44973484/
