python - How to match all key-value pairs in Python when the code takes too long to run

Reposted. Author: 行者123. Updated: 2023-12-01 07:32:08

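User-item affinity and recommendation:
I am building a table of suggestions using a "customers who bought this item also bought" algorithm.
Input dataset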

productId   userId
Prod1 a
Prod1 b
Prod1 c
Prod1 d
prod2 b
prod2 c
prod2 a
prod2 b
prod3 c
prod3 a
prod3 d
prod3 c
prod4 a
prod4 b
prod4 d
prod4 a
prod5 d
prod5 a

Desired output

Product1    Product2    score
Prod1 prod3
Prod1 prod4
Prod1 prod5
prod2 Prod1
prod2 prod3
prod2 prod4
prod2 prod5
prod3 Prod1
prod3 prod2
Using this code:

import pandas as pd

# `main` is the input DataFrame with columns productId and userId.

# Get list of unique items
itemList = list(set(main["productId"].tolist()))

# Get count of unique users
userCount = len(set(main["userId"].tolist()))

# Create an empty data frame to store item affinity scores for items.
itemAffinity = pd.DataFrame(columns=('item1', 'item2', 'score'))
rowCount = 0

# For each item in the list, compare with other items.
for ind1 in range(len(itemList)):

    # Get list of users who bought item 1.
    item1Users = main[main.productId == itemList[ind1]]["userId"].tolist()
    # print("Item 1 ", item1Users)

    # Get item 2: items that are not item 1 and have not been analyzed already.
    for ind2 in range(ind1, len(itemList)):

        if ind1 == ind2:
            continue

        # Get list of users who bought item 2
        item2Users = main[main.productId == itemList[ind2]]["userId"].tolist()
        # print("Item 2", item2Users)

        # Find score: the number of common users divided by the total users.
        commonUsers = len(set(item1Users).intersection(set(item2Users)))
        score = commonUsers / userCount

        # Add a score for item 1, item 2
        itemAffinity.loc[rowCount] = [itemList[ind1], itemList[ind2], score]
        rowCount += 1
        # Add a score for item 2, item 1. The same score applies irrespective of the order.
        itemAffinity.loc[rowCount] = [itemList[ind2], itemList[ind1], score]
        rowCount += 1

# Check final result
itemAffinity

The code works fine on the sample dataset, but it takes too long to run on a dataset with 100,000 rows. Please help me optimize the code.

Best answer

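Yes, the algorithm can be improved. You are recomputing the list of users for each item many times inside the inner loop. Instead, you can build a dictionary of items and their users once, outside the loop.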

import itertools

import pandas as pd

# get unique items
items = set(main.productId)

# count unique users
n_users = len(set(main.userId))

# make a dictionary of item and users who bought that item
item_users = main.groupby('productId')['userId'].apply(set).to_dict()

# iterate over combinations of item1 and item2 and store scores
result = []
for item1, item2 in itertools.combinations(items, 2):

    # shared users divided by total users
    score = len(item_users[item1] & item_users[item2]) / n_users
    result.append((item1, item2, score))
    result.append((item2, item1, score))  # store score for reverse order as well

# convert results to a dataframe
result = pd.DataFrame(result, columns=["item1", "item2", "score"])
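To see what the dictionary approach produces, here is a minimal self-contained sketch run on the sample data from the question. The helper name `item_affinity` is hypothetical (it is not part of the original answer), and sorting the items is an added assumption made only to get a stable row order.

```python
import itertools

import pandas as pd

# Sample data copied from the question (duplicate rows included on purpose).
rows = [
    ("Prod1", "a"), ("Prod1", "b"), ("Prod1", "c"), ("Prod1", "d"),
    ("prod2", "b"), ("prod2", "c"), ("prod2", "a"), ("prod2", "b"),
    ("prod3", "c"), ("prod3", "a"), ("prod3", "d"), ("prod3", "c"),
    ("prod4", "a"), ("prod4", "b"), ("prod4", "d"), ("prod4", "a"),
    ("prod5", "d"), ("prod5", "a"),
]
main = pd.DataFrame(rows, columns=["productId", "userId"])


def item_affinity(df):
    """Hypothetical helper: score every ordered item pair by shared users."""
    n_users = df["userId"].nunique()
    # Build the item -> set-of-users lookup once, outside the pair loop.
    item_users = df.groupby("productId")["userId"].apply(set).to_dict()
    records = []
    # sorted() is an assumption for a deterministic order, not a requirement.
    for item1, item2 in itertools.combinations(sorted(item_users), 2):
        score = len(item_users[item1] & item_users[item2]) / n_users
        records.append((item1, item2, score))
        records.append((item2, item1, score))  # same score in reverse order
    return pd.DataFrame(records, columns=["item1", "item2", "score"])


affinity = item_affinity(main)
print(affinity)
```

With 5 items there are 10 unordered pairs, so the table has 20 rows. Because the users are stored as sets, the duplicate purchases in the input (for example prod2 bought twice by user b) do not inflate the scores.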

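Timing differences:

Original implementation from the question

# 3 loops, best of 3: 41.8 ms per loop

Mark's method 2

# 3 loops, best of 3: 19.9 ms per loop

Implementation in this answer

# 3 loops, best of 3: 3.01 ms per loop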
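The figures above are in the format printed by IPython's `%timeit`. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The data and the function `build_item_users` below are hypothetical stand-ins, so the number printed will not match the timings reported here.

```python
import timeit

import pandas as pd

# Tiny stand-in data; a real measurement needs the full dataset.
main = pd.DataFrame(
    {
        "productId": ["Prod1", "Prod1", "prod2", "prod2", "prod3"],
        "userId": ["a", "b", "b", "c", "a"],
    }
)


def build_item_users(df):
    """Hypothetical function under test: map each item to its set of users."""
    return df.groupby("productId")["userId"].apply(set).to_dict()


# Call the function 3 times per measurement, take 3 measurements,
# and report the fastest one as time per call.
loops = 3
timings = timeit.repeat(lambda: build_item_users(main), number=loops, repeat=3)
best_ms = min(timings) / loops * 1000
print(f"{loops} loops, best of 3: {best_ms:.3g} ms per loop")
```

Taking the minimum rather than the mean is the usual choice here, since slower runs mostly reflect interference from other processes rather than the code itself.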

Regarding "python - How to match all key-value pairs in Python when the code takes too long to run", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57162718/
